2014-08-27
§
|
22:53 |
<hashar> |
removed the blackhole ip route from deployment-cache-text02 and deployment-cache-mobile03 |
[releng] |
22:48 |
<hashar> |
The IP belongs to a known security audit. See Chris Steipp. |
[releng] |
22:46 |
<hashar> |
blackholed an IP address on deployment-cache-text02 and deployment-cache-mobile03; it was causing hundreds of requests per second and overloading the beta cluster. Use `route -n` to find the IP |
[releng] |
22:37 |
<hashar> |
restarting udp2log-mw on deployment-bastion. It has been crashing repeatedly since fairly recently |
[releng] |
22:26 |
<bd808> |
when restarting varnish on deployment-cache-text02, don't forget that there are 2 varnish services (varnish and varnish-frontend) |
[releng] |
22:19 |
<bd808> |
restarted varnish (again) on deployment-cache-text02 |
[releng] |
22:10 |
<bd808> |
restarted varnish on deployment-cache-text02 |
[releng] |
16:22 |
<bd808> |
killing `apt-get update` process running on deployment-bastion since Jun13 |
[releng] |
14:59 |
<bd808> |
Resolved puppet git merge conflict on deployment-salt |
[releng] |
14:49 |
<bd808> |
Moved hhvm core dumps to /data/project/hhvm-cores |
[releng] |
14:42 |
<bd808> |
Root drive full on deployment-mediawiki02; hhvm core files are the culprit |
[releng] |
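The 14:42 and 14:49 entries above trace a full root partition to hhvm core files. A minimal Python sketch of that kind of check follows. The function names, the default glob pattern, and the size threshold are illustrative assumptions; the log does not record which commands were actually used to find the files.

```python
import fnmatch
import os
import shutil


def disk_used_percent(path="/"):
    """Return the used share of the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def find_large_files(root, pattern="*.core", min_bytes=0):
    """Walk `root` and return (size, path) pairs for files matching `pattern`.

    The result is sorted largest first. The pattern is an assumption;
    adjust it to whatever naming the core dumps actually use.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            full = os.path.join(dirpath, name)
            try:
                size = os.lstat(full).st_size
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            if size >= min_bytes:
                matches.append((size, full))
    matches.sort(reverse=True)
    return matches


if __name__ == "__main__":
    print("root partition used: %.1f%%" % disk_used_percent("/"))
    for size, path in find_large_files("/tmp", pattern="hhvm*.core"):
        print(size, path)
```

The sketch only reports candidates. Moving or deleting them is left as a manual step, which matches how the entries above handled it.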
2014-08-25
§
|
23:47 |
<ori> |
stopping hhvm/apache on deployment-mediawiki02 to replace debug build of hhvm with release build |
[releng] |
21:44 |
<bd808> |
Deployed scap 116027f (Make sync-common update l10n cdb files by default) |
[releng] |
18:30 |
<ori> |
deployment-mediawiki02: cleared /tmp; running puppet |
[releng] |
15:05 |
<hashar> |
mediawiki02 rm /tmp/hhvm*.core . Filed as {{bug|69979}} |
[releng] |
15:01 |
<hashar> |
mediawiki02 rm /tmp/mw-cache-master/conf* |
[releng] |
15:01 |
<hashar> |
mediawiki02 keeps mw conf caches under /tmp/mw-cache-master/; since that partition is full, the conf caches end up as empty files |
[releng] |
15:00 |
<hashar> |
mediawiki02 rm /var/log/upstart/hhvm* |
[releng] |
14:53 |
<hashar> |
mediawiki02 : removed /var/lib/puppet/state/agent_catalog_run.lock |
[releng] |
14:46 |
<hashar> |
restarting udp2log-mw service on -bastion. It is stalled for some reason |
[releng] |
14:42 |
<hashar> |
on mediawiki02, clearing out some /var/log/upstart/hhvm.* log files; see {{bug|69976}} |
[releng] |
14:34 |
<hashar> |
mediawiki02 / partition is 100% full |
[releng] |
2014-08-21
§
|
21:49 |
<bd808> |
Trebuchet happier after all the salt-minion restarts; still have deleted hosts showing in the expected minion list for scap deploys |
[releng] |
21:01 |
<twentyafterfour> |
Started salt-minion on deployment-redis01 |
[releng] |
21:01 |
<bd808> |
Started salt-minion on deployment-upload |
[releng] |
21:00 |
<bd808> |
Started salt-minion on deployment-fluoride |
[releng] |
21:00 |
<bd808> |
Started salt-minion on deployment-db1 |
[releng] |
20:59 |
<bd808> |
Started salt-minion on deployment-elastic01 |
[releng] |
20:59 |
<twentyafterfour> |
Started salt-minion on deployment-eventlogging02 |
[releng] |
20:58 |
<bd808> |
Started salt-minion on deployment-elastic02 |
[releng] |
20:58 |
<bd808> |
Started salt-minion on deployment-elastic03 |
[releng] |
20:57 |
<bd808> |
Started salt-minion on deployment-elastic04 |
[releng] |
20:57 |
<bd808> |
Started salt-minion on deployment-analytics01 |
[releng] |
20:55 |
<bd808> |
Started salt-minion on deployment-cache-upload02 |
[releng] |
20:54 |
<bd808> |
Started salt-minion on deployment-memc04 |
[releng] |
20:54 |
<bd808> |
Started salt-minion on deployment-parsoid04 |
[releng] |
20:49 |
<bd808> |
Started salt-minion on deployment-memc05 |
[releng] |
20:48 |
<bd808> |
Started salt-minion on deployment-db2 |
[releng] |
20:48 |
<twentyafterfour> |
Started salt-minion on deployment-cache-text02 |
[releng] |
20:47 |
<twentyafterfour> |
Started salt-minion on deployment-memc03 |
[releng] |
20:47 |
<bd808> |
Started salt-minion on deployment-cxserver01 |
[releng] |
20:12 |
<bd808> |
List of broken salt minions can be obtained with `sudo salt-run manage.down` on deployment-salt |
[releng] |
19:55 |
<bd808> |
Fixed salt on deployment-memc02 |
[releng] |
19:52 |
<bd808> |
Salt minions are broken all over beta. Hung grain-ensure calls, hung test.ping calls, downed minions |
[releng] |
19:50 |
<bd808> |
Killed dozens of grain-ensure calls and started salt-minion on deployment-cache-mobile03 |
[releng] |
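The 20:12 entry notes that `sudo salt-run manage.down` on deployment-salt lists the broken minions. A minimal Python sketch of collecting that list for follow-up is below. It assumes the runner prints one minion per line prefixed with `- `, which may differ by Salt version or configured outputter; the function names are hypothetical.

```python
import subprocess


def parse_down_minions(output):
    """Extract minion names from `salt-run manage.down` text output.

    Assumes one minion per line in the form "- <name>"; lines in any
    other shape are ignored.
    """
    minions = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("- "):
            minions.append(line[2:].strip())
    return minions


def list_down_minions():
    """Run the command quoted in the log and return the parsed minion names."""
    result = subprocess.run(
        ["sudo", "salt-run", "manage.down"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_down_minions(result.stdout)


if __name__ == "__main__":
    for name in list_down_minions():
        print("down:", name)
```

Keeping the parsing separate from the subprocess call lets the parser be checked against captured output without touching the salt master.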