2020-09-08
§
|
16:03 |
<akosiaris@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' . |
[production] |
16:02 |
<akosiaris@deploy1001> |
helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' . |
[production] |
15:34 |
<jayme@cumin1001> |
conftool action : set/pooled=yes; selector: name=kubernetes1004.* |
[production] |
15:32 |
<jayme@cumin1001> |
conftool action : set/pooled=yes; selector: service=kubesvc,name=kubernetes1013.* |
[production] |
15:30 |
<elukey> |
roll restart of hadoop master daemons on an-master100[1,2] after the cookbook failed |
[production] |
15:26 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) |
[production] |
15:20 |
<_joe_> |
restarted celery-ores-worker.service on ores1007 |
[production] |
15:19 |
<_joe_> |
restarted ferm on wdqs1011 |
[production] |
15:18 |
<elukey@cumin1001> |
START - Cookbook sre.hadoop.roll-restart-masters |
[production] |
15:16 |
<_joe_> |
starting wdqs-updater on wdqs1005 |
[production] |
15:15 |
<bblack@cumin1001> |
conftool action : set/pooled=yes; selector: name=cp1090.eqiad.wmnet |
[production] |
15:14 |
<bblack@cumin1001> |
conftool action : set/pooled=yes; selector: name=cp108[789].eqiad.wmnet |
[production] |
15:14 |
<bblack> |
repool cp1087-90 (eqiad row D) |
[production] |
15:13 |
<herron> |
rolling restart of elk5 logstashes |
[production] |
15:10 |
<marostegui> |
Start mysql on db1106 after PDU maintenance is done |
[production] |
15:03 |
<jayme@cumin1001> |
conftool action : set/pooled=inactive; selector: service=kubesvc,name=kubernetes1013.* |
[production] |
15:03 |
<jayme@cumin1001> |
conftool action : set/pooled=inactive; selector: name=kubernetes1004.* |
[production] |
15:03 |
<XioNoX> |
request virtual-chassis vc-port set pic-slot 1 member 4 port 0 |
[production] |
15:03 |
<XioNoX> |
request virtual-chassis vc-port set pic-slot 0 member 2 port 50 |
[production] |
15:02 |
<XioNoX> |
request virtual-chassis vc-port set pic-slot 1 member 1 port 1 |
[production] |
14:53 |
<marostegui> |
Reload dbproxy1016 to recover the alert |
[production] |
14:45 |
<jynus> |
restarting bacula-dir @ backup1001 |
[production] |
14:44 |
<XioNoX> |
reboot asw2-d3-eqiad |
[production] |
14:33 |
<moritzm> |
bouncing ferm on hosts where ferm.service failed due to DNS resolution issues for prometheus hosts |
[production] |
14:31 |
<volans> |
restarted ssh on mc1033 from console |
[production] |
14:16 |
<XioNoX> |
request virtual-chassis vc-port delete pic-slot 1 member 4 port 0 |
[production] |
14:16 |
<XioNoX> |
request virtual-chassis vc-port delete pic-slot 0 member 2 port 50 |
[production] |
14:14 |
<XioNoX> |
request virtual-chassis vc-port delete pic-slot 1 member 1 port 1 |
[production] |
14:13 |
<akosiaris> |
drain kubernetes1013, kubernetes1004. They are on row D |
[production] |
14:13 |
<bblack> |
dns1002 - disable puppet + bird service (stop advertising recdns from row D) |
[production] |
14:03 |
<kormat@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
14:03 |
<kormat@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
13:59 |
<bblack@cumin1001> |
conftool action : set/pooled=no; selector: name=cp1090.eqiad.wmnet |
[production] |
13:59 |
<bblack> |
depooling cp1087-1090 |
[production] |
13:59 |
<bblack@cumin1001> |
conftool action : set/pooled=no; selector: name=cp108[789].eqiad.wmnet |
[production] |
13:57 |
<XioNoX> |
asw2-d-eqiad> request system reboot member 3 |
[production] |
13:35 |
<cmjohnson1> |
the power cable was not properly seated, so asw2-d3-eqiad lost power
[production] |
13:34 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) |
[production] |
13:30 |
<akosiaris@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' . |
[production] |
13:28 |
<akosiaris@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' . |
[production] |
13:28 |
<akosiaris@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' . |
[production] |
13:26 |
<akosiaris@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . |
[production] |
13:26 |
<akosiaris@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . |
[production] |
13:25 |
<mateusbs17> |
Restarted puppetdb on deployment-puppetdb03 (T248041) |
[production] |
13:24 |
<akosiaris@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' . |
[production] |
13:24 |
<akosiaris@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' . |
[production] |
13:21 |
<akosiaris@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' . |
[production] |
13:21 |
<akosiaris@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'staging' . |
[production] |
13:21 |
<akosiaris@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' . |
[production] |
13:21 |
<akosiaris@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' . |
[production] |