2020-09-08
19:12 <jhuneidi@deploy1001> Finished scap: testwikis wikis to 1.36.0-wmf.8 (duration: 71m 45s) [production]
18:22 <elukey> rm /srv/prometheus/ops/targets/mjolnir_msearch_eqiad.yaml on prometheus100[3,4] as cleanup after https://gerrit.wikimedia.org/r/621988 - T260305 [production]
18:00 <jhuneidi@deploy1001> Started scap: testwikis wikis to 1.36.0-wmf.8 [production]
17:58 <ryankemper@cumin1001> START - Cookbook sre.wdqs.data-reload [production]
17:57 <ryankemper@cumin1001> END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97) [production]
17:54 <Amir1> Deployed patch for T262240 [production]
17:53 <ryankemper@cumin1001> START - Cookbook sre.wdqs.data-reload [production]
17:23 <andrewbogott> rebooting cloudvirt1033 [production]
17:03 <klausman> attempted to add rock-dkms_3.3-19_all.deb to thirdparty/amd-rocm33 for use on analytics servers with GPUs [production]
16:35 <otto@deploy1001> Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Set canary_events_enabled: true for eventgate test streams and eventlogging_Test - T251609 (duration: 00m 58s) [production]
16:34 <herron> increased elk5 logstash JVM heaps to 2g (to help decrease kafka-logging consumer lag) [production]
16:12 <longma> 1.36.0-wmf.8 was branched at e81e81e91473cc8259c473165863aca8ecea2784 for T257976 [production]
16:03 <akosiaris@deploy1001> helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' . [production]
16:03 <akosiaris@deploy1001> helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' . [production]
16:02 <akosiaris@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' . [production]
15:34 <jayme@cumin1001> conftool action : set/pooled=yes; selector: name=kubernetes1004.* [production]
15:32 <jayme@cumin1001> conftool action : set/pooled=yes; selector: service=kubesvc,name=kubernetes1013.* [production]
15:30 <elukey> roll restart of hadoop master daemons on an-master100[1,2] after the cookbook failed [production]
15:26 <elukey@cumin1001> END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) [production]
15:20 <_joe_> restarted celery-ores-worker.service on ores1007 [production]
15:19 <_joe_> restarted ferm on wdqs1011 [production]
15:18 <elukey@cumin1001> START - Cookbook sre.hadoop.roll-restart-masters [production]
15:16 <_joe_> starting wdqs-updater on wdqs1005 [production]
15:15 <bblack@cumin1001> conftool action : set/pooled=yes; selector: name=cp1090.eqiad.wmnet [production]
15:14 <bblack@cumin1001> conftool action : set/pooled=yes; selector: name=cp108[789].eqiad.wmnet [production]
15:14 <bblack> repool cp1087-90 (eqiad row D) [production]
15:13 <herron> rolling restart of elk5 logstashes [production]
15:10 <marostegui> Start mysql on db1106 after PDU maintenance is done [production]
15:03 <jayme@cumin1001> conftool action : set/pooled=inactive; selector: service=kubesvc,name=kubernetes1013.* [production]
15:03 <jayme@cumin1001> conftool action : set/pooled=inactive; selector: name=kubernetes1004.* [production]
15:03 <XioNoX> request virtual-chassis vc-port set pic-slot 1 member 4 port 0 [production]
15:03 <XioNoX> request virtual-chassis vc-port set pic-slot 0 member 2 port 50 [production]
15:02 <XioNoX> request virtual-chassis vc-port set pic-slot 1 member 1 port 1 [production]
14:53 <marostegui> Reload dbproxy1016 to recover the alert [production]
14:45 <jynus> restarting bacula-dir @ backup1001 [production]
14:44 <XioNoX> reboot asw2-d3-eqiad [production]
14:33 <moritzm> bouncing ferm on hosts where ferm.service failed due to DNS resolution issues for prometheus hosts [production]
14:31 <volans> restarted ssh on mc1033 from console [production]
14:16 <XioNoX> request virtual-chassis vc-port delete pic-slot 1 member 4 port 0 [production]
14:16 <XioNoX> request virtual-chassis vc-port delete pic-slot 0 member 2 port 50 [production]
14:14 <XioNoX> request virtual-chassis vc-port delete pic-slot 1 member 1 port 1 [production]
14:13 <akosiaris> drain kubernetes1013, kubernetes1004. They are on row D [production]
14:13 <bblack> dns1002 - disable puppet + bird service (stop advertising recdns from row D) [production]
14:03 <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
14:03 <kormat@cumin1001> START - Cookbook sre.hosts.downtime [production]
13:59 <bblack@cumin1001> conftool action : set/pooled=no; selector: name=cp1090.eqiad.wmnet [production]
13:59 <bblack> depooling cp1087-1090 [production]
13:59 <bblack@cumin1001> conftool action : set/pooled=no; selector: name=cp108[789].eqiad.wmnet [production]
13:57 <XioNoX> asw2-d-eqiad> request system reboot member 3 [production]
13:35 <cmjohnson1> the power cable was not properly seated and lost power to asw2-d3-eqiad [production]