2020-09-09
08:44 <kormat@cumin1001> START - Cookbook sre.hosts.downtime [production]
08:40 <oblivian@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' . [production]
08:40 <oblivian@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' . [production]
08:36 <kormat@cumin1001> dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12536 and previous config saved to /var/cache/conftool/dbconfig/20200909-083616-kormat.json [production]
08:34 <oblivian@deploy1001> helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' . [production]
08:34 <oblivian@deploy1001> helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' . [production]
08:30 <kormat@cumin1001> dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12535 and previous config saved to /var/cache/conftool/dbconfig/20200909-083038-kormat.json [production]
08:30 <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
08:30 <kormat@cumin1001> START - Cookbook sre.hosts.downtime [production]
08:14 <oblivian@deploy1001> helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' . [production]
07:41 <urbanecm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Disable DynamicPageList on ruwikinews (T262240) (duration: 01m 22s) [production]
07:25 <elukey> restart varnishkafka-webrequest on cp5010 and cp5012, delivery reports errors happening since yesterday's network outage [production]
06:21 <XioNoX> push new pfw policies - T262297 [production]
01:58 <eileen> civicrm revision changed from 4e40a59d42 to cc1f7e6d13, config revision is 4845a229dc [production]
2020-09-08
23:47 <eileen> civicrm revision is 4e40a59d42, config revision is d26334fa36 [production]
23:25 <eileen> civicrm revision changed from 5e7352e2c3 to 4e40a59d42, config revision is 3cf0913789 [production]
22:14 <pt1979@cumin2001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
22:12 <andrew@deploy1001> Finished deploy [horizon/deploy@7d727eb]: very minor wmf-puppet-dashboard update (duration: 03m 35s) [production]
22:08 <andrew@deploy1001> Started deploy [horizon/deploy@7d727eb]: very minor wmf-puppet-dashboard update [production]
22:02 <pt1979@cumin2001> START - Cookbook sre.dns.netbox [production]
21:57 <andrew@deploy1001> Finished deploy [horizon/deploy@7a3221d]: refreshing to clobber local hacks (duration: 00m 13s) [production]
21:57 <andrew@deploy1001> Started deploy [horizon/deploy@7a3221d]: refreshing to clobber local hacks [production]
19:19 <jhuneidi@deploy1001> rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.8 [production]
19:12 <jhuneidi@deploy1001> Finished scap: testwikis wikis to 1.36.0-wmf.8 (duration: 71m 45s) [production]
18:22 <elukey> rm /srv/prometheus/ops/targets/mjolnir_msearch_eqiad.yaml on prometheus100[3,4] as cleanup after https://gerrit.wikimedia.org/r/621988 - T260305 [production]
18:00 <jhuneidi@deploy1001> Started scap: testwikis wikis to 1.36.0-wmf.8 [production]
17:58 <ryankemper@cumin1001> START - Cookbook sre.wdqs.data-reload [production]
17:57 <ryankemper@cumin1001> END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97) [production]
17:54 <Amir1> Deployed patch for T262240 [production]
17:53 <ryankemper@cumin1001> START - Cookbook sre.wdqs.data-reload [production]
17:23 <andrewbogott> rebooting cloudvirt1033 [production]
17:03 <klausman> attempted to add rock-dkms_3.3-19_all.deb to thirdparty/amd-rocm33 for use on analytics servers with GPUs [production]
16:35 <otto@deploy1001> Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Set canary_events_enabled: true for eventgate test streams and eventlogging_Test - T251609 (duration: 00m 58s) [production]
16:34 <herron> increased elk5 logstash JVM heaps to 2g (to help decrease kafka-logging consumer lag) [production]
16:12 <longma> 1.36.0-wmf.8 was branched at e81e81e91473cc8259c473165863aca8ecea2784 for T257976 [production]
16:03 <akosiaris@deploy1001> helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' . [production]
16:03 <akosiaris@deploy1001> helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' . [production]
16:02 <akosiaris@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' . [production]
15:34 <jayme@cumin1001> conftool action : set/pooled=yes; selector: name=kubernetes1004.* [production]
15:32 <jayme@cumin1001> conftool action : set/pooled=yes; selector: service=kubesvc,name=kubernetes1013.* [production]
15:30 <elukey> roll restart of hadoop master daemons on an-master100[1,2] after the cookbook failed [production]
15:26 <elukey@cumin1001> END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) [production]
15:20 <_joe_> restarted celery-ores-worker.service on ores1007 [production]
15:19 <_joe_> restarted ferm on wdqs1011 [production]
15:18 <elukey@cumin1001> START - Cookbook sre.hadoop.roll-restart-masters [production]
15:16 <_joe_> starting wdqs-updater on wdqs1005 [production]
15:15 <bblack@cumin1001> conftool action : set/pooled=yes; selector: name=cp1090.eqiad.wmnet [production]
15:14 <bblack@cumin1001> conftool action : set/pooled=yes; selector: name=cp108[789].eqiad.wmnet [production]
15:14 <bblack> repool cp1087-90 (eqiad row D) [production]
15:13 <herron> rolling restart of elk5 logstashes [production]