2020-02-05
15:54 <vgutierrez@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
15:52 <vgutierrez@cumin1001> START - Cookbook sre.hosts.downtime [production]
15:29 <effie> restart php-fpm on canaries - T236800 [production]
15:24 <effie> Rollout php-apcu_5.1.17+4.0.11-1+0~20190217111312.9+stretch~1.gbp192528+wmf2 to api, app and jobrunner canaries - T236800 [production]
15:15 <vgutierrez> depooling & reimaging cp5012 as buster - T242093 [production]
15:12 <ema> cp: unset Accept-Encoding from ats-be requests to applayer T242478 [production]
14:34 <vgutierrez> updating acme-chief to version 0.24 - T244236 [production]
14:32 <_joe_> restarting mcrouter at nice -19 on mw1331 for testing effects of that change [production]
14:30 <vgutierrez> upload acme-chief 0.24 to apt.wm.o (buster) - T244236 [production]
14:26 <XioNoX> push initial flowspec config to all routers [production]
14:23 <vgutierrez> pooling cp5006 - T242093 [production]
14:13 <ema> cp1075: back to leaving Accept-Encoding as it is due to unrelated applayer issues T242478 [production]
13:46 <marostegui> Decrease buffer pool size on db1107 for testing - T242702 [production]
13:45 <vgutierrez@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
13:43 <vgutierrez@cumin1001> START - Cookbook sre.hosts.downtime [production]
13:42 <akosiaris> undo the manually set 10.2.1.42 eventgate-analytics.discovery.wmnet in /etc/hosts for mw1331, mw1348. Verify hypothesis that this should cause increased latency. Restart php-fpm [production]
13:41 <ema> cp1075: unset Accept-Encoding on origin server requests T242478 [production]
13:39 <Amir1> EU SWAT is done [production]
13:38 <ema> cp: disable puppet and merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/570311/ T242478 [production]
13:35 <XioNoX> rollback traffic steering off cr2-eqord [production]
13:29 <akosiaris> manually set 10.2.1.42 eventgate-analytics.discovery.wmnet in /etc/hosts for mw1331, mw1348. Verify hypothesis that this should cause increased latency [production]
13:25 <XioNoX> reboot cr2-eqord for software upgrade - yaaaaa [production]
13:24 <ladsgroup@deploy1001> Synchronized php-1.35.0-wmf.18/extensions/Wikibase/lib/includes/Store/CachingPropertyInfoLookup.php: SWAT: [[gerrit:570301|Cache PropertyInfoLookup internally]] (T243955) (duration: 01m 07s) [production]
13:17 <XioNoX> increase ospf cost for cr2-eqord links [production]
13:16 <vgutierrez> upload acme-chief 0.23 to apt.wm.o (buster) - T244236 [production]
13:15 <XioNoX> disable transit/peering BGP sessions on cr2-eqord [production]
13:15 <ladsgroup@deploy1001> Synchronized php-1.35.0-wmf.16/extensions/Wikibase/lib/includes/Store/CachingPropertyInfoLookup.php: SWAT: [[gerrit:570301|Cache PropertyInfoLookup internally]] (T243955) (duration: 01m 07s) [production]
13:10 <XioNoX> rollback: disable transit/peering BGP sessions on cr2-eqdfw [production]
13:08 <vgutierrez> depooling & reimaging cp5006 as buster - T242093 [production]
13:03 <urbanecm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: 5cc2b70: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos (T232140) (duration: 01m 06s) [production]
13:01 <XioNoX> reboot cr2-eqdfw for software upgrade [production]
13:00 <Amir1> SWAT needs more time [production]
12:55 <XioNoX> disable transit/peering BGP sessions on cr2-eqdfw [production]
12:50 <urbanecm@deploy1001> Synchronized wmf-config/CommonSettings.php: SWAT: d450288: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos (T232140) (duration: 01m 07s) [production]
12:48 <urbanecm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: 5cc2b70: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos (T232140) (duration: 01m 07s) [production]
12:32 <awight@deploy1001> Synchronized php-1.35.0-wmf.18/extensions/Cite: SWAT: [[gerrit:570285|Revert follow standardization (T240858)]] (duration: 01m 13s) [production]
10:53 <akosiaris> rolling restart of all pods on kubernetes staging cluster to make sure everything is fine after the upgrade [production]
10:50 <akosiaris> T244335 upgrade kubernetes-node on kubestage1002.eqiad.wmnet to 1.13.12 [production]
10:43 <ema> cp4028: varnish-frontend-restart T243634 [production]
10:24 <akosiaris> T244335 upgrade kubernetes-master on neon.eqiad.wmnet (staging) [production]
10:24 <effie> Upload php-apcu_5.1.17+4.0.11-1+0~20190217111312.9+stretch~1.gbp192528+wmf2 - T236800 [production]
10:10 <Urbanecm> Run mwscript deleteEqualMessages.php --delete to delete GrowthExperiments' message overrides (cswiki, viwiki, arwiki, kowiki) [production]
09:57 <akosiaris> upload kubernetes 1.13.12 to apt.wikimedia.org stretch-wikimedia/main T244335 [production]
09:51 <effie> install libmemcached-tools on mc-gp* servers - T240684 [production]
09:05 <ema> add individual FortiGate IPs hitting ulsfo (currently cp4028) to vcl blocked_nets -- trying to identify problematic traffic T243634 [production]
07:02 <marostegui> Replay s1 traffic on db1107 (10.4) T242702 [production]
06:32 <elukey> force a puppet run on ores* hosts [production]
06:12 <marostegui> Remove partitions from revision table db1098:3317 - T239453 [production]
06:09 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1098:3317 - T239453', diff saved to https://phabricator.wikimedia.org/P10312 and previous config saved to /var/cache/conftool/dbconfig/20200205-060942-marostegui.json [production]
06:09 <marostegui@cumin1001> dbctl commit (dc=all): 'Repool db2085:3311, db2086:3317 - T239453', diff saved to https://phabricator.wikimedia.org/P10311 and previous config saved to /var/cache/conftool/dbconfig/20200205-060911-marostegui.json [production]