2020-07-01 §
09:15 <vgutierrez@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
09:15 <vgutierrez@cumin1001> START - Cookbook sre.hosts.downtime [production]
09:08 <vgutierrez@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
09:08 <vgutierrez@cumin1001> START - Cookbook sre.hosts.downtime [production]
08:53 <jayme> draining kubernetes staging node kubestage1001.eqiad.wmnet - T256786 [production]
08:52 <vgutierrez@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
08:52 <vgutierrez@cumin1001> START - Cookbook sre.hosts.downtime [production]
08:44 <vgutierrez@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
08:44 <vgutierrez@cumin1001> START - Cookbook sre.hosts.downtime [production]
08:29 <XioNoX> disable BGP to nfacct in eqiad - T256790 [production]
08:23 <vgutierrez@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
08:23 <vgutierrez@cumin1001> START - Cookbook sre.hosts.downtime [production]
08:08 <jayme@deploy1001> helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' . [production]
08:05 <vgutierrez@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
08:05 <vgutierrez@cumin1001> START - Cookbook sre.hosts.downtime [production]
08:01 <vgutierrez> rolling restart of esams cache nodes to catch up on kernel upgrades [production]
07:42 <vgutierrez@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
07:42 <vgutierrez@cumin1001> START - Cookbook sre.hosts.downtime [production]
07:40 <vgutierrez@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
07:40 <vgutierrez@cumin1001> START - Cookbook sre.hosts.downtime [production]
07:39 <ema> cp2041: restart purged, varnishkafka after librdkafka1 upgrade to 0.11.6-1.1wmf1 T256444 [production]
05:47 <_joe_> restarting nfacctd on netflow1001, it's segfaulting [production]
04:01 <krinkle@deploy1001> Synchronized php-1.35.0-wmf.39/maintenance/findBadBlobs.php: I47c11190b665 (duration: 01m 08s) [production]
00:14 <krinkle@deploy1001> Synchronized private/PrivateSettings.php: T254795 - Set $wmgXhguiDBuser and $wmgXhguiDBpasswor (duration: 01m 06s) [production]
2020-06-30 §
21:48 <crusnov@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [production]
21:46 <crusnov@cumin1001> START - Cookbook sre.hosts.reboot-single [production]
21:45 <crusnov@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [production]
21:43 <crusnov@cumin1001> START - Cookbook sre.hosts.reboot-single [production]
21:42 <crusnov@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [production]
21:40 <crusnov@cumin1001> START - Cookbook sre.hosts.reboot-single [production]
21:40 <crusnov@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [production]
21:38 <crusnov@cumin1001> START - Cookbook sre.hosts.reboot-single [production]
21:38 <crusnov@cumin1001> END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) [production]
21:38 <crusnov@cumin1001> START - Cookbook sre.hosts.reboot-single [production]
19:19 <hashar@deploy1001> rebuilt and synchronized wikiversions files: group 0 wikis to 1.35.0-wmf.39 # T254176 [production]
18:31 <cdanis> T256790 ✔️ cdanis@netflow2001.codfw.wmnet ~ ☕ sudo apt install valgrind [production]
18:27 <tgr> Morning deploys done [production]
18:23 <tgr@deploy1001> Synchronized php-1.35.0-wmf.39/extensions/ElectronPdfService/src/ElectronPdfServiceHooks.php: Backport: [[gerrit:608485|Hotfix: "Undefined index: print" (T256761)]] (duration: 01m 05s) [production]
18:11 <shdubsh> restart varnishmtail,atsmtail,ncredirmtail on ncredir,cp hosts in codfw and eqsin [production]
18:05 <cdanis> installing libc6-dbg on netflow2001 T256790 [production]
17:40 <mdholloway> mobileapps deployments on k8s failing with timeouts; filed T256786 [production]
17:37 <cdanis> ✔️ cdanis@netflow2001.codfw.wmnet ~ 🕜☕ sudo systemctl restart nfacctd [production]
17:33 <mholloway-shell@deploy1001> helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' . [production]
17:18 <mholloway-shell@deploy1001> helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' . [production]
17:17 <papaul> unplugging msw-c3 power to relocate port on PDU [production]
17:09 <mholloway-shell@deploy1001> Finished deploy [mobileapps/deploy@f9df1af]: Update mobileapps to 5c7611b9 (duration: 03m 33s) [production]
17:05 <mholloway-shell@deploy1001> Started deploy [mobileapps/deploy@f9df1af]: Update mobileapps to 5c7611b9 [production]
16:57 <cdanis> T256444 restarted purged on cp2030 and repooling [production]
16:48 <cdanis> T256444 ✔️ cdanis@cp2030.codfw.wmnet ~ ☕ sudo depool [production]
15:54 <otto@deploy1001> Finished deploy [analytics/refinery@d63944e]: Deploying new camus wmf10 jar to an-launcher1002 for T256370 - take 3 (duration: 00m 03s) [production]