2021-03-10 §
04:53 <ryankemper> T266470 `ryankemper@cumin1001:~$ sudo -E cumin 'A:wdqs-all' 'sudo disable-puppet "revoking old cert and generating new one with new alt_names - T266470"'` [production]
04:52 <ryankemper> T266470 Temporarily disabling puppet on all `wdqs*` hosts in preparation for `wdqs.discovery.wmnet` certificate revocation [production]
01:08 <krinkle@deploy1002> Synchronized php-1.36.0-wmf.34/extensions/NavigationTiming/modules/ext.navigationTiming.js: T276826 Ibd9ddf14d64 (duration: 01m 14s) [production]
00:02 <robh@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-backup1002.eqiad.wmnet with reason: REIMAGE [production]
00:00 <robh@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-backup1001.eqiad.wmnet with reason: REIMAGE [production]
2021-03-09 §
23:59 <robh@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on ms-backup1002.eqiad.wmnet with reason: REIMAGE [production]
23:58 <robh@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on ms-backup1001.eqiad.wmnet with reason: REIMAGE [production]
22:04 <mutante> phab1001 - manually running phab public task dump script after making changes to redirect stdout [production]
20:42 <elukey> reimaged an-worker1091 to buster [production]
20:41 <bstorm> depooled labsdb1009 T276980 [production]
20:25 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1091.eqiad.wmnet with reason: REIMAGE [production]
20:25 <bstorm> downtimed labsdb1009 so it doesn't keep paging T276980 [production]
20:23 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1091.eqiad.wmnet with reason: REIMAGE [production]
20:09 <brennen> train status: 1.36.0-wmf.34 (T274938) on group0 at 20:06:32 UTC; logs initially quiet. [production]
20:06 <brennen@deploy1002> rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.34 [production]
19:05 <brennen@deploy1002> Pruned MediaWiki: 1.36.0-wmf.31 (duration: 03m 34s) [production]
19:04 <pt1979@cumin2001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
18:59 <pt1979@cumin2001> START - Cookbook sre.dns.netbox [production]
18:54 <brennen@deploy1002> Finished scap: testwikis wikis to 1.36.0-wmf.34 (duration: 47m 25s) [production]
18:52 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1087.eqiad.wmnet with reason: REIMAGE [production]
18:49 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1087.eqiad.wmnet with reason: REIMAGE [production]
18:47 <dcausse> re-pool wdqs1004 [production]
18:37 <mbsantos@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' . [production]
18:35 <mbsantos@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' . [production]
18:34 <pt1979@cumin2001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
18:29 <pt1979@cumin2001> START - Cookbook sre.dns.netbox [production]
18:26 <elukey> reimage an-worker1087 to buster [production]
18:16 <mbsantos@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' . [production]
18:13 <mbsantos@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' . [production]
18:12 <brennen@deploy1002> Started scap: testwikis wikis to 1.36.0-wmf.34 [production]
18:10 <mbsantos@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' . [production]
18:05 <mbsantos@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' . [production]
18:03 <mbsantos@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' . [production]
18:02 <marxarelli> deleting shut down memc* deployment-prep instances to free up quota for replacement db instances (T276968) [production]
18:02 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1085.eqiad.wmnet with reason: REIMAGE [production]
18:00 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1085.eqiad.wmnet with reason: REIMAGE [production]
17:50 <papaul> rebooting db2073 for firmware upgrade [production]
17:01 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1077.eqiad.wmnet with reason: REIMAGE [production]
17:00 <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: 3119d7a703a38b328fa634db64b2929d54829884: sqwiki: Fix deployment of Growth features (duration: 01m 00s) [production]
16:59 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1077.eqiad.wmnet with reason: REIMAGE [production]
16:46 <pt1979@cumin2001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
16:41 <pt1979@cumin2001> START - Cookbook sre.dns.netbox [production]
16:40 <elukey> reimage analytics1077 to buster [production]
16:33 <aborrero@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1027.eqiad.wmnet [production]
16:32 <jayme@deploy1002> helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. [production]
16:31 <jayme@deploy1002> helmfile [staging-codfw] START helmfile.d/admin 'sync'. [production]
16:31 <brennen> 1.36.0-wmf.34 was branched at e175899921535f83e168145cbe942489475607db for T274938 [production]
16:27 <aborrero@cumin1001> START - Cookbook sre.hosts.reboot-single for host cloudvirt1027.eqiad.wmnet [production]
16:21 <marostegui@cumin1001> dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14708 and previous config saved to /var/cache/conftool/dbconfig/20210309-162116-root.json [production]