2021-08-04
17:15 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE [production]
17:13 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE [production]
17:12 <urbanecm@deploy1002> Synchronized php-1.37.0-wmf.17/extensions/GrowthExperiments/maintenance/updateMenteeData.php: 66c2c7593322dfc575edc818aaff8d9b79466bdd: updateMenteeData: Output how long the script took (T287964) (duration: 01m 07s) [production]
17:11 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE [production]
17:11 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2357.codfw.wmnet with reason: REIMAGE [production]
17:10 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [production]
17:10 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [production]
17:09 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE [production]
17:08 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw2357.codfw.wmnet with reason: REIMAGE [production]
16:57 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
16:55 <mutante> mw2351, mw2353, mw2355 - scap pull [production]
16:41 <hashar> Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/710066 # T288111 [releng]
16:40 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
16:37 <cmjohnson@cumin1001> START - Cookbook sre.dns.netbox [production]
16:25 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2355.codfw.wmnet with reason: reimage [production]
16:25 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 4:00:00 on mw2355.codfw.wmnet with reason: reimage [production]
16:23 <dzahn@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2355.codfw.wmnet with reason: REIMAGE [production]
16:23 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2357.codfw.wmnet with reason: reimage [production]
16:22 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 4:00:00 on mw2357.codfw.wmnet with reason: reimage [production]
16:22 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2353.codfw.wmnet with reason: reimage [production]
16:22 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 4:00:00 on mw2353.codfw.wmnet with reason: reimage [production]
16:21 <dzahn@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on mw2353.codfw.wmnet with reason: reimage [production]
16:21 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 4:00:00 on mw2353.codfw.wmnet with reason: reimage [production]
16:21 <dzahn@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2353.codfw.wmnet with reason: REIMAGE [production]
16:21 <joe> find . -type f -delete on /var/cache/nginx-docker-registry on registry2*, the disk is too small for unbound cache *and* accepting large uploads [production]
16:20 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw2355.codfw.wmnet with reason: REIMAGE [production]
16:19 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2351.codfw.wmnet with reason: REIMAGE [production]
16:18 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw2353.codfw.wmnet with reason: REIMAGE [production]
16:16 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw2351.codfw.wmnet with reason: REIMAGE [production]
16:15 <hnowlan@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1008.eqiad.wmnet with reason: Rebuilding as buster replica of maps1009 [production]
16:15 <hnowlan@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1008.eqiad.wmnet with reason: Rebuilding as buster replica of maps1009 [production]
16:14 <hnowlan> draining maps1008 from cassandra cluster [production]
16:13 <hnowlan@puppetmaster1001> conftool action : set/pooled=no; selector: name=maps1008.eqiad.wmnet [production]
16:11 <dcaro> rebooted the VM and it's back up, with prompt on virsh, and reachable through ssh, CristianCantoro can you try and confirm? (T288069) [wikicommunityhealth]
16:06 <dcaro> rebuilt backend instance without the attached volume, and the instance is up and reachable, will try with the volume (T288069) [wikicommunityhealth]
16:02 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2357.codfw.wmnet with reason: reimage [production]
16:02 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 4:00:00 on mw2357.codfw.wmnet with reason: reimage [production]
16:01 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2380.codfw.wmnet with reason: reimage [production]
16:01 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 4:00:00 on mw2380.codfw.wmnet with reason: reimage [production]
16:01 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[2377-2379].codfw.wmnet with reason: reimage [production]
16:01 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 4:00:00 on mw[2377-2379].codfw.wmnet with reason: reimage [production]
16:00 <dcaro> rebuilding backend instance to debug initialization process (T288069) [wikicommunityhealth]
15:58 <mutante> mw2351, mw2353, mw2355, mw2357 - converting from appserver to jobrunner, mw2377, mw2378, mw2379, mw2380 - converting from jobrunner to appserver - for balancing of server types over rows [production]
15:51 <dzahn@cumin1001> conftool action : set/pooled=inactive; selector: name=mw2380.codfw.wmnet [production]
15:50 <dzahn@cumin1001> conftool action : set/pooled=inactive; selector: name=mw237[789].codfw.wmnet [production]
15:48 <dzahn@cumin1001> conftool action : set/pooled=inactive; selector: name=mw235[1357].codfw.wmnet [production]
15:47 <dzahn@cumin1001> conftool action : set/pooled=inactive; selector: name=mw235[1357].wmnet [production]
15:13 <thcipriani> puppet fixed on deployment-deploy{01,03} [releng]
15:08 <thcipriani> rebase deployment-puppetmaster04:labs/private causing deployment-deploy{01,03} failure for...¯\_(ツ)_/¯ [releng]
14:30 <godog> upgrade prometheus on cloudmetrics hosts - T222113 [production]