2021-02-11 §
19:40 <mutante> mw1368 - had the reboot via IPMI issue, did DRAC reset and repeated wmf-autoreimage, issue did not happen again [production]
19:40 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1368.eqiad.wmnet [production]
19:39 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1361.eqiad.wmnet with reason: REIMAGE [production]
19:32 <urbanecm@deploy1001> Synchronized wmf-config/logos.php: noop: a1244df3e829abc793113a7e32d1972db9f780a8: Add inline documentation to configuration about updating logos regarding labs (duration: 01m 08s) [production]
19:24 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1365.eqiad.wmnet with reason: REIMAGE [production]
19:24 <urbanecm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: 93e168cb7788c772895b47f239275544fb745358: Added Kokebok namespace to nowikibooks (T274265) (duration: 01m 20s) [production]
19:23 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1362.eqiad.wmnet with reason: REIMAGE [production]
19:22 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1365.eqiad.wmnet with reason: REIMAGE [production]
19:20 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1362.eqiad.wmnet with reason: REIMAGE [production]
19:20 <robh@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
19:15 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1363.eqiad.wmnet [production]
19:13 <robh@cumin1001> START - Cookbook sre.dns.netbox [production]
19:13 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1363.eqiad.wmnet [production]
19:13 <robh@cumin1001> END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) [production]
19:12 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE [production]
19:10 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE [production]
19:04 <mutante> mw1363 - powercycled, reboot issue [production]
18:56 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1374.eqiad.wmnet [production]
18:48 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1374.eqiad.wmnet [production]
18:46 <mutante> mw1368 - racadm racreset [production]
18:46 <mutante> mw1368 - reboot via IPMI issue & can't powercycle "Unable to perform requested operation." - racreset [production]
18:43 <mutante> mw1374 - powercycled, reboot via ipmi issue [production]
18:19 <robh@cumin1001> START - Cookbook sre.dns.netbox [production]
18:18 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
18:11 <cmjohnson@cumin1001> START - Cookbook sre.dns.netbox [production]
18:00 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE [production]
17:59 <bblack> lvs2007 - downtimes ended, back in service - T274571 [production]
17:58 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1363.eqiad.wmnet with reason: REIMAGE [production]
17:57 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE [production]
17:56 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1363.eqiad.wmnet with reason: REIMAGE [production]
17:56 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1374.eqiad.wmnet with reason: REIMAGE [production]
17:54 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1374.eqiad.wmnet with reason: REIMAGE [production]
17:52 <bblack> lvs2007 - starting up puppet + pybal - T274571 [production]
17:36 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1375.eqiad.wmnet [production]
17:35 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1375.eqiad.wmnet [production]
17:32 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1376.eqiad.wmnet [production]
17:31 <bblack> lvs2007 - shutting down host - T274571 [production]
17:27 <bblack> lvs2007 - stopping pybal - T274571 [production]
17:26 <bblack> lvs2007 - puppet disabled, downtimed in icinga - T274571 [production]
17:20 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
17:11 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1376.eqiad.wmnet [production]
17:09 <cmjohnson@cumin1001> START - Cookbook sre.dns.netbox [production]
17:07 <mutante> mw1375 - powercycle - stuck at reboot [production]
17:03 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1376.eqiad.wmnet [production]
16:39 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1368.eqiad.wmnet with reason: cumin execution failed during wmf-reimaged [production]
16:39 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1368.eqiad.wmnet with reason: cumin execution failed during wmf-reimaged [production]
16:38 <mutante> mw1368 - File "/usr/lib/python3/dist-packages/spicerack/remote.py", line 637, in _execute raise RemoteExecutionError(ret, 'Cumin execution failed') [production]
16:33 <dzahn@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE [production]
16:32 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1375.eqiad.wmnet with reason: REIMAGE [production]
16:30 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1376.eqiad.wmnet with reason: REIMAGE [production]