2021-02-11
19:15 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1363.eqiad.wmnet [production]
19:13 <robh@cumin1001> START - Cookbook sre.dns.netbox [production]
19:13 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1363.eqiad.wmnet [production]
19:13 <robh@cumin1001> END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) [production]
19:12 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE [production]
19:10 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE [production]
19:04 <mutante> mw1363 - powercycled, reboot issue [production]
18:56 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1374.eqiad.wmnet [production]
18:48 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1374.eqiad.wmnet [production]
18:46 <mutante> mw1368 - racadm racreset [production]
18:46 <mutante> mw1368 - reboot via IPMI issue & can't powercycle "Unable to perform requested operation." - racreset [production]
18:43 <mutante> mw1374 - powercycled, reboot via ipmi issue [production]
18:19 <robh@cumin1001> START - Cookbook sre.dns.netbox [production]
18:18 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
18:11 <cmjohnson@cumin1001> START - Cookbook sre.dns.netbox [production]
18:00 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE [production]
17:59 <bblack> lvs2007 - downtimes ended, back in service - T274571 [production]
17:58 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1363.eqiad.wmnet with reason: REIMAGE [production]
17:57 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE [production]
17:56 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1363.eqiad.wmnet with reason: REIMAGE [production]
17:56 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1374.eqiad.wmnet with reason: REIMAGE [production]
17:54 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1374.eqiad.wmnet with reason: REIMAGE [production]
17:52 <bblack> lvs2007 - starting up puppet + pybal - T274571 [production]
17:36 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1375.eqiad.wmnet [production]
17:35 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1375.eqiad.wmnet [production]
17:32 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1376.eqiad.wmnet [production]
17:31 <bblack> lvs2007 - shutting down host - T274571 [production]
17:27 <bblack> lvs2007 - stopping pybal - T274571 [production]
17:26 <bblack> lvs2007 - puppet disabled, downtimed in icinga - T274571 [production]
17:20 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
17:11 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1376.eqiad.wmnet [production]
17:09 <cmjohnson@cumin1001> START - Cookbook sre.dns.netbox [production]
17:07 <mutante> mw1375 - powercycle - stuck at reboot [production]
17:03 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1376.eqiad.wmnet [production]
16:39 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1368.eqiad.wmnet with reason: cumin execution failed during wmf-reimaged [production]
16:39 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1368.eqiad.wmnet with reason: cumin execution failed during wmf-reimaged [production]
16:38 <mutante> mw1368 - File "/usr/lib/python3/dist-packages/spicerack/remote.py", line 637, in _execute raise RemoteExecutionError(ret, 'Cumin execution failed') [production]
16:33 <dzahn@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE [production]
16:32 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1375.eqiad.wmnet with reason: REIMAGE [production]
16:30 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1376.eqiad.wmnet with reason: REIMAGE [production]
16:30 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1375.eqiad.wmnet with reason: REIMAGE [production]
16:28 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE [production]
16:28 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1376.eqiad.wmnet with reason: REIMAGE [production]
16:24 <ejegg> updated payments-wiki from a232fc3438 to 4b7b195c8a [production]
16:13 <kormat@cumin1001> dbctl commit (dc=all): 'Pool db1163 at 1%, again T258361', diff saved to https://phabricator.wikimedia.org/P14323 and previous config saved to /var/cache/conftool/dbconfig/20210211-161308-kormat.json [production]
15:52 <jynus> deploying fixed grants to db1163 [production]
15:50 <gehel> ban elastic2054 from shard allocation - T274555 [production]
15:49 <jynus@cumin1001> dbctl commit (dc=all): 'Depool 1163', diff saved to https://phabricator.wikimedia.org/P14321 and previous config saved to /var/cache/conftool/dbconfig/20210211-154902-jynus.json [production]
15:47 <jmm@cumin2001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host serpens.wikimedia.org [production]
15:46 <gehel> depooling elastic2054 - T274555 [production]