2021-02-16
22:22 <akosiaris> re-enable puppet and squid on install1003. wdqs seems to be mildly related to the outage, restart it [production]
22:09 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop analytics cluster [production]
21:45 <akosiaris> stop squid as a stopgap on install1003 and disable puppet so that it is not restarted while we figure out what wdqs updater is doing to cause issue to mediawiki [production]
20:47 <marxarelli> 1.36.0-wmf.31 rolled to group0. no new errors for wmf.31 (T271345) [production]
20:33 <dduvall@deploy1001> rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.31 [production]
20:20 <mutante> mwdebug1002 has been recreated on buster and has been repooled after scap pull - you can find a .tar.gz in your home with the contents of your home before reimaging, fingerprint at T274023#6835116 [production]
20:18 <legoktm@cumin1001> conftool action : set/pooled=yes; selector: name=mw1297.eqiad.wmnet [production]
20:18 <legoktm@cumin1001> conftool action : set/pooled=yes; selector: name=mw1290.eqiad.wmnet [production]
20:18 <legoktm@cumin1001> conftool action : set/pooled=yes; selector: name=mw1289.eqiad.wmnet [production]
20:18 <legoktm@cumin1001> conftool action : set/pooled=yes; selector: name=mw1288.eqiad.wmnet [production]
20:17 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mwdebug1002.eqiad.wmnet [production]
20:15 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mwdebug1002.eqiad.wmnet [production]
20:04 <legoktm@cumin1001> conftool action : set/pooled=no; selector: name=mw1297.eqiad.wmnet [production]
20:04 <legoktm@cumin1001> conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet [production]
20:04 <legoktm@cumin1001> conftool action : set/pooled=no; selector: name=mw1289.eqiad.wmnet [production]
20:03 <legoktm@cumin1001> conftool action : set/pooled=no; selector: name=mw1288.eqiad.wmnet [production]
19:58 <ryankemper> [WDQS] De-pooled `wdqs100[4,7]` to catch up on lag, and pooled `wdqs100[5,6]` [production]
19:09 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mwdebug1002.eqiad.wmnet with reason: OS upgrade [production]
19:09 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 3:00:00 on mwdebug1002.eqiad.wmnet with reason: OS upgrade [production]
19:06 <legoktm@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1297.eqiad.wmnet with reason: REIMAGE [production]
19:04 <legoktm@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1290.eqiad.wmnet with reason: REIMAGE [production]
19:03 <legoktm@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1297.eqiad.wmnet with reason: REIMAGE [production]
19:02 <legoktm@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1289.eqiad.wmnet with reason: REIMAGE [production]
19:01 <legoktm@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1290.eqiad.wmnet with reason: REIMAGE [production]
19:00 <legoktm@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1288.eqiad.wmnet with reason: REIMAGE [production]
18:59 <mutante> puppetmaster1002 - puppet cert clean mwdebug1002.eqiad.wmnet, sign new request, initial puppet run (T274023) [production]
18:59 <legoktm@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1289.eqiad.wmnet with reason: REIMAGE [production]
18:58 <legoktm@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1288.eqiad.wmnet with reason: REIMAGE [production]
18:52 <mutante> re-creating mwdebug1002 [production]
18:49 <dduvall@deploy1001> Finished scap: testwikis wikis to 1.36.0-wmf.31 (duration: 49m 37s) [production]
18:41 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1346.eqiad.wmnet [production]
18:38 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1352.eqiad.wmnet [production]
18:37 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1347.eqiad.wmnet [production]
18:35 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1346.eqiad.wmnet [production]
18:33 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1352.eqiad.wmnet [production]
18:32 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1347.eqiad.wmnet [production]
18:28 <mutante> mw1352 - powercycle via mgmt [production]
18:04 <dduvall@deploy1001> Started scap: testwikis wikis to 1.36.0-wmf.31 [production]
17:41 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1346.eqiad.wmnet with reason: REIMAGE [production]
17:39 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1347.eqiad.wmnet with reason: REIMAGE [production]
17:39 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1346.eqiad.wmnet with reason: REIMAGE [production]
17:37 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1347.eqiad.wmnet with reason: REIMAGE [production]
17:36 <marxarelli> 1.36.0-wmf.31 was branched at c49ac6d2448efa085bdd34fc415aeece05a98dde (T271345) [production]
17:33 <akosiaris@deploy1001> helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. [production]
17:32 <akosiaris@deploy1001> helmfile [staging-codfw] START helmfile.d/admin 'sync'. [production]
17:31 <akosiaris@deploy1001> helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. [production]
17:30 <akosiaris@deploy1001> helmfile [staging-codfw] START helmfile.d/admin 'sync'. [production]
17:30 <akosiaris@deploy1001> helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. [production]
17:30 <akosiaris@deploy1001> helmfile [staging-codfw] START helmfile.d/admin 'sync'. [production]
17:29 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1352.eqiad.wmnet with reason: REIMAGE [production]