2021-10-25
19:42 | <mutante> | icinga - ACKing all unhandled CRIT alerts on hosts with "dev" or "test" in their name, regardless of notifications being disabled or not. just so that we get more signal than noise in actual unhandled CRITs in web UI | [production]
19:40 | <mutante> | cumin2002 - sudo systemctl reset-failed to clear Icinga alert about failed but (now) non-existing service database-backups-snapshots.service, assuming it's a case of "only in active DC" | [production]
19:12 | <dzahn@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1112.eqiad.wmnet with reason: hardware fail | [production]
19:12 | <dzahn@cumin1001> | START - Cookbook sre.hosts.downtime for 4:00:00 on db1112.eqiad.wmnet with reason: hardware fail | [production]
19:07 | <kormat@cumin1001> | dbctl commit (dc=all): 'Temporarily move mw groups to db1123 T294295', diff saved to https://phabricator.wikimedia.org/P17597 and previous config saved to /var/cache/conftool/dbconfig/20211025-190717-kormat.json | [production]
19:06 | <mutante> | db1112 - powercycling | [production]
19:04 | <legoktm@cumin1001> | dbctl commit (dc=all): 'Depool db1112 (T294295)', diff saved to https://phabricator.wikimedia.org/P17596 and previous config saved to /var/cache/conftool/dbconfig/20211025-190436-legoktm.json | [production]
18:41 | <mwdebug-deploy@deploy1002> | helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production]
18:40 | <jforrester@deploy1002> | Synchronized php-1.38.0-wmf.5/extensions/timeline/includes/Timeline.php: Backport: [[gerrit:734312|Input may be null when rendering a self-closing tag `<timeline />` (T294020)]] (duration: 00m 55s) | [production]
18:38 | <mwdebug-deploy@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production]
18:28 | <mwdebug-deploy@deploy1002> | helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production]
18:25 | <mwdebug-deploy@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production]
18:24 | <jforrester@deploy1002> | Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732971|Fix some easy codestyle issues]] (duration: 00m 55s) | [production]
18:22 | <jforrester@deploy1002> | Synchronized w/static.php: Config: [[gerrit:732971|Fix some easy codestyle issues]] (duration: 00m 54s) | [production]
18:19 | <jforrester@deploy1002> | Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732840|Fix array declaration of NS_USER_TALK abbreviation on ruwikiquote (T197058)]] (duration: 00m 55s) | [production]
18:16 | <mwdebug-deploy@deploy1002> | helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production]
18:15 | <jforrester@deploy1002> | Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:732836|flaggedrevs: Drop legacy wgFlaggedRevsStatsAge config, no longer read]] (duration: 00m 55s) | [production]
18:13 | <mwdebug-deploy@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production]
18:11 | <jforrester@deploy1002> | Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732254|Make reply tool available as opt-out on frwiki (T293687)]] (duration: 00m 56s) | [production]
17:41 | <dzahn@cumin1001> | conftool action : set/pooled=yes; selector: name=mw2253.codfw.wmnet | [production]
17:40 | <dzahn@cumin1001> | conftool action : set/pooled=no; selector: name=mw2253.codfw.wmnet | [production]
17:39 | <mutante> | mw2253 - scap pull after hw maintenance is over | [production]
17:32 | <bd808@deploy1002> | helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' . | [production]
17:26 | <bd808@deploy1002> | helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' . | [production]
17:24 | <mmandere@cumin2002> | END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | [production]
17:23 | <bd808@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' . | [production]
17:22 | <XioNoX> | update core routers ACLs | [production]
17:20 | <mmandere@cumin2002> | START - Cookbook sre.dns.netbox | [production]
16:49 | <XioNoX> | update management routers ACLs | [production]
16:36 | <XioNoX> | DNS - Add eqsin-ulsfo transport v6 prefix - T273308 | [production]
16:31 | <mmandere@cumin2002> | END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | [production]
16:28 | <mmandere@cumin2002> | START - Cookbook sre.dns.netbox | [production]
16:25 | <accraze@deploy1002> | helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . | [production]
16:25 | <mmandere@cumin2002> | END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) | [production]
16:21 | <mmandere@cumin2002> | START - Cookbook sre.dns.netbox | [production]
16:12 | <mwdebug-deploy@deploy1002> | helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production]
16:10 | <dzahn@cumin1001> | conftool action : set/pooled=inactive; selector: name=mw2253.codfw.wmnet | [production]
16:09 | <mwdebug-deploy@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production]
16:08 | <lucaswerkmeister-wmde@deploy1002> | Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:734298|Empty wikibase disabled access entity types on Beta (T294159)]] (beta-only) (duration: 01m 47s) | [production]
16:04 | <mmandere@cumin2002> | END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | [production]
16:01 | <mmandere@cumin2002> | START - Cookbook sre.dns.netbox | [production]
15:57 | <jdrewniak@deploy1002> | Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:734328| Bumping portals to master (T128546)]] (duration: 01m 52s) | [production]
15:55 | <mwdebug-deploy@deploy1002> | helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production]
15:52 | <mwdebug-deploy@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production]
15:49 | <jdrewniak@deploy1002> | Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:734328| Bumping portals to master (T128546)]] (duration: 01m 54s) | [production]
15:46 | <jbond> | upgrade cas/idp to 6.4.2 | [production]
14:56 | <mutante> | mw2253 - shut down and downtimed for 2 days | [production]
14:50 | <dzahn@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on mw2253.codfw.wmnet with reason: DRAC upgrade | [production]
14:50 | <dzahn@cumin1001> | START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on mw2253.codfw.wmnet with reason: DRAC upgrade | [production]
14:49 | <mutante> | depooling mw2253 for DRAC upgrade (T283582) | [production]