2021-08-10
19:29 <robh@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1040.eqiad.wmnet with reason: REIMAGE [production]
19:27 <robh@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1039.eqiad.wmnet with reason: REIMAGE [production]
19:27 <robh@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mc1040.eqiad.wmnet with reason: REIMAGE [production]
19:25 <robh@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mc1039.eqiad.wmnet with reason: REIMAGE [production]
19:16 <cmjohnson@cumin1001> END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) [production]
19:15 <cmjohnson@cumin1001> START - Cookbook sre.dns.netbox [production]
19:09 <cmjohnson@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on dumpsdata1005.eqiad.wmnet with reason: REIMAGE [production]
19:09 <cmjohnson@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti1024.eqiad.wmnet with reason: REIMAGE [production]
19:07 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dumpsdata1004.eqiad.wmnet with reason: REIMAGE [production]
19:05 <cmjohnson@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti1023.eqiad.wmnet with reason: REIMAGE [production]
19:04 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1005.eqiad.wmnet with reason: REIMAGE [production]
19:04 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1024.eqiad.wmnet with reason: REIMAGE [production]
19:04 <jhuneidi@deploy1002> rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.18 refs T281159 [production]
19:04 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1004.eqiad.wmnet with reason: REIMAGE [production]
19:03 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1023.eqiad.wmnet with reason: REIMAGE [production]
18:49 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
18:47 <ryankemper> [WDQS] `ryankemper@wdqs2005:~$ sudo depool` (~1.26 hours of lag) [production]
18:46 <cmjohnson@cumin1001> START - Cookbook sre.dns.netbox [production]
18:46 <ryankemper> T288501 (Misread grafana graph, `wdqs2003` only has 1.33 hours to catch up on) [production]
18:45 <ryankemper> T288501 `data-transfer` of `wikidata.jnl` completed successfully. Host needs to catch up on ~22 hours of WDQS lag before being re-pooled [production]
18:42 <ryankemper@cumin2001> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [production]
17:23 <jhuneidi@deploy1002> Finished scap: testwikis wikis to 1.37.0-wmf.18 (duration: 36m 35s) [production]
17:19 <ryankemper> T288501 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2005.codfw.wmnet --dest wdqs2003.codfw.wmnet --reason "transferring fresh wikidata journal to resolve disk issue" --blazegraph_instance blazegraph` on `cumin2001` tmux session `wdqs_data_xfer` [production]
17:19 <ryankemper@cumin2001> START - Cookbook sre.wdqs.data-transfer [production]
17:18 <mbsantos@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' . [production]
17:13 <ryankemper> T288501 [WDQS] `ryankemper@wdqs2003:~$ sudo rm -fv /srv/wdqs/wikidata.jnl` [production]
17:09 <razzi@cumin1001> END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001 [production]
17:09 <razzi@cumin1001> START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001 [production]
17:06 <mbsantos@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' . [production]
17:02 <btullis@cumin1001> END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - btullis@cumin1001 [production]
17:02 <btullis@cumin1001> START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - btullis@cumin1001 [production]
17:01 <mbsantos@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' . [production]
16:49 <btullis@cumin1001> END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - btullis@cumin1001 [production]
16:49 <btullis@cumin1001> START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - btullis@cumin1001 [production]
16:47 <jhuneidi@deploy1002> Started scap: testwikis wikis to 1.37.0-wmf.18 [production]
16:36 <ebernhardson@deploy1002> Finished deploy [wikimedia/discovery/analytics@d3c5363]: T287225: Bump rdf-spark-tools to 0.3.81 (duration: 02m 10s) [production]
16:34 <ebernhardson@deploy1002> Started deploy [wikimedia/discovery/analytics@d3c5363]: T287225: Bump rdf-spark-tools to 0.3.81 [production]
16:33 <btullis@cumin1001> END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - btullis@cumin1001 [production]
16:33 <btullis@cumin1001> START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - btullis@cumin1001 [production]
16:25 <brennen> gitlab: run ansible to apply [[gerrit:710676|fix shell for backup cronjob]] (T288324) [production]
16:01 <moritzm> installing c-ares security updates on buster [production]
14:48 <ladsgroup@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:710515|Reduce ten seconds from dispatch max time (T288175)]] (duration: 00m 58s) [production]
13:32 <moritzm> updating bullseye installations to the latest state of testing [production]
13:19 <moritzm> installing perl security updates on Bullseye (older distros not affected) [production]
13:00 <jayme@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
12:54 <ppchelko@deploy1002> Finished deploy [restbase/deploy@5791a7a]: Add count parameter to recommendations API T287227 (duration: 37m 18s) [production]
12:42 <lucaswerkmeister-wmde@deploy1002> Synchronized tests/multiversion/StaticSettingsTest.php: Config: [[gerrit:709504|Remove wmgWBRepoConceptBaseUri (T257260)]] (3/3, test) (duration: 00m 57s) [production]
12:41 <lucaswerkmeister-wmde@deploy1002> Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:709504|Remove wmgWBRepoConceptBaseUri (T257260)]] (2/3, beta) (duration: 00m 57s) [production]
12:39 <lucaswerkmeister-wmde@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:709504|Remove wmgWBRepoConceptBaseUri (T257260)]] (1/3, prod) (duration: 00m 57s) [production]
12:36 <lucaswerkmeister-wmde@deploy1002> Synchronized wmf-config/Wikibase.php: Config: [[gerrit:709503|Stop setting $wgWBRepoSettings['conceptBaseUri'] (T257260)]] (duration: 00m 58s) [production]