2023-10-26 §
13:21 <jmm@cumin2002> START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-codfw [production]
13:20 <lucaswerkmeister-wmde@deploy2002> Finished scap: Backport for [[gerrit:968713|Enable block feature for AbuseFilter on srwiki (T349727)]] (duration: 10m 23s) [production]
13:20 <bking@cumin1001> START - Cookbook sre.wdqs.data-transfer [production]
13:20 <bking@cumin1001> END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) [production]
13:15 <lucaswerkmeister-wmde@deploy2002> zoranzoki21 and lucaswerkmeister-wmde: Continuing with sync [production]
13:15 <moritzm> installing poppler security updates [production]
13:11 <lucaswerkmeister-wmde@deploy2002> zoranzoki21 and lucaswerkmeister-wmde: Backport for [[gerrit:968713|Enable block feature for AbuseFilter on srwiki (T349727)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
13:10 <lucaswerkmeister-wmde@deploy2002> Started scap: Backport for [[gerrit:968713|Enable block feature for AbuseFilter on srwiki (T349727)]] [production]
13:04 <bking@cumin1001> START - Cookbook sre.wdqs.data-transfer [production]
12:27 <stevemunene@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance [production]
12:26 <stevemunene@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance [production]
11:04 <kevinbazira@deploy2002> helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . [production]
11:03 <kevinbazira@deploy2002> helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . [production]
10:58 <kevinbazira@deploy2002> helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . [production]
10:51 <isaranto@deploy2002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . [production]
10:51 <isaranto@deploy2002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . [production]
10:51 <isaranto@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . [production]
10:40 <elukey@deploy2002> helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . [production]
10:30 <mvolz@deploy2002> helmfile [eqiad] DONE helmfile.d/services/citoid: apply [production]
10:29 <mvolz@deploy2002> helmfile [eqiad] START helmfile.d/services/citoid: apply [production]
10:25 <mvolz@deploy2002> helmfile [codfw] DONE helmfile.d/services/citoid: apply [production]
10:25 <mvolz@deploy2002> helmfile [codfw] START helmfile.d/services/citoid: apply [production]
10:20 <mvolz@deploy2002> helmfile [staging] DONE helmfile.d/services/citoid: apply [production]
10:20 <mvolz@deploy2002> helmfile [staging] START helmfile.d/services/citoid: apply [production]
10:10 <mvolz@deploy2002> helmfile [staging] DONE helmfile.d/services/citoid: apply [production]
10:10 <mvolz@deploy2002> helmfile [staging] START helmfile.d/services/citoid: apply [production]
09:29 <dcausse> erratum (replace wdqs1009 with wdqs2009 in the above msg): depooling and restarting blazegraph on wdqs2009 (stuck since 2023-10-12) [production]
09:28 <dcausse> depooling and restarting blazegraph on wdqs1009 (stuck since 2023-10-12) [production]
09:23 <brouberol@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1009.eqiad.wmnet with OS bullseye [production]
09:14 <ayounsi@cumin1001> END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox [production]
09:14 <ayounsi@cumin1001> START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox [production]
09:06 <brouberol@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1009.eqiad.wmnet with reason: host reimage [production]
09:03 <brouberol@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1009.eqiad.wmnet with reason: host reimage [production]
08:50 <brouberol@cumin1001> START - Cookbook sre.hosts.reimage for host kafka-jumbo1009.eqiad.wmnet with OS bullseye [production]
08:49 <urbanecm> mwmaint2002: `foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --registeredWithin=1year --editedWithin=2week --hasEditsAtLeast=3 --ignoreIfUpdatedWithin=1second --verbose --use-job-queue` (testing T344428; after enabling backend on all Wikipedias) [production]
08:48 <urbanecm@deploy2002> Finished scap: Backport for [[gerrit:949034|Growth: Enable new Impact backend everywhere (T344143)]] (duration: 09m 29s) [production]
08:43 <urbanecm@deploy2002> urbanecm: Continuing with sync [production]
08:40 <urbanecm@deploy2002> urbanecm: Backport for [[gerrit:949034|Growth: Enable new Impact backend everywhere (T344143)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
08:40 <kevinbazira@deploy2002> helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . [production]
08:40 <brouberol@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1008.eqiad.wmnet with OS bullseye [production]
08:39 <urbanecm@deploy2002> Started scap: Backport for [[gerrit:949034|Growth: Enable new Impact backend everywhere (T344143)]] [production]
08:32 <kevinbazira@deploy2002> helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . [production]
08:32 <urbanecm@deploy2002> helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply [production]
08:31 <urbanecm@deploy2002> helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply [production]
08:29 <urbanecm@deploy2002> helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply [production]
08:28 <urbanecm@deploy2002> helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply [production]
08:28 <urbanecm@deploy2002> helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply [production]
08:27 <urbanecm@deploy2002> helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply [production]
08:24 <brouberol@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1008.eqiad.wmnet with reason: host reimage [production]
08:21 <brouberol@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1008.eqiad.wmnet with reason: host reimage [production]