2023-10-26
10:18 |
<stevemunene> |
restart zookeeper leader to pick up new host druid1011 T336042 |
[analytics] |
10:10 |
<mvolz@deploy2002> |
helmfile [staging] DONE helmfile.d/services/citoid: apply |
[production] |
10:10 |
<mvolz@deploy2002> |
helmfile [staging] START helmfile.d/services/citoid: apply |
[production] |
09:46 |
<wm-bot2> |
dcaro@urcuchillay START - Cookbook wmcs.ceph.osd.drain_node |
[admin] |
09:29 |
<dcausse> |
erratum (replace wdqs1009 with wdqs2009 in the above msg): depooling and restarting blazegraph on wdqs2009 (stuck since 2023-10-12) |
[production] |
09:28 |
<dcausse> |
depooling and restarting blazegraph on wdqs1009 (stuck since 2023-10-12) |
[production] |
09:23 |
<brouberol@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1009.eqiad.wmnet with OS bullseye |
[production] |
09:18 |
<stevemunene> |
stop zookeeper on druid1006 T336042 |
[analytics] |
09:16 |
<taavi@cloudcumin1001> |
END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics |
[toolsbeta] |
09:16 |
<taavi@cloudcumin1001> |
START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics |
[toolsbeta] |
09:14 |
<ayounsi@cumin1001> |
END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox |
[production] |
09:14 |
<ayounsi@cumin1001> |
START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox |
[production] |
09:10 |
<taavi@cloudcumin1001> |
END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics |
[toolsbeta] |
09:09 |
<taavi@cloudcumin1001> |
START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics |
[toolsbeta] |
09:06 |
<brouberol@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1009.eqiad.wmnet with reason: host reimage |
[production] |
09:03 |
<brouberol@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1009.eqiad.wmnet with reason: host reimage |
[production] |
08:50 |
<brouberol@cumin1001> |
START - Cookbook sre.hosts.reimage for host kafka-jumbo1009.eqiad.wmnet with OS bullseye |
[production] |
08:49 |
<urbanecm> |
mwmaint2002: `foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --registeredWithin=1year --editedWithin=2week --hasEditsAtLeast=3 --ignoreIfUpdatedWithin=1second --verbose --use-job-queue` (testing T344428; after enabling backend on all Wikipedias) |
[production] |
08:48 |
<brouberol> |
sudo cookbook sre.hosts.reimage --os bullseye -t T348495 kafka-jumbo1009 |
[analytics] |
08:48 |
<urbanecm@deploy2002> |
Finished scap: Backport for [[gerrit:949034|Growth: Enable new Impact backend everywhere (T344143)]] (duration: 09m 29s) |
[production] |
08:43 |
<urbanecm@deploy2002> |
urbanecm: Continuing with sync |
[production] |
08:40 |
<urbanecm@deploy2002> |
urbanecm: Backport for [[gerrit:949034|Growth: Enable new Impact backend everywhere (T344143)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) |
[production] |
08:40 |
<kevinbazira@deploy2002> |
helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . |
[production] |
08:40 |
<brouberol@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1008.eqiad.wmnet with OS bullseye |
[production] |
08:39 |
<urbanecm@deploy2002> |
Started scap: Backport for [[gerrit:949034|Growth: Enable new Impact backend everywhere (T344143)]] |
[production] |
08:32 |
<kevinbazira@deploy2002> |
helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . |
[production] |
08:32 |
<urbanecm@deploy2002> |
helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply |
[production] |
08:31 |
<urbanecm@deploy2002> |
helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply |
[production] |
08:29 |
<urbanecm@deploy2002> |
helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply |
[production] |
08:29 |
<taavi> |
root@tools-sgeweblight-10-21:~# sudo dpkg --configure -a |
[tools] |
08:28 |
<urbanecm@deploy2002> |
helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply |
[production] |
08:28 |
<urbanecm@deploy2002> |
helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply |
[production] |
08:27 |
<urbanecm@deploy2002> |
helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply |
[production] |
08:24 |
<brouberol@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1008.eqiad.wmnet with reason: host reimage |
[production] |
08:21 |
<brouberol@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1008.eqiad.wmnet with reason: host reimage |
[production] |
08:18 |
<taavi> |
restart sssd on tools-nfs-2 |
[tools] |
08:07 |
<brouberol@cumin1001> |
START - Cookbook sre.hosts.reimage for host kafka-jumbo1008.eqiad.wmnet with OS bullseye |
[production] |
08:06 |
<brouberol> |
sudo cookbook sre.hosts.reimage --os bullseye -t T348495 kafka-jumbo1008 |
[analytics] |
08:02 |
<godog> |
restart prometheus k8s k8s-aux - T343529 |
[production] |
07:55 |
<ayounsi@cumin1001> |
END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15133 |
[production] |
07:54 |
<ayounsi@cumin1001> |
START - Cookbook sre.network.peering with action 'configure' for AS: 15133 |
[production] |
07:38 |
<hashar> |
integration-castor05: `sudo rm -fR /srv/castor/castor-mw-ext-and-skins/master/mwext-node16-rundoc-docker` # T348243 |
[releng] |
07:36 |
<jelto@deploy2002> |
helmfile [codfw] DONE helmfile.d/services/miscweb: apply |
[production] |
07:32 |
<jelto@deploy2002> |
helmfile [codfw] START helmfile.d/services/miscweb: apply |
[production] |
07:31 |
<jelto@deploy2002> |
helmfile [eqiad] DONE helmfile.d/services/miscweb: apply |
[production] |
07:23 |
<jelto@deploy2002> |
helmfile [staging] START helmfile.d/services/miscweb: apply |
[production] |
07:21 |
<apergos> |
UTC morning backport and config window closed |
[production] |
07:19 |
<kartik@deploy2002> |
Finished scap: Backport for [[gerrit:968649|testwiki: Enable Section translation on some Wikipedias with potential to be supported with MinT (T345267)]] (duration: 13m 11s) |
[production] |
07:13 |
<kartik@deploy2002> |
kartik: Continuing with sync |
[production] |
07:08 |
<kartik@deploy2002> |
kartik: Backport for [[gerrit:968649|testwiki: Enable Section translation on some Wikipedias with potential to be supported with MinT (T345267)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) |
[production] |