2024-11-14
16:33 |
<ladsgroup@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1190.eqiad.wmnet with reason: Sad |
[production] |
16:33 |
<ladsgroup@cumin1002> |
START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1190.eqiad.wmnet with reason: Sad |
[production] |
16:33 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'db1190 sad', diff saved to https://phabricator.wikimedia.org/P71044 and previous config saved to /var/cache/conftool/dbconfig/20241114-163317-ladsgroup.json |
[production] |
16:31 |
<klausman@deploy2002> |
helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. |
[production] |
16:31 |
<klausman@deploy2002> |
helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. |
[production] |
16:29 |
<dcaro@cloudcumin1001> |
START - Cookbook wmcs.toolforge.component.deploy for component builds-cli |
[toolsbeta] |
16:28 |
<dcaro@cloudcumin1001> |
END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component tools-webservice |
[toolsbeta] |
16:28 |
<dcaro@cloudcumin1001> |
START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice |
[toolsbeta] |
16:28 |
<dancy> |
Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/1088649 |
[releng] |
16:18 |
<cgoubert@cumin1002> |
START - Cookbook sre.hosts.reimage for host wikikube-worker1305.eqiad.wmnet with OS bullseye |
[production] |
16:04 |
<cmooney@cumin1002> |
END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 151575 |
[production] |
16:03 |
<cmooney@cumin1002> |
START - Cookbook sre.network.peering with action 'configure' for AS: 151575 |
[production] |
16:01 |
<papaul> |
ongoing maintenance on cr1-eqiad |
[production] |
16:00 |
<jhancock@cumin2002> |
START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED |
[production] |
15:57 |
<pt1979@cumin2002> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cr1-eqiad,cr1-eqiad IPV6,re0.cr1-eqiad.mgmt with reason: router upgrade |
[production] |
15:57 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cr1-eqiad,cr1-eqiad IPV6,re0.cr1-eqiad.mgmt with reason: router upgrade |
[production] |
15:56 |
<sukhe@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp4043.ulsfo.wmnet with reason: depooled, debugging |
[production] |
15:56 |
<sukhe@cumin1002> |
START - Cookbook sre.hosts.downtime for 4:00:00 on cp4043.ulsfo.wmnet with reason: depooled, debugging |
[production] |
15:55 |
<pt1979@cumin2002> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cr1-eqiad,cr1-eqiad IPV6,cr1-eqiad.mgmt with reason: router upgrade |
[production] |
15:55 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cr1-eqiad,cr1-eqiad IPV6,cr1-eqiad.mgmt with reason: router upgrade |
[production] |
15:49 |
<moritzm> |
installing nss security updates |
[production] |
15:47 |
<reedy@deploy2002> |
Synchronized wmf-config/CommonSettings.php: T379834 (duration: 08m 02s) |
[production] |
15:47 |
<sukhe@puppetserver1001> |
conftool action : set/pooled=no; selector: name=cp4043.ulsfo.wmnet |
[production] |
15:47 |
<sukhe@cumin1002> |
END (ERROR) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=97) Rolling upgrade/restart of Apache Traffic Server on P{cp4043*,cp4051*} and A:cp for 9.2.6-1wm1 |
[production] |
15:45 |
<jayme@cumin2002> |
END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl2002.codfw.wmnet |
[production] |
15:45 |
<jayme@cumin2002> |
START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl2002.codfw.wmnet |
[production] |
15:45 |
<jayme@cumin2002> |
END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-ctrl2002.codfw.wmnet |
[production] |
15:45 |
<jayme@cumin2002> |
START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-ctrl2002.codfw.wmnet |
[production] |
15:43 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.network.cf (exit_code=0) |
[production] |
15:43 |
<pt1979@cumin2002> |
START - Cookbook sre.network.cf |
[production] |
15:42 |
<sukhe@cumin1002> |
START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4043*,cp4051*} and A:cp for 9.2.6-1wm1 |
[production] |
15:40 |
<stevemunene@cumin1002> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1016.eqiad.wmnet with OS bullseye |
[production] |
15:39 |
<stevemunene@cumin1002> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1020.eqiad.wmnet with OS bullseye |
[production] |
15:37 |
<volans> |
installed spicerack v8.16.1 to cumin hosts |
[production] |
15:36 |
<sukhe@cumin1002> |
END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad [reason: junos upgrade, T364092] |
[production] |
15:36 |
<sukhe@cumin1002> |
START - Cookbook sre.dns.admin DNS admin: depool site eqiad [reason: junos upgrade, T364092] |
[production] |
15:35 |
<ladsgroup@deploy2002> |
Finished scap sync-world: Backport for [[gerrit:1091248|Revert "mmv.js: Store comingFromHashChange as a class property" (T379835)]] (duration: 12m 10s) |
[production] |
15:33 |
<sukhe> |
reprepro -C main include bullseye-wikimedia trafficserver_9.2.6-1wm1_amd64.changes: T379797 |
[production] |
15:30 |
<sukhe@cumin1002> |
START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox |
[production] |
15:29 |
<jayme@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: T379719 |
[production] |
15:29 |
<jayme@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: T379719 |
[production] |
15:28 |
<jayme@cumin2002> |
END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-ctrl2002.codfw.wmnet |
[production] |
15:28 |
<jayme@cumin2002> |
START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-ctrl2002.codfw.wmnet |
[production] |
15:27 |
<ladsgroup@deploy2002> |
ladsgroup: Continuing with sync |
[production] |
15:27 |
<ladsgroup@deploy2002> |
ladsgroup: Backport for [[gerrit:1091248|Revert "mmv.js: Store comingFromHashChange as a class property" (T379835)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) |
[production] |
15:24 |
<elukey@cumin1002> |
END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART |
[production] |
15:24 |
<elukey@cumin1002> |
START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART |
[production] |
15:24 |
<sukhe@cumin1002> |
END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on A:dnsbox and not A:magru and A:dnsbox |
[production] |
15:23 |
<ladsgroup@deploy2002> |
Started scap sync-world: Backport for [[gerrit:1091248|Revert "mmv.js: Store comingFromHashChange as a class property" (T379835)]] |
[production] |
15:16 |
<brouberol@deploy2002> |
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply |
[production] |