2023-06-13
13:01 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1167 (T336886)', diff saved to https://phabricator.wikimedia.org/P49421 and previous config saved to /var/cache/conftool/dbconfig/20230613-130129-ladsgroup.json |
[production] |
13:01 |
<moritzm> |
installing nbconvert security updates |
[production] |
12:55 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2109.codfw.wmnet with reason: Maintenance |
[production] |
12:55 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on db2109.codfw.wmnet with reason: Maintenance |
[production] |
12:51 |
<fabfur> |
reboot cp4042 and cp4050 for kernel upgrade (T335835) |
[production] |
12:51 |
<fabfur@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host cp4042.ulsfo.wmnet |
[production] |
12:51 |
<fabfur@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host cp4050.ulsfo.wmnet |
[production] |
12:46 |
<akosiaris@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply |
[production] |
12:46 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P49420 and previous config saved to /var/cache/conftool/dbconfig/20230613-124623-ladsgroup.json |
[production] |
12:45 |
<akosiaris@deploy1002> |
helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply |
[production] |
12:45 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance |
[production] |
12:45 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance |
[production] |
12:45 |
<akosiaris@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply |
[production] |
12:44 |
<akosiaris@deploy1002> |
helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply |
[production] |
12:44 |
<akosiaris@deploy1002> |
helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply |
[production] |
12:44 |
<akosiaris@deploy1002> |
helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply |
[production] |
12:35 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1225.eqiad.wmnet with reason: Maintenance |
[production] |
12:35 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on db1225.eqiad.wmnet with reason: Maintenance |
[production] |
12:31 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P49419 and previous config saved to /var/cache/conftool/dbconfig/20230613-123117-ladsgroup.json |
[production] |
12:29 |
<fabfur@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4049.ulsfo.wmnet |
[production] |
12:28 |
<fabfur@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4041.ulsfo.wmnet |
[production] |
12:26 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1223.eqiad.wmnet with reason: Maintenance |
[production] |
12:25 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on db1223.eqiad.wmnet with reason: Maintenance |
[production] |
12:18 |
<fabfur> |
reboot cp4041 and cp4049 for kernel upgrade (T335835) |
[production] |
12:18 |
<fabfur@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host cp4041.ulsfo.wmnet |
[production] |
12:18 |
<fabfur@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host cp4049.ulsfo.wmnet |
[production] |
12:16 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1167 (T336886)', diff saved to https://phabricator.wikimedia.org/P49418 and previous config saved to /var/cache/conftool/dbconfig/20230613-121611-ladsgroup.json |
[production] |
12:15 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance |
[production] |
12:15 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance |
[production] |
12:15 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1212.eqiad.wmnet with reason: Maintenance |
[production] |
12:15 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on db1212.eqiad.wmnet with reason: Maintenance |
[production] |
12:09 |
<hashar> |
Restarted Zuul CI due to T309376 |
[production] |
12:06 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1198.eqiad.wmnet with reason: Maintenance |
[production] |
12:05 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on db1198.eqiad.wmnet with reason: Maintenance |
[production] |
11:56 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1189.eqiad.wmnet with reason: Maintenance |
[production] |
11:56 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on db1189.eqiad.wmnet with reason: Maintenance |
[production] |
11:46 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1175.eqiad.wmnet with reason: Maintenance |
[production] |
11:46 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on db1175.eqiad.wmnet with reason: Maintenance |
[production] |
11:45 |
<Amir1> |
cat wikis_having_stubs | xargs -I {} bash -c 'echo {}; touch /home/ladsgroup/{}.undo.sql; chmod 777 /home/ladsgroup/{}.undo.sql; mwscript maintenance/storage/moveToExternal.php --wiki={} --end 200000000 --undo /home/ladsgroup/{}.undo.sql DB cluster26' (T299387) |
[production] |
11:43 |
<fabfur@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4048.ulsfo.wmnet |
[production] |
11:42 |
<fabfur@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4040.ulsfo.wmnet |
[production] |
11:41 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T329049) |
[production] |
11:40 |
<hnowlan@cumin1001> |
START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T329049) |
[production] |
11:37 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T329049) |
[production] |
11:37 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1166.eqiad.wmnet with reason: Maintenance |
[production] |
11:37 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on db1166.eqiad.wmnet with reason: Maintenance |
[production] |
11:36 |
<ladsgroup@deploy1002> |
Finished scap: Backport for [[gerrit:929648|moveToExternal: Also check for utf8 encoding before trying to convert]] (duration: 09m 59s) |
[production] |
11:35 |
<hnowlan@cumin1001> |
START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T329049) |
[production] |
11:32 |
<fabfur> |
reboot cp4040 and cp4048 for kernel upgrade (T335835) |
[production] |
11:32 |
<fabfur@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host cp4040.ulsfo.wmnet |
[production] |