2022-10-27
|
15:59 |
<aikochou@deploy1002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . |
[production] |
15:59 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db2167.codfw.wmnet with reason: Maintenance |
[production] |
15:59 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2153 (T318950)', diff saved to https://phabricator.wikimedia.org/P36861 and previous config saved to /var/cache/conftool/dbconfig/20221027-155902-ladsgroup.json |
[production] |
15:55 |
<aborrero@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw2003-dev.codfw.wmnet with OS bullseye |
[production] |
15:47 |
<elukey@deploy1002> |
helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. |
[production] |
15:46 |
<elukey@deploy1002> |
helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. |
[production] |
15:45 |
<sukhe@cumin2002> |
START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS buster |
[production] |
15:43 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P36860 and previous config saved to /var/cache/conftool/dbconfig/20221027-154356-ladsgroup.json |
[production] |
15:42 |
<sukhe@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4041.ulsfo.wmnet with OS buster |
[production] |
15:31 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depooling db2171:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P36859 and previous config saved to /var/cache/conftool/dbconfig/20221027-153143-ladsgroup.json |
[production] |
15:31 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance |
[production] |
15:31 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance |
[production] |
15:31 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P36858 and previous config saved to /var/cache/conftool/dbconfig/20221027-153121-ladsgroup.json |
[production] |
15:28 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P36857 and previous config saved to /var/cache/conftool/dbconfig/20221027-152849-ladsgroup.json |
[production] |
15:26 |
<bking@cumin2002> |
END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wcqs2002.codfw.wmnet |
[production] |
15:26 |
<bking@cumin2002> |
START - Cookbook sre.hosts.remove-downtime for wcqs2002.codfw.wmnet |
[production] |
15:26 |
<bking@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on wcqs2003.codfw.wmnet with reason: data reload |
[production] |
15:26 |
<bking@cumin2002> |
START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on wcqs2003.codfw.wmnet with reason: data reload |
[production] |
15:26 |
<bking@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on wcqs2002.codfw.wmnet with reason: data reload |
[production] |
15:25 |
<bking@cumin2002> |
START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on wcqs2002.codfw.wmnet with reason: data reload |
[production] |
15:23 |
<claime> |
Removed silence ProbeDown instance="mwdebug:4444" |
[production] |
15:23 |
<claime> |
k8s-experimental mwdebug service switched to new deployment mw-debug |
[production] |
15:22 |
<claime> |
Unpausing mwdebug k8s deployments |
[production] |
15:19 |
<sukhe@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage |
[production] |
15:19 |
<cgoubert@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply |
[production] |
15:18 |
<cgoubert@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mw-debug: apply |
[production] |
15:18 |
<cgoubert@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mw-debug: apply |
[production] |
15:18 |
<cgoubert@deploy1002> |
helmfile [codfw] START helmfile.d/services/mw-debug: apply |
[production] |
15:16 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P36856 and previous config saved to /var/cache/conftool/dbconfig/20221027-151615-ladsgroup.json |
[production] |
15:16 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2175 (T321123)', diff saved to https://phabricator.wikimedia.org/P36855 and previous config saved to /var/cache/conftool/dbconfig/20221027-151604-marostegui.json |
[production] |
15:15 |
<sukhe@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage |
[production] |
15:13 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2153 (T318950)', diff saved to https://phabricator.wikimedia.org/P36854 and previous config saved to /var/cache/conftool/dbconfig/20221027-151343-ladsgroup.json |
[production] |
15:12 |
<claime> |
Silence ProbeDown instance="mwdebug:4444" for 1h |
[production] |
15:11 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depooling db2153 (T318950)', diff saved to https://phabricator.wikimedia.org/P36853 and previous config saved to /var/cache/conftool/dbconfig/20221027-151133-ladsgroup.json |
[production] |
15:11 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance |
[production] |
15:11 |
<aborrero@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw2001-dev.codfw.wmnet with OS bullseye |
[production] |
15:11 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance |
[production] |
15:11 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2146 (T318950)', diff saved to https://phabricator.wikimedia.org/P36852 and previous config saved to /var/cache/conftool/dbconfig/20221027-151111-ladsgroup.json |
[production] |
15:07 |
<claime> |
Pausing mwdebug k8s deployments |
[production] |
15:07 |
<moritzm> |
installing node-moment security updates |
[production] |
15:07 |
<claime> |
Switching k8s-experimental mwdebug service |
[production] |
15:01 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P36851 and previous config saved to /var/cache/conftool/dbconfig/20221027-150108-ladsgroup.json |
[production] |
15:00 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P36850 and previous config saved to /var/cache/conftool/dbconfig/20221027-150058-marostegui.json |
[production] |
14:56 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P36849 and previous config saved to /var/cache/conftool/dbconfig/20221027-145604-ladsgroup.json |
[production] |
14:51 |
<moritzm> |
installing krb5 bugfix updates from Bullseye point release |
[production] |
14:50 |
<aborrero@cumin2002> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudgw2001-dev.codfw.wmnet with reason: host reimage |
[production] |
14:50 |
<aborrero@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2001-dev.codfw.wmnet with reason: host reimage |
[production] |
14:49 |
<sukhe@cumin2002> |
START - Cookbook sre.hosts.reimage for host cp4041.ulsfo.wmnet with OS buster |
[production] |
14:48 |
<moritzm> |
installing twitter-bootstrap4 security updates |
[production] |
14:46 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P36848 and previous config saved to /var/cache/conftool/dbconfig/20221027-144602-ladsgroup.json |
[production] |