2024-10-15
|
11:01 |
<ladsgroup@cumin1002> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance |
[production] |
10:57 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2145 (T367781)', diff saved to https://phabricator.wikimedia.org/P69919 and previous config saved to /var/cache/conftool/dbconfig/20241015-105719-arnaudb.json |
[production] |
10:53 |
<tappof> |
expand LVs on prometheus instances (k8s-mlserve and k8s-staging) T377196 |
[production] |
10:53 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Depooling db2145 (T367781)', diff saved to https://phabricator.wikimedia.org/P69918 and previous config saved to /var/cache/conftool/dbconfig/20241015-105301-arnaudb.json |
[production] |
10:52 |
<arnaudb@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2145.codfw.wmnet with reason: Maintenance |
[production] |
10:52 |
<arnaudb@cumin1002> |
START - Cookbook sre.hosts.downtime for 4:00:00 on db2145.codfw.wmnet with reason: Maintenance |
[production] |
10:52 |
<arnaudb@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2141.codfw.wmnet with reason: Maintenance |
[production] |
10:52 |
<arnaudb@cumin1002> |
START - Cookbook sre.hosts.downtime for 4:00:00 on db2141.codfw.wmnet with reason: Maintenance |
[production] |
10:52 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2130 (T367781)', diff saved to https://phabricator.wikimedia.org/P69917 and previous config saved to /var/cache/conftool/dbconfig/20241015-105213-arnaudb.json |
[production] |
10:38 |
<brouberol@cumin1002> |
START - Cookbook sre.presto.reboot-workers for Presto an-presto cluster: Reboot Presto nodes |
[production] |
10:38 |
<brouberol@cumin1002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2002.codfw.wmnet |
[production] |
10:37 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P69915 and previous config saved to /var/cache/conftool/dbconfig/20241015-103706-arnaudb.json |
[production] |
10:34 |
<brouberol@cumin1002> |
START - Cookbook sre.hosts.reboot-single for host flink-zk2002.codfw.wmnet |
[production] |
10:30 |
<brouberol@cumin1002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2003.codfw.wmnet |
[production] |
10:26 |
<brouberol@cumin1002> |
START - Cookbook sre.hosts.reboot-single for host flink-zk2003.codfw.wmnet |
[production] |
10:25 |
<brouberol@cumin1002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2001.codfw.wmnet |
[production] |
10:22 |
<brouberol@cumin1002> |
START - Cookbook sre.hosts.reboot-single for host flink-zk2001.codfw.wmnet |
[production] |
10:22 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P69914 and previous config saved to /var/cache/conftool/dbconfig/20241015-102159-arnaudb.json |
[production] |
10:21 |
<brouberol@cumin1002> |
END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-flink-codfw cluster: Roll restart of jvm daemons. |
[production] |
10:14 |
<brouberol@cumin1002> |
START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-flink-codfw cluster: Roll restart of jvm daemons. |
[production] |
10:11 |
<brouberol@cumin1002> |
END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:dse-k8s-worker |
[production] |
10:06 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2130 (T367781)', diff saved to https://phabricator.wikimedia.org/P69913 and previous config saved to /var/cache/conftool/dbconfig/20241015-100652-arnaudb.json |
[production] |
10:04 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Depooling db2130 (T367781)', diff saved to https://phabricator.wikimedia.org/P69912 and previous config saved to /var/cache/conftool/dbconfig/20241015-100435-arnaudb.json |
[production] |
10:04 |
<arnaudb@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2130.codfw.wmnet with reason: Maintenance |
[production] |
10:04 |
<arnaudb@cumin1002> |
START - Cookbook sre.hosts.downtime for 4:00:00 on db2130.codfw.wmnet with reason: Maintenance |
[production] |
10:04 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2116 (T367781)', diff saved to https://phabricator.wikimedia.org/P69911 and previous config saved to /var/cache/conftool/dbconfig/20241015-100413-arnaudb.json |
[production] |
09:57 |
<brouberol@cumin1002> |
START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker |
[production] |
09:55 |
<brouberol@cumin1002> |
END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:dse-k8s-worker |
[production] |
09:52 |
<jayme@deploy1003> |
helmfile [codfw] DONE helmfile.d/admin 'apply'. |
[production] |
09:49 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P69910 and previous config saved to /var/cache/conftool/dbconfig/20241015-094906-arnaudb.json |
[production] |
09:33 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P69909 and previous config saved to /var/cache/conftool/dbconfig/20241015-093359-arnaudb.json |
[production] |
09:26 |
<brouberol@cumin1002> |
START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker |
[production] |
09:18 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2116 (T367781)', diff saved to https://phabricator.wikimedia.org/P69908 and previous config saved to /var/cache/conftool/dbconfig/20241015-091852-arnaudb.json |
[production] |
09:16 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Depooling db2116 (T367781)', diff saved to https://phabricator.wikimedia.org/P69907 and previous config saved to /var/cache/conftool/dbconfig/20241015-091635-arnaudb.json |
[production] |
09:16 |
<arnaudb@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2116.codfw.wmnet with reason: Maintenance |
[production] |
09:16 |
<arnaudb@cumin1002> |
START - Cookbook sre.hosts.downtime for 4:00:00 on db2116.codfw.wmnet with reason: Maintenance |
[production] |
09:16 |
<arnaudb@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance |
[production] |
09:15 |
<arnaudb@cumin1002> |
START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance |
[production] |
09:15 |
<arnaudb@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1240.eqiad.wmnet with reason: Maintenance |
[production] |
09:15 |
<arnaudb@cumin1002> |
START - Cookbook sre.hosts.downtime for 4:00:00 on db1240.eqiad.wmnet with reason: Maintenance |
[production] |
09:15 |
<arnaudb@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1239.eqiad.wmnet with reason: Maintenance |
[production] |
09:15 |
<arnaudb@cumin1002> |
START - Cookbook sre.hosts.downtime for 4:00:00 on db1239.eqiad.wmnet with reason: Maintenance |
[production] |
09:15 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db1235 (T367781)', diff saved to https://phabricator.wikimedia.org/P69906 and previous config saved to /var/cache/conftool/dbconfig/20241015-091502-arnaudb.json |
[production] |
09:07 |
<jayme@deploy1003> |
helmfile [codfw] START helmfile.d/admin 'apply'. |
[production] |
08:59 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P69905 and previous config saved to /var/cache/conftool/dbconfig/20241015-085955-arnaudb.json |
[production] |
08:47 |
<oblivian@cumin2002> |
END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: init - oblivian@cumin2002 |
[production] |
08:46 |
<oblivian@cumin2002> |
START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: init - oblivian@cumin2002 |
[production] |
08:44 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P69903 and previous config saved to /var/cache/conftool/dbconfig/20241015-084448-arnaudb.json |
[production] |
08:29 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db1235 (T367781)', diff saved to https://phabricator.wikimedia.org/P69902 and previous config saved to /var/cache/conftool/dbconfig/20241015-082941-arnaudb.json |
[production] |
08:27 |
<arnaudb@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance |
[production] |