1251-1300 of 10000 results (111ms)
2024-10-15 ยง
11:07 <ladsgroup@cumin1002> START - Cookbook sre.hosts.downtime for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance [production]
11:01 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Depooling db1158 (T376905)', diff saved to https://phabricator.wikimedia.org/P69920 and previous config saved to /var/cache/conftool/dbconfig/20241015-110132-ladsgroup.json [production]
11:01 <ladsgroup@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance [production]
11:01 <ladsgroup@cumin1002> START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance [production]
11:01 <ladsgroup@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance [production]
11:01 <ladsgroup@cumin1002> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance [production]
10:57 <arnaudb@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2145 (T367781)', diff saved to https://phabricator.wikimedia.org/P69919 and previous config saved to /var/cache/conftool/dbconfig/20241015-105719-arnaudb.json [production]
10:53 <tappof> expand LVs on prometheus instances (k8s-mlserve and k8s-stagin) T377196 [production]
10:53 <arnaudb@cumin1002> dbctl commit (dc=all): 'Depooling db2145 (T367781)', diff saved to https://phabricator.wikimedia.org/P69918 and previous config saved to /var/cache/conftool/dbconfig/20241015-105301-arnaudb.json [production]
10:52 <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2145.codfw.wmnet with reason: Maintenance [production]
10:52 <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 4:00:00 on db2145.codfw.wmnet with reason: Maintenance [production]
10:52 <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2141.codfw.wmnet with reason: Maintenance [production]
10:52 <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 4:00:00 on db2141.codfw.wmnet with reason: Maintenance [production]
10:52 <arnaudb@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2130 (T367781)', diff saved to https://phabricator.wikimedia.org/P69917 and previous config saved to /var/cache/conftool/dbconfig/20241015-105213-arnaudb.json [production]
10:38 <brouberol@cumin1002> START - Cookbook sre.presto.reboot-workers for Presto an-presto cluster: Reboot Presto nodes [production]
10:38 <brouberol@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2002.codfw.wmnet [production]
10:37 <arnaudb@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P69915 and previous config saved to /var/cache/conftool/dbconfig/20241015-103706-arnaudb.json [production]
10:34 <brouberol@cumin1002> START - Cookbook sre.hosts.reboot-single for host flink-zk2002.codfw.wmnet [production]
10:30 <brouberol@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2003.codfw.wmnet [production]
10:26 <brouberol@cumin1002> START - Cookbook sre.hosts.reboot-single for host flink-zk2003.codfw.wmnet [production]
10:25 <brouberol@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2001.codfw.wmnet [production]
10:22 <brouberol@cumin1002> START - Cookbook sre.hosts.reboot-single for host flink-zk2001.codfw.wmnet [production]
10:22 <arnaudb@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P69914 and previous config saved to /var/cache/conftool/dbconfig/20241015-102159-arnaudb.json [production]
10:21 <brouberol@cumin1002> END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-flink-codfw cluster: Roll restart of jvm daemons. [production]
10:14 <brouberol@cumin1002> START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-flink-codfw cluster: Roll restart of jvm daemons. [production]
10:11 <brouberol@cumin1002> END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:dse-k8s-worker [production]
10:06 <arnaudb@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2130 (T367781)', diff saved to https://phabricator.wikimedia.org/P69913 and previous config saved to /var/cache/conftool/dbconfig/20241015-100652-arnaudb.json [production]
10:04 <arnaudb@cumin1002> dbctl commit (dc=all): 'Depooling db2130 (T367781)', diff saved to https://phabricator.wikimedia.org/P69912 and previous config saved to /var/cache/conftool/dbconfig/20241015-100435-arnaudb.json [production]
10:04 <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2130.codfw.wmnet with reason: Maintenance [production]
10:04 <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 4:00:00 on db2130.codfw.wmnet with reason: Maintenance [production]
10:04 <arnaudb@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2116 (T367781)', diff saved to https://phabricator.wikimedia.org/P69911 and previous config saved to /var/cache/conftool/dbconfig/20241015-100413-arnaudb.json [production]
09:57 <brouberol@cumin1002> START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker [production]
09:55 <brouberol@cumin1002> END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:dse-k8s-worker [production]
09:52 <jayme@deploy1003> helmfile [codfw] DONE helmfile.d/admin 'apply'. [production]
09:49 <arnaudb@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P69910 and previous config saved to /var/cache/conftool/dbconfig/20241015-094906-arnaudb.json [production]
09:33 <arnaudb@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P69909 and previous config saved to /var/cache/conftool/dbconfig/20241015-093359-arnaudb.json [production]
09:26 <brouberol@cumin1002> START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker [production]
09:18 <arnaudb@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2116 (T367781)', diff saved to https://phabricator.wikimedia.org/P69908 and previous config saved to /var/cache/conftool/dbconfig/20241015-091852-arnaudb.json [production]
09:16 <arnaudb@cumin1002> dbctl commit (dc=all): 'Depooling db2116 (T367781)', diff saved to https://phabricator.wikimedia.org/P69907 and previous config saved to /var/cache/conftool/dbconfig/20241015-091635-arnaudb.json [production]
09:16 <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2116.codfw.wmnet with reason: Maintenance [production]
09:16 <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 4:00:00 on db2116.codfw.wmnet with reason: Maintenance [production]
09:16 <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance [production]
09:15 <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance [production]
09:15 <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1240.eqiad.wmnet with reason: Maintenance [production]
09:15 <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 4:00:00 on db1240.eqiad.wmnet with reason: Maintenance [production]
09:15 <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1239.eqiad.wmnet with reason: Maintenance [production]
09:15 <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 4:00:00 on db1239.eqiad.wmnet with reason: Maintenance [production]
09:15 <arnaudb@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db1235 (T367781)', diff saved to https://phabricator.wikimedia.org/P69906 and previous config saved to /var/cache/conftool/dbconfig/20241015-091502-arnaudb.json [production]
09:07 <jayme@deploy1003> helmfile [codfw] START helmfile.d/admin 'apply'. [production]
08:59 <arnaudb@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P69905 and previous config saved to /var/cache/conftool/dbconfig/20241015-085955-arnaudb.json [production]