2023-05-05
|
09:10 |
<marostegui> |
Failover m2-master from dbproxy1013 to dbproxy1015 |
[production] |
09:08 |
<hnowlan@deploy1002> |
Finished deploy [restbase/deploy@8aba801]: deploying to host missing from configs (duration: 01m 22s) |
[production] |
09:06 |
<hnowlan@deploy1002> |
Started deploy [restbase/deploy@8aba801]: deploying to host missing from configs |
[production] |
08:58 |
<XioNoX> |
deploy CR914772 on all hosts running Bird |
[production] |
08:15 |
<godog> |
delete wal and chunks_head from prometheus5002 and prometheus4002 to let prometheus start back up and not crashloop - T309979 |
[production] |
08:07 |
<jmm@cumin2002> |
END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host netflow2003.codfw.wmnet with OS bookworm |
[production] |
08:05 |
<hashar@deploy1002> |
Finished deploy [integration/docroot@78e6f40]: build: Updating eslint-config-wikimedia to 0.25.0 (duration: 00m 13s) |
[production] |
08:04 |
<hashar@deploy1002> |
Started deploy [integration/docroot@78e6f40]: build: Updating eslint-config-wikimedia to 0.25.0 |
[production] |
07:32 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance |
[production] |
07:31 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 12 days, 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance |
[production] |
07:31 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance |
[production] |
07:31 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 12 days, 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance |
[production] |
06:53 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2005.codfw.wmnet |
[production] |
06:51 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm |
[production] |
06:50 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host netflow2003.codfw.wmnet |
[production] |
06:50 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow2003.codfw.wmnet - jmm@cumin2002" |
[production] |
06:50 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host testvm2005.codfw.wmnet |
[production] |
06:49 |
<jmm@cumin2002> |
START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow2003.codfw.wmnet - jmm@cumin2002" |
[production] |
06:48 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2004.codfw.wmnet |
[production] |
06:44 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host testvm2004.codfw.wmnet |
[production] |
06:43 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org |
[production] |
06:39 |
<ayounsi@cumin1001> |
END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 136907 |
[production] |
06:39 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netflow2003.codfw.wmnet on all recursors |
[production] |
06:39 |
<jmm@cumin2002> |
START - Cookbook sre.dns.wipe-cache netflow2003.codfw.wmnet on all recursors |
[production] |
06:39 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
06:39 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow2003.codfw.wmnet - jmm@cumin2002" |
[production] |
06:38 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org |
[production] |
06:38 |
<jmm@cumin2002> |
START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow2003.codfw.wmnet - jmm@cumin2002" |
[production] |
06:37 |
<ayounsi@cumin1001> |
START - Cookbook sre.network.peering with action 'configure' for AS: 136907 |
[production] |
06:35 |
<jmm@cumin2002> |
START - Cookbook sre.dns.netbox |
[production] |
06:35 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.makevm for new host netflow2003.codfw.wmnet |
[production] |
06:32 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2003.wikimedia.org |
[production] |
06:27 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host urldownloader2003.wikimedia.org |
[production] |
06:26 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org |
[production] |
06:23 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org |
[production] |
06:19 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1003.wikimedia.org |
[production] |
06:15 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host urldownloader1003.wikimedia.org |
[production] |
05:22 |
<ryankemper@cumin1001> |
END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835 |
[production] |
05:00 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2175 (T335845)', diff saved to https://phabricator.wikimedia.org/P47748 and previous config saved to /var/cache/conftool/dbconfig/20230505-050007-ladsgroup.json |
[production] |
04:45 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P47747 and previous config saved to /var/cache/conftool/dbconfig/20230505-044500-ladsgroup.json |
[production] |
04:29 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P47746 and previous config saved to /var/cache/conftool/dbconfig/20230505-042954-ladsgroup.json |
[production] |
04:21 |
<ryankemper@cumin1001> |
START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835 |
[production] |
04:18 |
<ryankemper@cumin1001> |
END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835 |
[production] |
04:17 |
<ryankemper@cumin1001> |
START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835 |
[production] |
04:14 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2175 (T335845)', diff saved to https://phabricator.wikimedia.org/P47745 and previous config saved to /var/cache/conftool/dbconfig/20230505-041448-ladsgroup.json |
[production] |
04:08 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depooling db2175 (T335845)', diff saved to https://phabricator.wikimedia.org/P47744 and previous config saved to /var/cache/conftool/dbconfig/20230505-040837-ladsgroup.json |
[production] |
04:08 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance |
[production] |
04:08 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance |
[production] |
04:08 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T335845)', diff saved to https://phabricator.wikimedia.org/P47743 and previous config saved to /var/cache/conftool/dbconfig/20230505-040812-ladsgroup.json |
[production] |
04:04 |
<ryankemper@cumin1001> |
END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835 |
[production] |