2023-02-13
22:48 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2437.codfw.wmnet with reason: host reimage |
[production] |
22:45 |
<pt1979@cumin2002> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2002'] |
[production] |
22:45 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2437.codfw.wmnet with reason: host reimage |
[production] |
22:44 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2002'] |
[production] |
22:42 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db2182 (T329203)', diff saved to https://phabricator.wikimedia.org/P44522 and previous config saved to /var/cache/conftool/dbconfig/20230213-224240-marostegui.json |
[production] |
22:42 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance |
[production] |
22:42 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance |
[production] |
22:42 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44521 and previous config saved to /var/cache/conftool/dbconfig/20230213-224219-marostegui.json |
[production] |
22:41 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P44520 and previous config saved to /var/cache/conftool/dbconfig/20230213-224102-ladsgroup.json |
[production] |
22:38 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P44519 and previous config saved to /var/cache/conftool/dbconfig/20230213-223806-marostegui.json |
[production] |
22:36 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2436.codfw.wmnet with reason: host reimage |
[production] |
22:36 |
<papaul> |
upgrading firmware on mc-gp2002 |
[production] |
22:33 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2436.codfw.wmnet with reason: host reimage |
[production] |
22:29 |
<dancy> |
Disabled beta-scap-sync-world and beta-update-databases-eqiad Jenkins jobs |
[releng] |
22:27 |
<pt1979@cumin2002> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2002'] |
[production] |
22:27 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P44518 and previous config saved to /var/cache/conftool/dbconfig/20230213-222713-marostegui.json |
[production] |
22:25 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P44517 and previous config saved to /var/cache/conftool/dbconfig/20230213-222556-ladsgroup.json |
[production] |
22:25 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.reimage for host mw2437.codfw.wmnet with OS buster |
[production] |
22:23 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P44516 and previous config saved to /var/cache/conftool/dbconfig/20230213-222300-marostegui.json |
[production] |
22:22 |
<mutante> |
certbot renew --apache fixed cert issue - https://ldapauth-gitldap.wmflabs.org/ does not exist unrelatedly - T329444 |
[devtools] |
22:18 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depooling db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P44515 and previous config saved to /var/cache/conftool/dbconfig/20230213-221840-ladsgroup.json |
[production] |
22:18 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance |
[production] |
22:18 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance |
[production] |
22:18 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance |
[production] |
22:18 |
<mutante> |
install package python3-certbot-apache on gerrit-prod-1001 - T329444 |
[devtools] |
22:18 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance |
[production] |
22:18 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P44514 and previous config saved to /var/cache/conftool/dbconfig/20230213-221815-ladsgroup.json |
[production] |
22:13 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.reimage for host mw2436.codfw.wmnet with OS buster |
[production] |
22:12 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P44513 and previous config saved to /var/cache/conftool/dbconfig/20230213-221207-marostegui.json |
[production] |
22:07 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1114 (T328817)', diff saved to https://phabricator.wikimedia.org/P44512 and previous config saved to /var/cache/conftool/dbconfig/20230213-220753-marostegui.json |
[production] |
22:03 |
<mutante> |
- re-activating disabled puppet on gerrit-prod-1001 (reason given was 'gerrit deploy' but it was about 17 days ago) |
[devtools] |
22:03 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P44511 and previous config saved to /var/cache/conftool/dbconfig/20230213-220308-ladsgroup.json |
[production] |
21:58 |
<mutante> |
rebooting instance gerrit-prod-1001 which can't be reached T329444 |
[devtools] |
21:57 |
<cmooney@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
21:57 |
<cmooney@cumin1001> |
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudcephosd1002 - cmooney@cumin1001" |
[production] |
21:57 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44510 and previous config saved to /var/cache/conftool/dbconfig/20230213-215701-marostegui.json |
[production] |
21:56 |
<cmooney@cumin1001> |
START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudcephosd1002 - cmooney@cumin1001" |
[production] |
21:53 |
<cmooney@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
21:51 |
<cmooney@cumin1001> |
END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1002.eqiad.wmnet'] |
[production] |
21:50 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db2169:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44509 and previous config saved to /var/cache/conftool/dbconfig/20230213-215055-marostegui.json |
[production] |
21:50 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance |
[production] |
21:50 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance |
[production] |
21:50 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44508 and previous config saved to /var/cache/conftool/dbconfig/20230213-215034-marostegui.json |
[production] |
21:48 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P44507 and previous config saved to /var/cache/conftool/dbconfig/20230213-214802-ladsgroup.json |
[production] |
21:46 |
<taavi> |
reboot mwcurator to fix ldap issues |
[mwoffliner] |
21:44 |
<taavi@deploy1002> |
Finished scap: Backport for [[gerrit:888770|Revert "Revert "Enable mediawiki.page_change on group1 wikis""]] (duration: 09m 00s) |
[production] |
21:44 |
<wm-bot> |
<root> Hard restart to resolve LDAP connection issue. |
[tools.guc] |
21:42 |
<cmooney@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1002.eqiad.wmnet'] |
[production] |
21:42 |
<bd808> |
Container in CrashLoopBackOff, investigating. |
[tools.guc] |
21:40 |
<xcollazo> |
deploying section_topics v0.5.0 on platform_eng Airflow instance |
[analytics] |