2022-03-25
§
|
08:24 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance |
[production] |
08:04 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db1174 (T302658)', diff saved to https://phabricator.wikimedia.org/P23065 and previous config saved to /var/cache/conftool/dbconfig/20220325-080403-marostegui.json |
[production] |
08:04 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance |
[production] |
08:04 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance |
[production] |
08:03 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1127 (T302658)', diff saved to https://phabricator.wikimedia.org/P23064 and previous config saved to /var/cache/conftool/dbconfig/20220325-080355-marostegui.json |
[production] |
08:02 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance |
[production] |
08:02 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance |
[production] |
07:56 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance |
[production] |
07:56 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance |
[production] |
07:56 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance |
[production] |
07:56 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance |
[production] |
07:56 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23063 and previous config saved to /var/cache/conftool/dbconfig/20220325-075610-ladsgroup.json |
[production] |
07:48 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23062 and previous config saved to /var/cache/conftool/dbconfig/20220325-074850-marostegui.json |
[production] |
07:41 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23061 and previous config saved to /var/cache/conftool/dbconfig/20220325-074105-ladsgroup.json |
[production] |
07:33 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23060 and previous config saved to /var/cache/conftool/dbconfig/20220325-073345-marostegui.json |
[production] |
07:26 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23059 and previous config saved to /var/cache/conftool/dbconfig/20220325-072559-ladsgroup.json |
[production] |
07:18 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1127 (T302658)', diff saved to https://phabricator.wikimedia.org/P23058 and previous config saved to /var/cache/conftool/dbconfig/20220325-071840-marostegui.json |
[production] |
07:10 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23057 and previous config saved to /var/cache/conftool/dbconfig/20220325-071054-ladsgroup.json |
[production] |
06:41 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23056 and previous config saved to /var/cache/conftool/dbconfig/20220325-064139-ladsgroup.json |
[production] |
06:41 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance |
[production] |
06:41 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance |
[production] |
06:31 |
<_joe_> |
deleting a couple zotero pods with excessive number of restarts |
[production] |
06:29 |
<marostegui> |
dbmaint s4@eqiad T300775 |
[production] |
06:07 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P23055 and previous config saved to /var/cache/conftool/dbconfig/20220325-060723-marostegui.json |
[production] |
05:47 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db1127 (T302658)', diff saved to https://phabricator.wikimedia.org/P23054 and previous config saved to /var/cache/conftool/dbconfig/20220325-054705-marostegui.json |
[production] |
05:47 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance |
[production] |
05:46 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance |
[production] |
05:30 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1134 for testing', diff saved to https://phabricator.wikimedia.org/P23053 and previous config saved to /var/cache/conftool/dbconfig/20220325-053037-marostegui.json |
[production] |
00:39 |
<pt1979@cumin2002> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2027.codfw.wmnet with OS buster |
[production] |
2022-03-24
§
|
23:57 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.reimage for host restbase2027.codfw.wmnet with OS buster |
[production] |
23:04 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2027.mgmt.codfw.wmnet with reboot policy FORCED |
[production] |
22:30 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T302658)', diff saved to https://phabricator.wikimedia.org/P23050 and previous config saved to /var/cache/conftool/dbconfig/20220324-223031-marostegui.json |
[production] |
22:19 |
<pt1979@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1047.eqiad.wmnet with OS bullseye |
[production] |
22:15 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23049 and previous config saved to /var/cache/conftool/dbconfig/20220324-221526-marostegui.json |
[production] |
22:14 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.provision for host restbase2027.mgmt.codfw.wmnet with reboot policy FORCED |
[production] |
22:10 |
<pt1979@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage |
[production] |
22:07 |
<ebernhardson> |
restart wcqs-blazegraph on wcqs2001 to resolve intermittant BlazegraphFreeAllocatorsDecreasingRapidly |
[production] |
22:06 |
<pt1979@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage |
[production] |
22:00 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23048 and previous config saved to /var/cache/conftool/dbconfig/20220324-220021-marostegui.json |
[production] |
21:54 |
<pt1979@cumin1001> |
START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye |
[production] |
21:45 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T302658)', diff saved to https://phabricator.wikimedia.org/P23047 and previous config saved to /var/cache/conftool/dbconfig/20220324-214515-marostegui.json |
[production] |
21:42 |
<pt1979@cumin1001> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye |
[production] |
21:38 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
21:33 |
<pt1979@cumin2002> |
START - Cookbook sre.dns.netbox |
[production] |
21:13 |
<pt1979@cumin1001> |
START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye |
[production] |
21:13 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mwdebug: apply |
[production] |
21:12 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply |
[production] |
21:12 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply |
[production] |
21:11 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply |
[production] |
21:11 |
<inflatador> |
bking@cumin1001 restarting blazegraph on wdqs[1003-1013].eqiad.wmnet for T293862 |
[production] |