2021-12-07
|
15:33 |
<sukhe@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 10 hosts with reason: debugging bird/anycast-hc issues |
[production] |
15:33 |
<sukhe@cumin1001> |
START - Cookbook sre.hosts.downtime for 0:30:00 on 10 hosts with reason: debugging bird/anycast-hc issues |
[production] |
15:25 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2026.codfw.wmnet with OS buster |
[production] |
15:21 |
<sukhe@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on durum2002.codfw.wmnet with reason: debugging bird/anycast-hc issues |
[production] |
15:21 |
<sukhe@cumin1001> |
START - Cookbook sre.hosts.downtime for 0:30:00 on durum2002.codfw.wmnet with reason: debugging bird/anycast-hc issues |
[production] |
15:14 |
<sukhe> |
running authdns-update for Gerrit:744094 |
[production] |
15:09 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.reimage for host restbase2026.codfw.wmnet with OS buster |
[production] |
14:38 |
<jbond> |
re-enable puppet fleet-wide post monitoring refactor 744787
[production] |
14:28 |
<godog> |
reboot graphite1004 - T297180 |
[production] |
14:15 |
<Amir1> |
fixing heartbeat grants for wikiuser across the cluster (T296537) |
[production] |
14:11 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti[2013-2014].codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage |
[production] |
14:11 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti[2013-2014].codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage |
[production] |
14:07 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2006.codfw.wmnet with reason: switch to drbd storage |
[production] |
14:07 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2006.codfw.wmnet with reason: switch to drbd storage |
[production] |
13:52 |
<Amir1> |
removing wikiuser@localhost on s6 (T296537) |
[production] |
13:45 |
<pt1979@cumin2002> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2026.codfw.wmnet with OS buster |
[production] |
13:42 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2004.codfw.wmnet with reason: switch to drbd storage |
[production] |
13:42 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2004.codfw.wmnet with reason: switch to drbd storage |
[production] |
13:40 |
<godog> |
reboot graphite2003 - T297180 |
[production] |
13:39 |
<jbond> |
disable puppet fleet-wide to roll out 744787
[production] |
13:26 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.reimage for host restbase2026.codfw.wmnet with OS buster |
[production] |
13:16 |
<jelto> |
update GitLab to 14.4.4-ce.0 |
[production] |
13:07 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti2014.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage |
[production] |
13:07 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti2014.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage |
[production] |
12:46 |
<Lucas_WMDE> |
UTC morning backport+config window done |
[production] |
12:46 |
<Lucas_WMDE> |
deployed [[gerrit:744071|Update termbox to 2021-12-06-171243-production (T297006)]] |
[production] |
12:44 |
<lucaswerkmeister-wmde@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' . |
[production] |
12:42 |
<lucaswerkmeister-wmde@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' . |
[production] |
12:39 |
<lucaswerkmeister-wmde@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' . |
[production] |
12:39 |
<lucaswerkmeister-wmde@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' . |
[production] |
12:24 |
<jbond> |
merge refactor of monitoring classes 725045 |
[production] |
12:16 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1156 (T277354)', diff saved to https://phabricator.wikimedia.org/P18071 and previous config saved to /var/cache/conftool/dbconfig/20211207-121655-marostegui.json |
[production] |
12:10 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
12:09 |
<lucaswerkmeister-wmde@deploy1002> |
Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:744043|Enable reply tool by default on mediawikiwiki (T296444)]] (duration: 00m 57s) |
[production] |
12:09 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
12:01 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P18070 and previous config saved to /var/cache/conftool/dbconfig/20211207-120150-marostegui.json |
[production] |
11:51 |
<moritzm> |
draining primary/secondary instances off ganeti2014 T296622 |
[production] |
11:46 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P18069 and previous config saved to /var/cache/conftool/dbconfig/20211207-114645-marostegui.json |
[production] |
11:38 |
<cmooney@cumin1001> |
END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudvirt1028.eqiad.wmnet |
[production] |
11:32 |
<cmooney@cumin1001> |
START - Cookbook sre.hosts.dhcp for host cloudvirt1028.eqiad.wmnet |
[production] |
11:31 |
<topranks> |
removing IP addressing on cloudvirt1028 manually and forcing DHCP to debug reimage failure (T296906) |
[production] |
11:31 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1156 (T277354)', diff saved to https://phabricator.wikimedia.org/P18068 and previous config saved to /var/cache/conftool/dbconfig/20211207-113140-marostegui.json |
[production] |
11:30 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db1156 (T277354)', diff saved to https://phabricator.wikimedia.org/P18067 and previous config saved to /var/cache/conftool/dbconfig/20211207-113005-marostegui.json |
[production] |
11:30 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155-1156].eqiad.wmnet with reason: Maintenance T277354 |
[production] |
11:29 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155-1156].eqiad.wmnet with reason: Maintenance T277354 |
[production] |
11:27 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1182 (T277354)', diff saved to https://phabricator.wikimedia.org/P18066 and previous config saved to /var/cache/conftool/dbconfig/20211207-112707-marostegui.json |
[production] |
11:26 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2004.codfw.wmnet with reason: switch to drbd storage |
[production] |
11:26 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2004.codfw.wmnet with reason: switch to drbd storage |
[production] |
11:12 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P18065 and previous config saved to /var/cache/conftool/dbconfig/20211207-111203-marostegui.json |
[production] |
11:11 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2012.codfw.wmnet |
[production] |