2022-11-04
§
|
17:01 |
<mvernon@cumin2002> |
conftool action : set/weight=40; selector: service=nginx,name=moss-fe2001.codfw.wmnet |
[production] |
17:01 |
<mvernon@cumin2002> |
conftool action : set/weight=40; selector: service=swift-fe,name=moss-fe2001.codfw.wmnet |
[production] |
17:00 |
<mvernon@cumin2002> |
conftool action : set/weight=40; selector: service=nginx,name=moss-fe1001.eqiad.wmnet |
[production] |
17:00 |
<mvernon@cumin2002> |
conftool action : set/weight=40; selector: service=swift-fe,name=moss-fe1001.eqiad.wmnet |
[production] |
16:58 |
<Emperor> |
rolling restart of swift-proxies to bring moss-fe{1,2}001 into service T322424 |
[production] |
16:55 |
<mvernon@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe2001.codfw.wmnet |
[production] |
16:53 |
<mvernon@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe1001.eqiad.wmnet |
[production] |
16:48 |
<mvernon@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host moss-fe2001.codfw.wmnet |
[production] |
16:48 |
<mvernon@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host moss-fe1001.eqiad.wmnet |
[production] |
16:41 |
<mvernon@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe2001.codfw.wmnet |
[production] |
16:41 |
<mvernon@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe1001.eqiad.wmnet |
[production] |
16:35 |
<mvernon@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host moss-fe2001.codfw.wmnet |
[production] |
16:35 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp4052'] |
[production] |
16:34 |
<mvernon@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host moss-fe1001.eqiad.wmnet |
[production] |
16:34 |
<pt1979@cumin2002> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp4052'] |
[production] |
16:33 |
<pt1979@cumin2002> |
END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cp4052'] |
[production] |
16:29 |
<mvernon@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe2001.codfw.wmnet with OS bullseye |
[production] |
16:26 |
<mvernon@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe1001.eqiad.wmnet with OS bullseye |
[production] |
16:13 |
<mvernon@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage |
[production] |
16:11 |
<mvernon@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: host reimage |
[production] |
16:10 |
<mvernon@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage |
[production] |
16:07 |
<mvernon@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: host reimage |
[production] |
16:06 |
<jhathaway@deploy1002> |
helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'. |
[production] |
16:06 |
<jhathaway@deploy1002> |
helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'. |
[production] |
15:57 |
<Emperor> |
repool ms-fe{1,2}009 |
[production] |
15:55 |
<mvernon@cumin1001> |
START - Cookbook sre.hosts.reimage for host moss-fe2001.codfw.wmnet with OS bullseye |
[production] |
15:54 |
<mvernon@cumin1001> |
START - Cookbook sre.hosts.reimage for host moss-fe1001.eqiad.wmnet with OS bullseye |
[production] |
15:48 |
<pt1979@cumin2002> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp4052'] |
[production] |
15:43 |
<aikochou@deploy1002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' . |
[production] |
15:41 |
<aikochou@deploy1002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . |
[production] |
15:00 |
<elukey> |
`elukey@cumin1001:~$ sudo cumin 'ms-fe2*' 'systemctl restart swift-proxy' -b 1 -s 20` |
[production] |
14:52 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance |
[production] |
14:52 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance |
[production] |
14:52 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1196 (T318955)', diff saved to https://phabricator.wikimedia.org/P38159 and previous config saved to /var/cache/conftool/dbconfig/20221104-145225-ladsgroup.json |
[production] |
14:52 |
<vgutierrez@puppetmaster1001> |
conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad |
[production] |
14:51 |
<Emperor> |
restart swift-proxy on ms-fe1012 |
[production] |
14:48 |
<elukey> |
restart swift-proxy on ms-fe1011 |
[production] |
14:44 |
<Emperor> |
restart swift-proxy on ms-fe1010 |
[production] |
14:41 |
<pt1979@cumin2002> |
END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbprov2004.mgmt.codfw.wmnet with reboot policy FORCED |
[production] |
14:37 |
<vgutierrez@puppetmaster1001> |
conftool action : set/pooled=false; selector: dnsdisc=swift,name=eqiad |
[production] |
14:37 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P38158 and previous config saved to /var/cache/conftool/dbconfig/20221104-143718-ladsgroup.json |
[production] |
14:28 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.provision for host dbprov2004.mgmt.codfw.wmnet with reboot policy FORCED |
[production] |
14:26 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp4052'] |
[production] |
14:25 |
<pt1979@cumin2002> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp4052'] |
[production] |
14:24 |
<pt1979@cumin2002> |
END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbprov2004.mgmt.codfw.wmnet with reboot policy FORCED |
[production] |
14:23 |
<pt1979@cumin2002> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp4052'] |
[production] |
14:22 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P38157 and previous config saved to /var/cache/conftool/dbconfig/20221104-142212-ladsgroup.json |
[production] |
14:07 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1196 (T318955)', diff saved to https://phabricator.wikimedia.org/P38156 and previous config saved to /var/cache/conftool/dbconfig/20221104-140705-ladsgroup.json |
[production] |
14:04 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depooling db1196 (T318955)', diff saved to https://phabricator.wikimedia.org/P38155 and previous config saved to /var/cache/conftool/dbconfig/20221104-140427-ladsgroup.json |
[production] |
14:04 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance |
[production] |