2024-02-22
§
|
16:53 |
<arnaudb@cumin1002> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance |
[production] |
16:53 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db1214 (T357189)', diff saved to https://phabricator.wikimedia.org/P57744 and previous config saved to /var/cache/conftool/dbconfig/20240222-165312-arnaudb.json |
[production] |
16:45 |
<dancy@deploy2002> |
Started scap: testing T357402 again |
[production] |
16:43 |
<dancy@deploy2002> |
sync-world aborted: testing T357402 (duration: 14m 57s) |
[production] |
16:42 |
<akosiaris@cumin1002> |
conftool action : set/pooled=inactive; selector: service=parsoid-php,name=kubernetes.* |
[production] |
16:38 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P57743 and previous config saved to /var/cache/conftool/dbconfig/20240222-163806-arnaudb.json |
[production] |
16:36 |
<logmsgbot> |
@deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply |
[production] |
16:36 |
<logmsgbot> |
@deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply |
[production] |
16:30 |
<fabfur@puppetmaster1001> |
conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet,service=(cdn|ats-be) |
[production] |
16:30 |
<fabfur@puppetmaster1001> |
conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=(cdn|ats-be) |
[production] |
16:28 |
<fabfur@cumin2002> |
END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp[2031-2032].codfw.wmnet |
[production] |
16:28 |
<fabfur@cumin2002> |
START - Cookbook sre.hosts.remove-downtime for cp[2031-2032].codfw.wmnet |
[production] |
16:28 |
<dancy@deploy2002> |
Started scap: testing T357402 |
[production] |
16:26 |
<dancy@deploy2002> |
Installation of scap version "4.66.0" completed for 458 hosts |
[production] |
16:25 |
<dancy@deploy2002> |
Installing scap version "4.66.0" for 458 hosts |
[production] |
16:23 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P57742 and previous config saved to /var/cache/conftool/dbconfig/20240222-162300-arnaudb.json |
[production] |
16:22 |
<volans@cumin1002> |
END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox |
[production] |
16:21 |
<marostegui@cumin1002> |
dbctl commit (dc=all): 'db2149 (re)pooling @ 100%: After recloning', diff saved to https://phabricator.wikimedia.org/P57741 and previous config saved to /var/cache/conftool/dbconfig/20240222-162151-root.json |
[production] |
16:19 |
<cmooney@cumin1002> |
START - Cookbook sre.hosts.reimage for host testvm2002.codfw.wmnet with OS bullseye |
[production] |
16:16 |
<volans@cumin1002> |
START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox |
[production] |
16:11 |
<mvernon@cumin2002> |
conftool action : set/pooled=true; selector: dnsdisc=swift,name=codfw |
[production] |
16:11 |
<Emperor> |
repool codfs-mw T355868 |
[production] |
16:10 |
<Emperor> |
repool thanos-fe2002 T355868 |
[production] |
16:07 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db1214 (T357189)', diff saved to https://phabricator.wikimedia.org/P57740 and previous config saved to /var/cache/conftool/dbconfig/20240222-160753-arnaudb.json |
[production] |
16:06 |
<marostegui@cumin1002> |
dbctl commit (dc=all): 'db2149 (re)pooling @ 75%: After recloning', diff saved to https://phabricator.wikimedia.org/P57739 and previous config saved to /var/cache/conftool/dbconfig/20240222-160646-root.json |
[production] |
16:05 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Depooling db1214 (T357189)', diff saved to https://phabricator.wikimedia.org/P57738 and previous config saved to /var/cache/conftool/dbconfig/20240222-160534-arnaudb.json |
[production] |
16:05 |
<arnaudb@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance |
[production] |
16:05 |
<volans@cumin1002> |
END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sretest1001.eqiad.wmnet |
[production] |
16:05 |
<arnaudb@cumin1002> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance |
[production] |
16:05 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db1211 (T357189)', diff saved to https://phabricator.wikimedia.org/P57737 and previous config saved to /var/cache/conftool/dbconfig/20240222-160512-arnaudb.json |
[production] |
16:04 |
<volans@cumin1002> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1001.eqiad.wmnet |
[production] |
16:00 |
<topranks> |
Commencing network maintenance migrating servers to new switch codfw rack B2 T355868 |
[production] |
15:58 |
<cmooney@cumin1002> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host testvm2002.codfw.wmnet with OS bullseye |
[production] |
15:57 |
<hnowlan> |
depooling mw[1458,1467-1468,1483-1485,1494].eqiad.wmnet in advance of reimaging |
[production] |
15:56 |
<cmooney@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 25 hosts with reason: Migrating servers in codfw rack B2 to lsw1-b2-codfw |
[production] |
15:55 |
<cmooney@cumin1002> |
START - Cookbook sre.hosts.downtime for 0:30:00 on 25 hosts with reason: Migrating servers in codfw rack B2 to lsw1-b2-codfw |
[production] |
15:54 |
<mvernon@cumin2002> |
conftool action : set/pooled=false; selector: dnsdisc=swift,name=codfw |
[production] |
15:54 |
<Emperor> |
depool codfs-mw T355868 |
[production] |
15:53 |
<Emperor> |
depool thanos-fe2002 T355868 |
[production] |
15:51 |
<marostegui@cumin1002> |
dbctl commit (dc=all): 'db2149 (re)pooling @ 50%: After recloning', diff saved to https://phabricator.wikimedia.org/P57736 and previous config saved to /var/cache/conftool/dbconfig/20240222-155141-root.json |
[production] |
15:50 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P57735 and previous config saved to /var/cache/conftool/dbconfig/20240222-155005-arnaudb.json |
[production] |
15:48 |
<cmooney@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw-b-codfw,cr[1-2]-codfw,lsw1-b2-codfw.mgmt with reason: prepping for server uplink migration codfw rack b2 |
[production] |
15:48 |
<cmooney@cumin1002> |
START - Cookbook sre.hosts.downtime for 1:00:00 on asw-b-codfw,cr[1-2]-codfw,lsw1-b2-codfw.mgmt with reason: prepping for server uplink migration codfw rack b2 |
[production] |
15:46 |
<sukhe@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cp[2031-2032].codfw.wmnet with reason: T355868 |
[production] |
15:46 |
<sukhe@cumin2002> |
START - Cookbook sre.hosts.downtime for 3:00:00 on cp[2031-2032].codfw.wmnet with reason: T355868 |
[production] |
15:39 |
<aqu@deploy2002> |
Finished deploy [airflow-dags/analytics_test@b115452]: Deploy Refine job POC on test cluster (duration: 00m 16s) |
[production] |
15:39 |
<aqu@deploy2002> |
Started deploy [airflow-dags/analytics_test@b115452]: Deploy Refine job POC on test cluster |
[production] |
15:36 |
<marostegui@cumin1002> |
dbctl commit (dc=all): 'db2149 (re)pooling @ 25%: After recloning', diff saved to https://phabricator.wikimedia.org/P57734 and previous config saved to /var/cache/conftool/dbconfig/20240222-153636-root.json |
[production] |
15:35 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P57733 and previous config saved to /var/cache/conftool/dbconfig/20240222-153459-arnaudb.json |
[production] |
15:32 |
<cmooney@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage |
[production] |