2023-02-09
§
|
12:06 |
<btullis@cumin1001> |
START - Cookbook sre.hosts.downtime for 8:00:00 on dse-k8s-worker1002.eqiad.wmnet with reason: Attempting to move some GPUs |
[production] |
12:03 |
<ladsgroup@deploy1002> |
Synchronized wmf-config/ext-Babel.php: Move Babel settings from IS.php to ext-Babel.php, part I (T308932) (duration: 07m 06s) |
[production] |
12:02 |
<btullis@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-worker1099.eqiad.wmnet with reason: Attempting to move some GPUs |
[production] |
12:02 |
<btullis@cumin1001> |
START - Cookbook sre.hosts.downtime for 8:00:00 on an-worker1099.eqiad.wmnet with reason: Attempting to move some GPUs |
[production] |
12:02 |
<btullis@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-worker1098.eqiad.wmnet with reason: Attempting to move some GPUs |
[production] |
12:02 |
<btullis@cumin1001> |
START - Cookbook sre.hosts.downtime for 8:00:00 on an-worker1098.eqiad.wmnet with reason: Attempting to move some GPUs |
[production] |
12:01 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P43991 and previous config saved to /var/cache/conftool/dbconfig/20230209-120138-marostegui.json |
[production] |
11:59 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P43990 and previous config saved to /var/cache/conftool/dbconfig/20230209-115940-marostegui.json |
[production] |
11:57 |
<jiji@cumin1001> |
START - Cookbook sre.hosts.reimage for host mc-gp1001.eqiad.wmnet with OS bullseye |
[production] |
11:57 |
<jiji@cumin1001> |
END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc-gp1001.eqiad.wmnet with OS bullseye |
[production] |
11:55 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetdb1003.eqiad.wmnet with reason: host reimage |
[production] |
11:53 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on puppetdb1003.eqiad.wmnet with reason: host reimage |
[production] |
11:52 |
<jiji@cumin1001> |
START - Cookbook sre.hosts.reimage for host mc-gp1001.eqiad.wmnet with OS bullseye |
[production] |
11:52 |
<jiji@cumin1001> |
END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc-gp1001.eqiad.wmnet with OS bullseye |
[production] |
11:46 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P43989 and previous config saved to /var/cache/conftool/dbconfig/20230209-114632-marostegui.json |
[production] |
11:44 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P43988 and previous config saved to /var/cache/conftool/dbconfig/20230209-114434-marostegui.json |
[production] |
11:40 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.reimage for host puppetdb1003.eqiad.wmnet with OS bullseye |
[production] |
11:34 |
<marostegui> |
Stop mariadb on db1098 (s6 and s7) T329171 |
[production] |
11:31 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P43986 and previous config saved to /var/cache/conftool/dbconfig/20230209-113125-marostegui.json |
[production] |
11:31 |
<eoghan@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner1003.eqiad.wmnet with OS bullseye |
[production] |
11:29 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2117 (T329203)', diff saved to https://phabricator.wikimedia.org/P43985 and previous config saved to /var/cache/conftool/dbconfig/20230209-112927-marostegui.json |
[production] |
11:27 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db2171:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P43984 and previous config saved to /var/cache/conftool/dbconfig/20230209-112748-marostegui.json |
[production] |
11:27 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance |
[production] |
11:27 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance |
[production] |
11:27 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2157 (T328817)', diff saved to https://phabricator.wikimedia.org/P43983 and previous config saved to /var/cache/conftool/dbconfig/20230209-112727-marostegui.json |
[production] |
11:24 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db2117 (T329203)', diff saved to https://phabricator.wikimedia.org/P43982 and previous config saved to /var/cache/conftool/dbconfig/20230209-112359-marostegui.json |
[production] |
11:23 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance |
[production] |
11:23 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance |
[production] |
11:23 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2114 (T329203)', diff saved to https://phabricator.wikimedia.org/P43981 and previous config saved to /var/cache/conftool/dbconfig/20230209-112338-marostegui.json |
[production] |
11:20 |
<jiji@cumin1001> |
START - Cookbook sre.hosts.reimage for host mc-gp1001.eqiad.wmnet with OS bullseye |
[production] |
11:20 |
<jiji@cumin1001> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp1001.eqiad.wmnet with OS bullseye |
[production] |
11:12 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P43980 and previous config saved to /var/cache/conftool/dbconfig/20230209-111220-marostegui.json |
[production] |
11:10 |
<eoghan@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner1003.eqiad.wmnet with reason: host reimage |
[production] |
11:08 |
<jiji@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2052.codfw.wmnet with OS bullseye |
[production] |
11:08 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P43979 and previous config saved to /var/cache/conftool/dbconfig/20230209-110832-marostegui.json |
[production] |
11:07 |
<eoghan@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner1003.eqiad.wmnet with reason: host reimage |
[production] |
11:02 |
<effie> |
powercycle mc-gp1001 |
[production] |
10:59 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on puppetdb2003.codfw.wmnet with reason: master is being reimaged |
[production] |
10:59 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on puppetdb2003.codfw.wmnet with reason: master is being reimaged |
[production] |
10:58 |
<joal@deploy1002> |
Finished deploy [airflow-dags/analytics@dff3f3b]: Fix analytics webrequest_actor_metrics_rollup sensor (duration: 00m 13s) |
[production] |
10:58 |
<joal@deploy1002> |
Started deploy [airflow-dags/analytics@dff3f3b]: Fix analytics webrequest_actor_metrics_rollup sensor |
[production] |
10:57 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P43978 and previous config saved to /var/cache/conftool/dbconfig/20230209-105714-marostegui.json |
[production] |
10:55 |
<eoghan@cumin1001> |
START - Cookbook sre.hosts.reimage for host gitlab-runner1003.eqiad.wmnet with OS bullseye |
[production] |
10:53 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P43977 and previous config saved to /var/cache/conftool/dbconfig/20230209-105325-marostegui.json |
[production] |
10:52 |
<jiji@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2052.codfw.wmnet with reason: host reimage |
[production] |
10:50 |
<jiji@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mc2052.codfw.wmnet with reason: host reimage |
[production] |
10:42 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1107 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43976 and previous config saved to /var/cache/conftool/dbconfig/20230209-104218-root.json |
[production] |
10:42 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43975 and previous config saved to /var/cache/conftool/dbconfig/20230209-104214-root.json |
[production] |
10:42 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db2157 (T328817)', diff saved to https://phabricator.wikimedia.org/P43974 and previous config saved to /var/cache/conftool/dbconfig/20230209-104208-marostegui.json |
[production] |
10:38 |
<moritzm> |
installing containerd security updates |
[production] |