2024-08-15
13:25 <klausman@deploy1003> helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. [production]
13:23 <klausman@deploy1003> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. [production]
13:22 <klausman@deploy1003> helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. [production]
13:15 <logmsgbot> lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for [[gerrit:1062996|Save the request before starting the automatic vanish job (T372006)]] [production]
12:52 <jayme@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2009.codfw.wmnet with OS bullseye [production]
12:49 <jayme@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2010.codfw.wmnet with OS bullseye [production]
12:34 <jayme@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2009.codfw.wmnet with reason: host reimage [production]
12:32 <jayme@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2010.codfw.wmnet with reason: host reimage [production]
12:29 <jayme@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2009.codfw.wmnet with reason: host reimage [production]
12:28 <jayme@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2010.codfw.wmnet with reason: host reimage [production]
12:26 <klausman@deploy1003> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. [production]
12:26 <klausman@deploy1003> helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. [production]
12:25 <klausman@deploy1003> helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. [production]
12:23 <klausman@deploy1003> helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. [production]
12:10 <jayme@cumin1002> START - Cookbook sre.hosts.reimage for host kafka-main2010.codfw.wmnet with OS bullseye [production]
12:09 <jayme@cumin1002> START - Cookbook sre.hosts.reimage for host kafka-main2009.codfw.wmnet with OS bullseye [production]
11:42 <marostegui@cumin1002> dbctl commit (dc=all): 'db1238 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67341 and previous config saved to /var/cache/conftool/dbconfig/20240815-114213-root.json [production]
11:27 <hnowlan@deploy1003> helmfile [eqiad] DONE helmfile.d/services/thumbor: apply [production]
11:27 <marostegui@cumin1002> dbctl commit (dc=all): 'db1238 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67340 and previous config saved to /var/cache/conftool/dbconfig/20240815-112707-root.json [production]
11:24 <hnowlan@deploy1003> helmfile [eqiad] START helmfile.d/services/thumbor: apply [production]
11:12 <marostegui@cumin1002> dbctl commit (dc=all): 'db1238 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67339 and previous config saved to /var/cache/conftool/dbconfig/20240815-111201-root.json [production]
11:04 <hnowlan@deploy1003> helmfile [codfw] DONE helmfile.d/services/thumbor: apply [production]
11:00 <hnowlan@deploy1003> helmfile [codfw] START helmfile.d/services/thumbor: apply [production]
10:56 <marostegui@cumin1002> dbctl commit (dc=all): 'db1238 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67338 and previous config saved to /var/cache/conftool/dbconfig/20240815-105656-root.json [production]
10:41 <marostegui@cumin1002> dbctl commit (dc=all): 'db1238 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67337 and previous config saved to /var/cache/conftool/dbconfig/20240815-104150-root.json [production]
10:36 <jayme@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2006.codfw.wmnet with OS bullseye [production]
10:29 <marostegui@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1125.eqiad.wmnet with reason: Upgrade to 10.6.19 [production]
10:28 <marostegui@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on db1125.eqiad.wmnet with reason: Upgrade to 10.6.19 [production]
10:28 <marostegui@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc1014.eqiad.wmnet with reason: Upgrade to 10.6.19 [production]
10:28 <marostegui@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on pc1014.eqiad.wmnet with reason: Upgrade to 10.6.19 [production]
10:27 <marostegui@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2014.codfw.wmnet with reason: Upgrade to 10.6.19 [production]
10:27 <marostegui@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on pc2014.codfw.wmnet with reason: Upgrade to 10.6.19 [production]
10:27 <marostegui> Install 10.6.19 on pc1014 db1125 pc2014 T372536 [production]
10:26 <marostegui@cumin1002> dbctl commit (dc=all): 'db1238 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67336 and previous config saved to /var/cache/conftool/dbconfig/20240815-102645-root.json [production]
10:21 <klausman@deploy1003> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. [production]
10:19 <klausman@deploy1003> helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. [production]
10:18 <jayme@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2006.codfw.wmnet with reason: host reimage [production]
10:15 <jayme@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2006.codfw.wmnet with reason: host reimage [production]
10:11 <marostegui@cumin1002> dbctl commit (dc=all): 'db1238 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67335 and previous config saved to /var/cache/conftool/dbconfig/20240815-101139-root.json [production]
09:55 <jayme@cumin1002> START - Cookbook sre.hosts.reimage for host kafka-main2006.codfw.wmnet with OS bullseye [production]
09:27 <marostegui@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2152.codfw.wmnet with reason: Schema change [production]
09:27 <marostegui@cumin1002> START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2152.codfw.wmnet with reason: Schema change [production]
09:25 <marostegui@cumin1002> dbctl commit (dc=all): 'Depooling db2152 (T367856)', diff saved to https://phabricator.wikimedia.org/P67334 and previous config saved to /var/cache/conftool/dbconfig/20240815-092502-marostegui.json [production]
09:24 <marostegui@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2152.codfw.wmnet with reason: Maintenance [production]
09:24 <marostegui@cumin1002> START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2152.codfw.wmnet with reason: Maintenance [production]
08:55 <jayme@cumin1002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2006.codfw.wmnet with OS bullseye [production]
08:04 <jayme@cumin1002> START - Cookbook sre.hosts.reimage for host kafka-main2006.codfw.wmnet with OS bullseye [production]
08:00 <jayme@cumin1002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2006.codfw.wmnet with OS bullseye [production]
07:47 <jayme@cumin1002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2009.codfw.wmnet with OS bullseye [production]
07:31 <ryankemper@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 10:00:00 on 9 hosts with reason: T364368 non-prod hosts [production]