2024-04-11 §
13:46 <btullis@cumin1002> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host matomo1003.eqiad.wmnet with OS bookworm [production]
13:46 <sukhe@cumin1002> START - Cookbook sre.hosts.reimage for host cp2042.codfw.wmnet with OS bullseye [production]
13:45 <sukhe@puppetmaster1001> conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=(cdn|ats-be) [production]
13:43 <marostegui@cumin1002> dbctl commit (dc=all): 'db2177 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P60407 and previous config saved to /var/cache/conftool/dbconfig/20240411-134341-root.json [production]
13:43 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2129 (re)pooling @ 10%: Post upgrade', diff saved to https://phabricator.wikimedia.org/P60406 and previous config saved to /var/cache/conftool/dbconfig/20240411-134312-arnaudb.json [production]
13:41 <jmm@cumin2002> END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad [production]
13:36 <jmm@cumin2002> START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-eqiad [production]
13:34 <sukhe@cumin1002> START - Cookbook sre.hosts.reimage for host cp3073.esams.wmnet with OS bullseye [production]
13:32 <sukhe@puppetmaster1001> conftool action : set/pooled=no; selector: name=cp3073.esams.wmnet,service=(cdn|ats-be) [production]
13:32 <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db2160.codfw.wmnet with reason: reboot multiinstance replica [production]
13:32 <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 0:30:00 on db2160.codfw.wmnet with reason: reboot multiinstance replica [production]
13:32 <btullis@cumin1002> START - Cookbook sre.hosts.reimage for host matomo1003.eqiad.wmnet with OS bookworm [production]
13:31 <btullis@cumin1002> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host matomo1003.eqiad.wmnet with OS bookworm [production]
13:30 <arnaudb@cumin1002> END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2133.codfw.wmnet [production]
13:28 <marostegui@cumin1002> dbctl commit (dc=all): 'db2177 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P60405 and previous config saved to /var/cache/conftool/dbconfig/20240411-132834-root.json [production]
13:28 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2129 (re)pooling @ 5%: Post upgrade', diff saved to https://phabricator.wikimedia.org/P60404 and previous config saved to /var/cache/conftool/dbconfig/20240411-132807-arnaudb.json [production]
13:27 <vgutierrez@cumin1002> END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-eqsin and not P{cp[5030,5032].eqsin.wmnet} and A:cp [production]
13:26 <arnaudb@cumin1002> START - Cookbook sre.mysql.upgrade for db2133.codfw.wmnet [production]
13:26 <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2133,2160].codfw.wmnet with reason: reboot [production]
13:25 <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 0:30:00 on db[2133,2160].codfw.wmnet with reason: reboot [production]
13:23 <arnaudb@cumin1002> END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2135.codfw.wmnet [production]
13:18 <arnaudb@cumin1002> START - Cookbook sre.mysql.upgrade for db2135.codfw.wmnet [production]
13:17 <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2135,2160].codfw.wmnet with reason: reboot [production]
13:17 <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 0:30:00 on db[2135,2160].codfw.wmnet with reason: reboot [production]
13:16 <arnaudb@cumin1002> END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2134.codfw.wmnet [production]
13:13 <marostegui@cumin1002> dbctl commit (dc=all): 'db2177 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P60403 and previous config saved to /var/cache/conftool/dbconfig/20240411-131327-root.json [production]
13:13 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2129 (re)pooling @ 4%: Post upgrade', diff saved to https://phabricator.wikimedia.org/P60402 and previous config saved to /var/cache/conftool/dbconfig/20240411-131301-arnaudb.json [production]
13:12 <btullis@cumin1002> START - Cookbook sre.hosts.reimage for host matomo1003.eqiad.wmnet with OS bookworm [production]
13:12 <arnaudb@cumin1002> START - Cookbook sre.mysql.upgrade for db2134.codfw.wmnet [production]
13:12 <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2134,2160].codfw.wmnet with reason: reboot [production]
13:11 <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 0:30:00 on db[2134,2160].codfw.wmnet with reason: reboot [production]
13:00 <btullis@cumin1002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host matomo1003.eqiad.wmnet with OS bookworm [production]
12:58 <arnaudb@cumin1002> END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2132.codfw.wmnet [production]
12:58 <marostegui@cumin1002> dbctl commit (dc=all): 'db2177 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P60401 and previous config saved to /var/cache/conftool/dbconfig/20240411-125821-root.json [production]
12:57 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2129 (re)pooling @ 2%: Post upgrade', diff saved to https://phabricator.wikimedia.org/P60400 and previous config saved to /var/cache/conftool/dbconfig/20240411-125755-arnaudb.json [production]
12:54 <akosiaris> lower weight of mw1437 back to 10 from the 30 I had upped it to yesterday. The backlog of videoscaling is apparently now served and CPU usage has reached "normal" levels [production]
12:54 <arnaudb@cumin1002> START - Cookbook sre.mysql.upgrade for db2132.codfw.wmnet [production]
12:54 <jayme@deploy1002> helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. [production]
12:53 <akosiaris@cumin1002> conftool action : set/weight=10; selector: name=mw1437.*.wmnet,dc=eqiad [production]
12:53 <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[2132,2160].codfw.wmnet with reason: reboot [production]
12:53 <jayme@deploy1002> helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. [production]
12:53 <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 4:00:00 on db[2132,2160].codfw.wmnet with reason: reboot [production]
12:52 <jayme@deploy1002> helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. [production]
12:52 <jayme@deploy1002> helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. [production]
12:51 <jayme@deploy1002> helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. [production]
12:50 <jayme@deploy1002> helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. [production]
12:49 <jayme@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. [production]
12:49 <jayme@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. [production]
12:45 <ayounsi@cumin1002> START - Cookbook sre.dns.netbox [production]
12:24 <btullis@deploy1002> helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply [production]