2024-04-23
15:45 <dzahn@cumin2002> START - Cookbook sre.hosts.downtime for 0:30:00 on phabricator.wikimedia.org with reason: T363174 [production]
15:45 <dzahn@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: T363174 [production]
15:45 <dzahn@cumin2002> START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: T363174 [production]
15:44 <dzahn@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: T363174 [production]
15:44 <dzahn@cumin2002> START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: T363174 [production]
15:41 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2136 (re)pooling @ 25%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61122 and previous config saved to /var/cache/conftool/dbconfig/20240423-154152-arnaudb.json [production]
15:30 <klausman@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . [production]
15:30 <klausman@deploy1002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. [production]
15:28 <ladsgroup@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance [production]
15:27 <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:1023452|logos: revert back the tagline (T363165)]] (duration: 13m 30s) [production]
15:27 <ladsgroup@cumin1002> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance [production]
15:27 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2130 (T352010)', diff saved to https://phabricator.wikimedia.org/P61121 and previous config saved to /var/cache/conftool/dbconfig/20240423-152725-ladsgroup.json [production]
15:27 <klausman@deploy1002> helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. [production]
15:26 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2136 (re)pooling @ 10%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61120 and previous config saved to /var/cache/conftool/dbconfig/20240423-152646-arnaudb.json [production]
15:19 <jmm@cumin2002> END (FAIL) - Cookbook sre.dns.roll-restart-reboot-durum (exit_code=1) rolling restart_daemons on A:durum [production]
15:19 <moritzm> restarting FPM on phab1004 [production]
15:16 <ladsgroup@deploy1002> ladsgroup: Continuing with sync [production]
15:16 <ladsgroup@deploy1002> ladsgroup: Backport for [[gerrit:1023452|logos: revert back the tagline (T363165)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
15:13 <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:1023452|logos: revert back the tagline (T363165)]] [production]
15:12 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: post upgrade repool', diff saved to https://phabricator.wikimedia.org/P61119 and previous config saved to /var/cache/conftool/dbconfig/20240423-151240-arnaudb.json [production]
15:12 <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:1023445|logos: Add the override for 1M variant of fawiki (T363165)]] (duration: 14m 28s) [production]
15:12 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P61118 and previous config saved to /var/cache/conftool/dbconfig/20240423-151216-ladsgroup.json [production]
15:11 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2136 (re)pooling @ 5%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61117 and previous config saved to /var/cache/conftool/dbconfig/20240423-151140-arnaudb.json [production]
15:10 <arnaudb@cumin1002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2136.codfw.wmnet with OS bookworm [production]
15:08 <jmm@cumin2002> START - Cookbook sre.dns.roll-restart-reboot-durum rolling restart_daemons on A:durum [production]
15:05 <elukey@cumin1002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-eqiad: Deploy new TLS Keystore - PKI - elukey@cumin1002 [production]
15:03 <jmm@cumin2002> END (PASS) - Cookbook sre.dns.roll-restart-reboot-durum (exit_code=0) rolling restart_daemons on A:durum-drmrs [production]
15:01 <ladsgroup@deploy1002> ladsgroup: Continuing with sync [production]
15:01 <ladsgroup@deploy1002> ladsgroup: Backport for [[gerrit:1023445|logos: Add the override for 1M variant of fawiki (T363165)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
14:59 <vgutierrez> repool ncredir6001 [production]
14:58 <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:1023445|logos: Add the override for 1M variant of fawiki (T363165)]] [production]
14:57 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: post upgrade repool', diff saved to https://phabricator.wikimedia.org/P61116 and previous config saved to /var/cache/conftool/dbconfig/20240423-145734-arnaudb.json [production]
14:57 <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:1023439|logos: Add fawiki logo for 1,000,000 article (T363165)]] (duration: 17m 38s) [production]
14:57 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P61115 and previous config saved to /var/cache/conftool/dbconfig/20240423-145709-ladsgroup.json [production]
14:56 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61114 and previous config saved to /var/cache/conftool/dbconfig/20240423-145603-arnaudb.json [production]
14:53 <jmm@cumin2002> START - Cookbook sre.dns.roll-restart-reboot-durum rolling restart_daemons on A:durum-drmrs [production]
14:49 <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2136.codfw.wmnet with reason: host reimage [production]
14:47 <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on db2136.codfw.wmnet with reason: host reimage [production]
14:47 <jclark@cumin1002> END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts parse1002.eqiad.wmnet [production]
14:46 <ladsgroup@deploy1002> ladsgroup: Continuing with sync [production]
14:44 <vgutierrez> depool ncredir6001 [production]
14:44 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon1003.wikimedia.org [production]
14:42 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2140 (re)pooling @ 50%: post upgrade repool', diff saved to https://phabricator.wikimedia.org/P61113 and previous config saved to /var/cache/conftool/dbconfig/20240423-144229-arnaudb.json [production]
14:42 <ladsgroup@deploy1002> ladsgroup: Backport for [[gerrit:1023439|logos: Add fawiki logo for 1,000,000 article (T363165)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
14:42 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2130 (T352010)', diff saved to https://phabricator.wikimedia.org/P61112 and previous config saved to /var/cache/conftool/dbconfig/20240423-144202-ladsgroup.json [production]
14:40 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61111 and previous config saved to /var/cache/conftool/dbconfig/20240423-144057-arnaudb.json [production]
14:39 <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:1023439|logos: Add fawiki logo for 1,000,000 article (T363165)]] [production]
14:38 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host netmon1003.wikimedia.org [production]
14:35 <jclark@cumin1002> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts parse1002.eqiad.wmnet [production]
14:35 <jclark@cumin1002> END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['parse1002.eqiad.wmnet'] [production]