2026-04-29
09:20 <marostegui@cumin1003> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1229.eqiad.wmnet with reason: Reimage to Trixie [production]
09:19 <marostegui@cumin1003> START - Cookbook sre.mysql.depool depool db2175: Reimage to Trixie [production]
09:19 <marostegui@cumin1003> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2175.codfw.wmnet with reason: Reimage to Trixie [production]
09:15 <fceratto@cumin1003> dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P91862 and previous config saved to /var/cache/conftool/dbconfig/20260429-091542-fceratto.json [production]
09:13 <marostegui@cumin1003> END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2189: after reimage to trixie [production]
09:10 <marostegui@cumin1003> END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1233: after reimage to trixie [production]
09:05 <fceratto@cumin1003> dbctl commit (dc=all): 'Repooling after maintenance db1174 (T419961)', diff saved to https://phabricator.wikimedia.org/P91857 and previous config saved to /var/cache/conftool/dbconfig/20260429-090534-fceratto.json [production]
09:01 <jmm@cumin2002> START - Cookbook sre.hosts.reimage for host ganeti5005.eqsin.wmnet with OS bookworm [production]
08:56 <fceratto@cumin1003> dbctl commit (dc=all): 'Depooling db1174 (T419961)', diff saved to https://phabricator.wikimedia.org/P91854 and previous config saved to /var/cache/conftool/dbconfig/20260429-085654-fceratto.json [production]
08:56 <fceratto@cumin1003> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance [production]
08:54 <marostegui@cumin1003> START - Cookbook sre.mysql.pool pool db2194: after reimage to trixie [production]
08:51 <dpogorzelski@deploy1003> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' . [production]
08:51 <fceratto@cumin1003> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance [production]
08:48 <marostegui@cumin1003> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2194.codfw.wmnet with OS trixie [production]
08:45 <marostegui@cumin1003> START - Cookbook sre.mysql.pool pool db1175: after reimage to trixie [production]
08:42 <ryankemper@cumin2002> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS trixie [production]
08:40 <marostegui@cumin1003> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1175.eqiad.wmnet with OS trixie [production]
08:38 <urbanecm@deploy1003> mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=mediawikiwiki Wikimedia_Apps/Team/Android/TriviaGame 'Wikimedia Apps/Team/Android/"Which came first?" Game' 'Martin Urbanec (WMF)' '--reason=per [[:phab:T423845]]' # T423845 [production]
08:38 <urbanecm@deploy1003> mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=mediawikiwiki Wikimedia_Apps/Team/Android/TriviaGame 'Wikimedia Apps/Team/Android/"Which came first?" Game' 'Martin Urbanec (WMF)' '--reason=per [[:phab:T423845]]' # T423845 [production]
08:37 <urbanecm@deploy1003> mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=mediawikiwiki Wikimedia_Apps/Team/Android/TriviaGame 'Wikimedia Apps/Team/Android/Which' came 'first? Game' 'Martin Urbanec (WMF)' '--reason=per [[:phab:T423845]]' # T423845 [production]
08:29 <elukey@deploy1003> helmfile [staging] DONE helmfile.d/services/wikifunctions: sync [production]
08:29 <elukey@deploy1003> helmfile [staging] START helmfile.d/services/wikifunctions: sync [production]
08:28 <marostegui@cumin1003> START - Cookbook sre.mysql.pool pool db2189: after reimage to trixie [production]
08:24 <marostegui@cumin1003> START - Cookbook sre.mysql.pool pool db1233: after reimage to trixie [production]
08:24 <marostegui@cumin1003> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2194.codfw.wmnet with reason: host reimage [production]
08:24 <marostegui@cumin1003> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2189.codfw.wmnet with OS trixie [production]
08:21 <marostegui@cumin1003> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1233.eqiad.wmnet with OS trixie [production]
08:21 <marostegui@cumin1003> START - Cookbook sre.hosts.downtime for 2:00:00 on db2194.codfw.wmnet with reason: host reimage [production]
08:18 <marostegui@cumin1003> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1175.eqiad.wmnet with reason: host reimage [production]
08:18 <Emperor> re-enable puppet in apus/codfw for TLS key rollover T424674 (no change, incident took over) [production]
08:16 <Emperor> disable puppet in apus/codfw for TLS key rollover T424674 [production]
08:14 <marostegui@cumin1003> START - Cookbook sre.hosts.downtime for 2:00:00 on db1175.eqiad.wmnet with reason: host reimage [production]
08:09 <dpogorzelski@deploy1003> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' . [production]
08:08 <a-pizzata@deploy1003> Finished deploy [analytics/refinery@d6a17a0] (thin): Regular analytics weekly train THIN [analytics/refinery@d6a17a0a] (duration: 01m 54s) [production]
08:06 <a-pizzata@deploy1003> Started deploy [analytics/refinery@d6a17a0] (thin): Regular analytics weekly train THIN [analytics/refinery@d6a17a0a] [production]
08:02 <marostegui@cumin1003> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2189.codfw.wmnet with reason: host reimage [production]
07:59 <a-pizzata@deploy1003> Finished deploy [analytics/refinery@d6a17a0]: Regular analytics weekly train [analytics/refinery@d6a17a0a] (duration: 04m 12s) [production]
07:59 <marostegui@cumin1003> START - Cookbook sre.hosts.reimage for host db2194.codfw.wmnet with OS trixie [production]
07:59 <marostegui@cumin1003> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1233.eqiad.wmnet with reason: host reimage [production]
07:58 <marostegui@cumin1003> START - Cookbook sre.hosts.reimage for host db1175.eqiad.wmnet with OS trixie [production]
07:57 <marostegui@cumin1003> END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2194: Reimage to Trixie [production]
07:57 <marostegui@cumin1003> START - Cookbook sre.mysql.depool depool db2194: Reimage to Trixie [production]
07:57 <marostegui@cumin1003> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2194.codfw.wmnet with reason: Reimage to Trixie [production]
07:56 <marostegui@cumin1003> END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2227: after reimage to trixie [production]
07:56 <marostegui@cumin1003> END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1175: Reimage to Trixie [production]
07:56 <marostegui@cumin1003> START - Cookbook sre.mysql.depool depool db1175: Reimage to Trixie [production]
07:55 <marostegui@cumin1003> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1175.eqiad.wmnet with reason: Reimage to Trixie [production]
07:55 <a-pizzata@deploy1003> Started deploy [analytics/refinery@d6a17a0]: Regular analytics weekly train [analytics/refinery@d6a17a0a] [production]
07:55 <a-pizzata@deploy1003> Finished deploy [analytics/refinery@d6a17a0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d6a17a0a] (duration: 01m 57s) [production]
07:53 <marostegui@cumin1003> START - Cookbook sre.hosts.downtime for 2:00:00 on db2189.codfw.wmnet with reason: host reimage [production]