2026-02-24
11:52 <slyngshede@cumin1003> START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS trixie [production]
11:52 <slyngshede@cumin1003> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2045.codfw.wmnet with OS trixie [production]
11:48 <btullis@cumin1003> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1204.eqiad.wmnet [production]
11:42 <marostegui@cumin1003> dbctl commit (dc=all): 'Depooling db1260 (T415786)', diff saved to https://phabricator.wikimedia.org/P89008 and previous config saved to /var/cache/conftool/dbconfig/20260224-114242-marostegui.json [production]
11:42 <marostegui@cumin1003> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1260.eqiad.wmnet with reason: Maintenance [production]
11:42 <marostegui@cumin1003> dbctl commit (dc=all): 'Repooling after maintenance db1252 (T415786)', diff saved to https://phabricator.wikimedia.org/P89007 and previous config saved to /var/cache/conftool/dbconfig/20260224-114217-marostegui.json [production]
11:40 <btullis@cumin1003> START - Cookbook sre.hosts.reboot-single for host an-worker1204.eqiad.wmnet [production]
11:38 <fceratto@cumin1003> END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dborch1003.eqiad.wmnet [production]
11:38 <fceratto@cumin1003> END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) [production]
11:36 <mvernon@cumin2002> START - Cookbook sre.dns.netbox [production]
11:29 <mvernon@cumin2002> START - Cookbook sre.hosts.decommission for hosts moss-fe[2001-2002].codfw.wmnet [production]
11:27 <marostegui@cumin1003> dbctl commit (dc=all): 'Repooling after maintenance db1252', diff saved to https://phabricator.wikimedia.org/P89006 and previous config saved to /var/cache/conftool/dbconfig/20260224-112708-marostegui.json [production]
11:21 <Emperor> depool moss-fe200{1,2} prep for decommissioning T416387 [production]
11:14 <dpogorzelski@cumin1003> END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) pool all services in eqiad/ml-serve-eqiad: maintenance [production]
11:14 <dpogorzelski@cumin1003> START - Cookbook sre.k8s.pool-depool-cluster pool all services in eqiad/ml-serve-eqiad: maintenance [production]
11:14 <mvernon@cumin2002> conftool action : set/pooled=yes; selector: service=apus,name=apus-fe2005.codfw.wmnet [production]
11:13 <mvernon@cumin2002> conftool action : set/pooled=yes; selector: service=apus,name=apus-fe2004.codfw.wmnet [production]
11:13 <mvernon@cumin2002> conftool action : set/weight=40; selector: service=apus,name=apus-fe2005.codfw.wmnet [production]
11:13 <slyngshede@cumin1003> START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS trixie [production]
11:12 <mvernon@cumin2002> conftool action : set/weight=40; selector: service=apus,name=apus-fe2004.codfw.wmnet [production]
11:12 <marostegui@cumin1003> dbctl commit (dc=all): 'Repooling after maintenance db1252', diff saved to https://phabricator.wikimedia.org/P89005 and previous config saved to /var/cache/conftool/dbconfig/20260224-111159-marostegui.json [production]
11:00 <dpogorzelski@cumin1003> conftool action : set/pooled=true; selector: dnsdisc=recommendation-api,name=codfw [production]
11:00 <dpogorzelski@cumin1003> conftool action : set/pooled=true; selector: dnsdisc=recommendation-api,name=eqiad [production]
11:00 <dpogorzelski@cumin1003> END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) pool all services in eqiad/ml-serve-eqiad: maintenance [production]
10:59 <dpogorzelski@cumin1003> START - Cookbook sre.k8s.pool-depool-cluster pool all services in eqiad/ml-serve-eqiad: maintenance [production]
10:58 <dpogorzelski@cumin1003> END (PASS) - Cookbook sre.k8s.wipe-cluster (exit_code=0) Wipe the K8s cluster ml-serve-eqiad: Kubernetes upgrade [production]
10:57 <dpogorzelski@deploy2002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . [production]
10:57 <dpogorzelski@deploy2002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . [production]
10:56 <marostegui@cumin1003> dbctl commit (dc=all): 'Repooling after maintenance db1252 (T415786)', diff saved to https://phabricator.wikimedia.org/P89003 and previous config saved to /var/cache/conftool/dbconfig/20260224-105651-marostegui.json [production]
10:56 <dpogorzelski@deploy2002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . [production]
10:56 <dpogorzelski@deploy2002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . [production]
10:56 <dpogorzelski@deploy2002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . [production]
10:55 <dpogorzelski@deploy2002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . [production]
10:55 <dpogorzelski@deploy2002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . [production]
10:55 <dpogorzelski@deploy2002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' . [production]
10:54 <dpogorzelski@deploy2002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revise-tone-task-generator' for release 'main' . [production]
10:54 <dpogorzelski@deploy2002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . [production]
10:54 <dpogorzelski@deploy2002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'recommendation-api-ng' for release 'main' . [production]
10:54 <dpogorzelski@deploy2002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' . [production]
10:53 <dpogorzelski@deploy2002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
10:53 <dpogorzelski@deploy2002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'logo-detection' for release 'main' . [production]
10:52 <dpogorzelski@deploy2002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' . [production]
10:52 <dpogorzelski@deploy2002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'edit-check' for release 'main' . [production]
10:51 <dpogorzelski@deploy2002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' . [production]
10:51 <dpogorzelski@deploy2002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' . [production]
10:51 <fceratto@cumin1003> START - Cookbook sre.dns.netbox [production]
10:51 <fceratto@cumin1003> START - Cookbook sre.ganeti.makevm for new host dborch1003.eqiad.wmnet [production]
10:51 <dpogorzelski@deploy2002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' . [production]
10:51 <fceratto@cumin1003> END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host dborch1003.eqiad.wmnet [production]
10:51 <fceratto@cumin1003> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dborch1003.eqiad.wmnet with OS trixie [production]