2026-02-24
12:37 <dpogorzelski@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . [production]
12:37 <dpogorzelski@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . [production]
12:37 <dpogorzelski@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . [production]
12:37 <dpogorzelski@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . [production]
12:36 <dpogorzelski@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' . [production]
12:36 <aikochou@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . [production]
12:36 <dpogorzelski@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' . [production]
12:35 <btullis> failing over HDFS namenode services to an-master1004 for T414948 [analytics]
12:35 <dpogorzelski@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . [production]
12:35 <dpogorzelski@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . [production]
12:35 <aikochou@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . [production]
12:33 <dpogorzelski@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . [production]
12:32 <dpogorzelski@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' . [production]
12:32 <dpogorzelski@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'recommendation-api-ng' for release 'main' . [production]
12:30 <dpogorzelski@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' . [production]
12:30 <dpogorzelski@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' . [production]
12:30 <dpogorzelski@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' . [production]
12:29 <dpogorzelski@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'logo-detection' for release 'main' . [production]
12:29 <dpogorzelski@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' . [production]
12:23 <dpogorzelski@cumin1003> END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) pool all services in codfw/ml-staging-codfw: maintenance [production]
12:23 <dpogorzelski@cumin1003> START - Cookbook sre.k8s.pool-depool-cluster pool all services in codfw/ml-staging-codfw: maintenance [production]
12:05 <slyngshede@cumin1003> START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS trixie [production]
12:05 <slyngshede@cumin1003> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2045.codfw.wmnet with OS trixie [production]
11:52 <slyngshede@cumin1003> START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS trixie [production]
11:52 <slyngshede@cumin1003> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2045.codfw.wmnet with OS trixie [production]
11:48 <btullis@cumin1003> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1204.eqiad.wmnet [production]
11:42 <marostegui@cumin1003> dbctl commit (dc=all): 'Depooling db1260 (T415786)', diff saved to https://phabricator.wikimedia.org/P89008 and previous config saved to /var/cache/conftool/dbconfig/20260224-114242-marostegui.json [production]
11:42 <marostegui@cumin1003> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1260.eqiad.wmnet with reason: Maintenance [production]
11:42 <marostegui@cumin1003> dbctl commit (dc=all): 'Repooling after maintenance db1252 (T415786)', diff saved to https://phabricator.wikimedia.org/P89007 and previous config saved to /var/cache/conftool/dbconfig/20260224-114217-marostegui.json [production]
11:40 <btullis@cumin1003> START - Cookbook sre.hosts.reboot-single for host an-worker1204.eqiad.wmnet [production]
11:38 <fceratto@cumin1003> END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dborch1003.eqiad.wmnet [production]
11:38 <fceratto@cumin1003> END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) [production]
11:36 <mvernon@cumin2002> START - Cookbook sre.dns.netbox [production]
11:29 <mvernon@cumin2002> START - Cookbook sre.hosts.decommission for hosts moss-fe[2001-2002].codfw.wmnet [production]
11:27 <marostegui@cumin1003> dbctl commit (dc=all): 'Repooling after maintenance db1252', diff saved to https://phabricator.wikimedia.org/P89006 and previous config saved to /var/cache/conftool/dbconfig/20260224-112708-marostegui.json [production]
11:21 <Emperor> depool moss-fe200{1,2} prep for decommissioning T416387 [production]
11:14 <dpogorzelski@cumin1003> END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) pool all services in eqiad/ml-serve-eqiad: maintenance [production]
11:14 <dpogorzelski@cumin1003> START - Cookbook sre.k8s.pool-depool-cluster pool all services in eqiad/ml-serve-eqiad: maintenance [production]
11:14 <mvernon@cumin2002> conftool action : set/pooled=yes; selector: service=apus,name=apus-fe2005.codfw.wmnet [production]
11:13 <mvernon@cumin2002> conftool action : set/pooled=yes; selector: service=apus,name=apus-fe2004.codfw.wmnet [production]
11:13 <mvernon@cumin2002> conftool action : set/weight=40; selector: service=apus,name=apus-fe2005.codfw.wmnet [production]
11:13 <slyngshede@cumin1003> START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS trixie [production]
11:12 <mvernon@cumin2002> conftool action : set/weight=40; selector: service=apus,name=apus-fe2004.codfw.wmnet [production]
11:12 <marostegui@cumin1003> dbctl commit (dc=all): 'Repooling after maintenance db1252', diff saved to https://phabricator.wikimedia.org/P89005 and previous config saved to /var/cache/conftool/dbconfig/20260224-111159-marostegui.json [production]
11:00 <dpogorzelski@cumin1003> conftool action : set/pooled=true; selector: dnsdisc=recommendation-api,name=codfw [production]
11:00 <dpogorzelski@cumin1003> conftool action : set/pooled=true; selector: dnsdisc=recommendation-api,name=eqiad [production]
11:00 <dpogorzelski@cumin1003> END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) pool all services in eqiad/ml-serve-eqiad: maintenance [production]
10:59 <dpogorzelski@cumin1003> START - Cookbook sre.k8s.pool-depool-cluster pool all services in eqiad/ml-serve-eqiad: maintenance [production]
10:58 <dpogorzelski@cumin1003> END (PASS) - Cookbook sre.k8s.wipe-cluster (exit_code=0) Wipe the K8s cluster ml-serve-eqiad: Kubernetes upgrade [production]
10:57 <dpogorzelski@deploy2002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . [production]