|
2026-02-24
|
| 12:37 |
<dpogorzelski@deploy2002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . |
[production] |
| 12:37 |
<dpogorzelski@deploy2002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . |
[production] |
| 12:37 |
<dpogorzelski@deploy2002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . |
[production] |
| 12:37 |
<dpogorzelski@deploy2002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . |
[production] |
| 12:36 |
<dpogorzelski@deploy2002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' . |
[production] |
| 12:36 |
<aikochou@deploy2002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . |
[production] |
| 12:36 |
<dpogorzelski@deploy2002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' . |
[production] |
| 12:35 |
<btullis> |
failing over HDFS namenode services to an-master1004 for T414948 |
[analytics] |
| 12:35 |
<dpogorzelski@deploy2002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . |
[production] |
| 12:35 |
<dpogorzelski@deploy2002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . |
[production] |
| 12:35 |
<aikochou@deploy2002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . |
[production] |
| 12:33 |
<dpogorzelski@deploy2002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . |
[production] |
| 12:32 |
<dpogorzelski@deploy2002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' . |
[production] |
| 12:32 |
<dpogorzelski@deploy2002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
[production] |
| 12:30 |
<dpogorzelski@deploy2002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' . |
[production] |
| 12:30 |
<dpogorzelski@deploy2002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' . |
[production] |
| 12:30 |
<dpogorzelski@deploy2002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' . |
[production] |
| 12:29 |
<dpogorzelski@deploy2002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'logo-detection' for release 'main' . |
[production] |
| 12:29 |
<dpogorzelski@deploy2002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' . |
[production] |
| 12:23 |
<dpogorzelski@cumin1003> |
END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) pool all services in codfw/ml-staging-codfw: maintenance |
[production] |
| 12:23 |
<dpogorzelski@cumin1003> |
START - Cookbook sre.k8s.pool-depool-cluster pool all services in codfw/ml-staging-codfw: maintenance |
[production] |
| 12:05 |
<slyngshede@cumin1003> |
START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS trixie |
[production] |
| 12:05 |
<slyngshede@cumin1003> |
END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2045.codfw.wmnet with OS trixie |
[production] |
| 11:52 |
<slyngshede@cumin1003> |
START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS trixie |
[production] |
| 11:52 |
<slyngshede@cumin1003> |
END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2045.codfw.wmnet with OS trixie |
[production] |
| 11:48 |
<btullis@cumin1003> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1204.eqiad.wmnet |
[production] |
| 11:42 |
<marostegui@cumin1003> |
dbctl commit (dc=all): 'Depooling db1260 (T415786)', diff saved to https://phabricator.wikimedia.org/P89008 and previous config saved to /var/cache/conftool/dbconfig/20260224-114242-marostegui.json |
[production] |
| 11:42 |
<marostegui@cumin1003> |
DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1260.eqiad.wmnet with reason: Maintenance |
[production] |
| 11:42 |
<marostegui@cumin1003> |
dbctl commit (dc=all): 'Repooling after maintenance db1252 (T415786)', diff saved to https://phabricator.wikimedia.org/P89007 and previous config saved to /var/cache/conftool/dbconfig/20260224-114217-marostegui.json |
[production] |
| 11:40 |
<btullis@cumin1003> |
START - Cookbook sre.hosts.reboot-single for host an-worker1204.eqiad.wmnet |
[production] |
| 11:38 |
<fceratto@cumin1003> |
END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dborch1003.eqiad.wmnet |
[production] |
| 11:38 |
<fceratto@cumin1003> |
END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) |
[production] |
| 11:36 |
<mvernon@cumin2002> |
START - Cookbook sre.dns.netbox |
[production] |
| 11:29 |
<mvernon@cumin2002> |
START - Cookbook sre.hosts.decommission for hosts moss-fe[2001-2002].codfw.wmnet |
[production] |
| 11:27 |
<marostegui@cumin1003> |
dbctl commit (dc=all): 'Repooling after maintenance db1252', diff saved to https://phabricator.wikimedia.org/P89006 and previous config saved to /var/cache/conftool/dbconfig/20260224-112708-marostegui.json |
[production] |
| 11:21 |
<Emperor> |
depool moss-fe200{1,2} prep for decommissioning T416387 |
[production] |
| 11:14 |
<dpogorzelski@cumin1003> |
END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) pool all services in eqiad/ml-serve-eqiad: maintenance |
[production] |
| 11:14 |
<dpogorzelski@cumin1003> |
START - Cookbook sre.k8s.pool-depool-cluster pool all services in eqiad/ml-serve-eqiad: maintenance |
[production] |
| 11:14 |
<mvernon@cumin2002> |
conftool action : set/pooled=yes; selector: service=apus,name=apus-fe2005.codfw.wmnet |
[production] |
| 11:13 |
<mvernon@cumin2002> |
conftool action : set/pooled=yes; selector: service=apus,name=apus-fe2004.codfw.wmnet |
[production] |
| 11:13 |
<mvernon@cumin2002> |
conftool action : set/weight=40; selector: service=apus,name=apus-fe2005.codfw.wmnet |
[production] |
| 11:13 |
<slyngshede@cumin1003> |
START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS trixie |
[production] |
| 11:12 |
<mvernon@cumin2002> |
conftool action : set/weight=40; selector: service=apus,name=apus-fe2004.codfw.wmnet |
[production] |
| 11:12 |
<marostegui@cumin1003> |
dbctl commit (dc=all): 'Repooling after maintenance db1252', diff saved to https://phabricator.wikimedia.org/P89005 and previous config saved to /var/cache/conftool/dbconfig/20260224-111159-marostegui.json |
[production] |
| 11:00 |
<dpogorzelski@cumin1003> |
conftool action : set/pooled=true; selector: dnsdisc=recommendation-api,name=codfw |
[production] |
| 11:00 |
<dpogorzelski@cumin1003> |
conftool action : set/pooled=true; selector: dnsdisc=recommendation-api,name=eqiad |
[production] |
| 11:00 |
<dpogorzelski@cumin1003> |
END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) pool all services in eqiad/ml-serve-eqiad: maintenance |
[production] |
| 10:59 |
<dpogorzelski@cumin1003> |
START - Cookbook sre.k8s.pool-depool-cluster pool all services in eqiad/ml-serve-eqiad: maintenance |
[production] |
| 10:58 |
<dpogorzelski@cumin1003> |
END (PASS) - Cookbook sre.k8s.wipe-cluster (exit_code=0) Wipe the K8s cluster ml-serve-eqiad: Kubernetes upgrade |
[production] |
| 10:57 |
<dpogorzelski@deploy2002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . |
[production] |