2023-03-16
18:37 <cmjohnson@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye [production]
18:03 <sukhe@cumin2002> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4009.ulsfo.wmnet [production]
18:03 <sukhe@cumin2002> START - Cookbook sre.hosts.remove-downtime for lvs4009.ulsfo.wmnet [production]
17:41 <sukhe@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on lvs4009.ulsfo.wmnet with reason: rebooting for kernel updates [production]
17:41 <sukhe@cumin2002> START - Cookbook sre.hosts.downtime for 0:25:00 on lvs4009.ulsfo.wmnet with reason: rebooting for kernel updates [production]
17:40 <cmjohnson@cumin1001> START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye [production]
17:40 <ayounsi@cumin2002> END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary [production]
17:40 <ayounsi@cumin2002> START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary [production]
17:36 <cmjohnson@cumin1001> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-fe1004.eqiad.wmnet with OS bullseye [production]
17:30 <cmjohnson@cumin1001> START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye [production]
17:21 <cmjohnson@cumin1001> START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye [production]
17:05 <sukhe@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates [production]
17:05 <sukhe@cumin2002> START - Cookbook sre.hosts.downtime for 0:15:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates [production]
16:59 <xcollazo@deploy2002> Finished deploy [airflow-dags/platform_eng@e17ee96]: First deploy after Airflow 2.5.1 upgrade. (duration: 00m 24s) [production]
16:58 <xcollazo@deploy2002> Started deploy [airflow-dags/platform_eng@e17ee96]: First deploy after Airflow 2.5.1 upgrade. [production]
16:56 <sukhe@cumin2002> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4010.ulsfo.wmnet [production]
16:56 <sukhe@cumin2002> START - Cookbook sre.hosts.remove-downtime for lvs4010.ulsfo.wmnet [production]
16:47 <sukhe@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs4010.ulsfo.wmnet with reason: rebooting for kernel updates [production]
16:46 <sukhe@cumin2002> START - Cookbook sre.hosts.downtime for 1:00:00 on lvs4010.ulsfo.wmnet with reason: rebooting for kernel updates [production]
16:31 <Emperor> reboot ms-be2067 again to see if the missing drive comes back [production]
16:30 <mvernon@cumin2002> START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet [production]
15:39 <claime> Pooled new mw hosts mw24[20-51].codfw.wmnet - T326363 [production]
15:28 <sukhe> enable puppet on R:class = dnsrecursor to merge CR: 898957 [done] [production]
15:23 <cgoubert@cumin1001> conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=videoscaler [production]
15:23 <cgoubert@cumin1001> conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=jobrunner [production]
15:19 <cgoubert@cumin1001> conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=api_appserver [production]
15:15 <cgoubert@cumin1001> conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=appserver [production]
15:15 <claime> Pooling new mw hosts mw24[20-51].codfw.wmnet - T326363 [production]
15:13 <cgoubert@cumin1001> conftool action : set/weight=25; selector: name=mw24[2345].*.codfw.wmnet,cluster=videoscaler [production]
15:12 <cgoubert@cumin1001> conftool action : set/weight=25; selector: name=mw24[2345].*.codfw.wmnet,cluster=jobrunner [production]
15:11 <cgoubert@cumin1001> conftool action : set/weight=30; selector: name=mw24[2345].*.codfw.wmnet,cluster=api_appserver [production]
15:11 <cgoubert@cumin1001> conftool action : set/weight=30; selector: name=mw24[2345].*.codfw.wmnet,cluster=appserver [production]
15:10 <sukhe> disable puppet on R:class = dnsrecursor to merge CR: 898957 [production]
15:09 <cgoubert@cumin1001> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 32 hosts [production]
15:09 <cgoubert@cumin1001> START - Cookbook sre.hosts.remove-downtime for 32 hosts [production]
14:50 <cgoubert@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install [production]
14:49 <cgoubert@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install [production]
14:44 <elukey@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . [production]
14:40 <elukey@deploy2002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [production]
14:40 <gmodena@deploy1002> helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply [production]
14:40 <gmodena@deploy1002> helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply [production]
14:40 <elukey@deploy2002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [production]
14:31 <elukey@deploy2002> helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. [production]
14:31 <elukey@deploy2002> helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. [production]
14:06 <urandom> ALTER-ing image_suggestions.suggestion table — T328670 [production]
13:35 <kostajh> UTC afternoon deploys done [production]
13:34 <kharlan@deploy2002> Finished scap: Backport for [[gerrit:894593|GrowthExperiments: Remove unused GENewImpactD3Enabled flag]] (duration: 07m 44s) [production]
13:28 <kharlan@deploy2002> kharlan: Backport for [[gerrit:894593|GrowthExperiments: Remove unused GENewImpactD3Enabled flag]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet [production]
13:27 <kharlan@deploy2002> Started scap: Backport for [[gerrit:894593|GrowthExperiments: Remove unused GENewImpactD3Enabled flag]] [production]
13:15 <kharlan@deploy2002> Finished scap: Backport for [[gerrit:900196|GrowthExperiments: Enable LevelingUp features on testwiki (T317813)]] (duration: 09m 48s) [production]