2023-03-16
20:35 <samtar@deploy2002> Finished scap: Backport for [[gerrit:900399|Remove sampling from breadCrumbs schema]] (duration: 08m 18s) [production]
20:28 <samtar@deploy2002> samtar and sharvaniharan: Backport for [[gerrit:900399|Remove sampling from breadCrumbs schema]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet [production]
20:26 <samtar@deploy2002> Started scap: Backport for [[gerrit:900399|Remove sampling from breadCrumbs schema]] [production]
20:21 <brennen@deploy2002> Finished scap: Backport for [[gerrit:900427|Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160)]] (duration: 09m 06s) [production]
20:14 <brennen@deploy2002> brennen and jforrester: Backport for [[gerrit:900427|Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet [production]
20:12 <brennen@deploy2002> Started scap: Backport for [[gerrit:900427|Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160)]] [production]
19:28 <xcollazo@deploy2002> Finished deploy [airflow-dags/platform_eng@a587106]: (no justification provided) (duration: 00m 12s) [production]
19:27 <xcollazo@deploy2002> Started deploy [airflow-dags/platform_eng@a587106]: (no justification provided) [production]
18:41 <wfan> enable monthlyconvert for cz [production]
18:40 <xcollazo@deploy2002> Finished deploy [airflow-dags/platform_eng@5c2c701]: (no justification provided) (duration: 00m 13s) [production]
18:40 <xcollazo@deploy2002> Started deploy [airflow-dags/platform_eng@5c2c701]: (no justification provided) [production]
18:38 <mvernon@cumin2002> END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be2067.codfw.wmnet [production]
18:37 <cmjohnson@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye [production]
18:03 <sukhe@cumin2002> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4009.ulsfo.wmnet [production]
18:03 <sukhe@cumin2002> START - Cookbook sre.hosts.remove-downtime for lvs4009.ulsfo.wmnet [production]
17:41 <sukhe@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on lvs4009.ulsfo.wmnet with reason: rebooting for kernel updates [production]
17:41 <sukhe@cumin2002> START - Cookbook sre.hosts.downtime for 0:25:00 on lvs4009.ulsfo.wmnet with reason: rebooting for kernel updates [production]
17:40 <cmjohnson@cumin1001> START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye [production]
17:40 <ayounsi@cumin2002> END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary [production]
17:40 <ayounsi@cumin2002> START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary [production]
17:36 <cmjohnson@cumin1001> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-fe1004.eqiad.wmnet with OS bullseye [production]
17:30 <cmjohnson@cumin1001> START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye [production]
17:21 <cmjohnson@cumin1001> START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye [production]
17:05 <sukhe@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates [production]
17:05 <sukhe@cumin2002> START - Cookbook sre.hosts.downtime for 0:15:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates [production]
16:59 <xcollazo@deploy2002> Finished deploy [airflow-dags/platform_eng@e17ee96]: First deploy after Airflow 2.5.1 upgrade. (duration: 00m 24s) [production]
16:58 <xcollazo@deploy2002> Started deploy [airflow-dags/platform_eng@e17ee96]: First deploy after Airflow 2.5.1 upgrade. [production]
16:56 <sukhe@cumin2002> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4010.ulsfo.wmnet [production]
16:56 <sukhe@cumin2002> START - Cookbook sre.hosts.remove-downtime for lvs4010.ulsfo.wmnet [production]
16:47 <sukhe@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs4010.ulsfo.wmnet with reason: rebooting for kernel updates [production]
16:46 <sukhe@cumin2002> START - Cookbook sre.hosts.downtime for 1:00:00 on lvs4010.ulsfo.wmnet with reason: rebooting for kernel updates [production]
16:31 <Emperor> reboot ms-be2067 again to see if the missing drive comes back [production]
16:30 <mvernon@cumin2002> START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet [production]
15:39 <claime> Pooled new mw hosts mw24[20-51].codfw.wmnet - T326363 [production]
15:28 <sukhe> enable puppet on R:class = dnsrecursor to merge CR: 898957 [done] [production]
15:23 <cgoubert@cumin1001> conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=videoscaler [production]
15:23 <cgoubert@cumin1001> conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=jobrunner [production]
15:19 <cgoubert@cumin1001> conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=api_appserver [production]
15:15 <cgoubert@cumin1001> conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=appserver [production]
15:15 <claime> Pooling new mw hosts mw24[20-51].codfw.wmnet - T326363 [production]
15:13 <cgoubert@cumin1001> conftool action : set/weight=25; selector: name=mw24[2345].*.codfw.wmnet,cluster=videoscaler [production]
15:12 <cgoubert@cumin1001> conftool action : set/weight=25; selector: name=mw24[2345].*.codfw.wmnet,cluster=jobrunner [production]
15:11 <cgoubert@cumin1001> conftool action : set/weight=30; selector: name=mw24[2345].*.codfw.wmnet,cluster=api_appserver [production]
15:11 <cgoubert@cumin1001> conftool action : set/weight=30; selector: name=mw24[2345].*.codfw.wmnet,cluster=appserver [production]
15:10 <sukhe> disable puppet on R:class = dnsrecursor to merge CR: 898957 [production]
15:09 <cgoubert@cumin1001> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 32 hosts [production]
15:09 <cgoubert@cumin1001> START - Cookbook sre.hosts.remove-downtime for 32 hosts [production]
14:50 <cgoubert@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install [production]
14:49 <cgoubert@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install [production]
14:44 <elukey@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . [production]