2024-01-10
19:00 <topranks> disabling OSPF connection from mr1-codfw to codfw core routers T348164 [production]
18:40 <brouberol@deploy2002> helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply [production]
18:38 <filippo@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on prometheus2006.codfw.wmnet with reason: memory upgrade [production]
18:37 <filippo@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on prometheus2006.codfw.wmnet with reason: memory upgrade [production]
18:37 <brouberol@deploy2002> helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply [production]
18:37 <brouberol@deploy2002> helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply [production]
18:36 <brouberol@deploy2002> helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply [production]
18:35 <filippo@cumin1002> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for prometheus2005.codfw.wmnet [production]
18:35 <filippo@cumin1002> START - Cookbook sre.hosts.remove-downtime for prometheus2005.codfw.wmnet [production]
18:24 <sukhe> stop pybal on lvs2013: T352758 [production]
17:59 <filippo@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on prometheus2005.codfw.wmnet with reason: memory upgrade [production]
17:58 <filippo@cumin1002> START - Cookbook sre.hosts.downtime for 4:00:00 on prometheus2005.codfw.wmnet with reason: memory upgrade [production]
17:54 <kamila@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1377.eqiad.wmnet with OS bullseye [production]
17:47 <pfischer@deploy2002> helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [production]
17:46 <pfischer@deploy2002> helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply [production]
17:44 <pfischer@deploy2002> helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [production]
17:44 <pfischer@deploy2002> helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply [production]
17:40 <sukhe@cumin2002> END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host lvs2014.codfw.wmnet [production]
17:34 <kamila@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1377.eqiad.wmnet with reason: host reimage [production]
17:31 <kamila@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1377.eqiad.wmnet with reason: host reimage [production]
17:28 <sukhe@cumin2002> START - Cookbook sre.hosts.reboot-single for host lvs2014.codfw.wmnet [production]
17:27 <sukhe> enable puppet on lvs2014: T352758 [production]
17:16 <kamila@cumin1002> START - Cookbook sre.hosts.reimage for host mw1377.eqiad.wmnet with OS bullseye [production]
17:15 <kamila@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1378.eqiad.wmnet with OS bullseye [production]
17:14 <cmooney@cumin1002> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
17:14 <cmooney@cumin1002> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update reverse dns for sandbox1-a-codfw irb.2201 gw - cmooney@cumin1002" [production]
17:14 <cmooney@cumin1002> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update reverse dns for sandbox1-a-codfw irb.2201 gw - cmooney@cumin1002" [production]
17:09 <cmooney@cumin1002> START - Cookbook sre.dns.netbox [production]
16:55 <kamila@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1378.eqiad.wmnet with reason: host reimage [production]
16:52 <kamila@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1378.eqiad.wmnet with reason: host reimage [production]
16:37 <kamila@cumin1002> START - Cookbook sre.hosts.reimage for host mw1378.eqiad.wmnet with OS bullseye [production]
16:36 <godog> upgrade prometheus on prometheus2006 - T354399 [production]
16:32 <brouberol@deploy2002> helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply [production]
16:32 <brouberol@deploy2002> helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply [production]
16:31 <brouberol@deploy2002> helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply [production]
16:31 <brouberol@deploy2002> helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply [production]
16:30 <brouberol@deploy2002> helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply [production]
16:29 <brouberol@deploy2002> helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply [production]
16:25 <kamila@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw[1379-1383].eqiad.wmnet with reason: testing reboot [production]
16:25 <kamila@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on mw[1379-1383].eqiad.wmnet with reason: testing reboot [production]
16:22 <kamila@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1379.eqiad.wmnet with OS bullseye [production]
16:20 <brouberol@deploy2002> helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply [production]
16:02 <kamila@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1379.eqiad.wmnet with reason: host reimage [production]
16:00 <kamila@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1383.eqiad.wmnet with OS bullseye [production]
15:59 <kamila@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1381.eqiad.wmnet with OS bullseye [production]
15:57 <kamila@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1382.eqiad.wmnet with OS bullseye [production]
15:57 <kamila@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1379.eqiad.wmnet with reason: host reimage [production]
15:41 <jmm@cumin2002> END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: logging::opensearch::data [production]
15:41 <kamila@cumin1002> START - Cookbook sre.hosts.reimage for host mw1379.eqiad.wmnet with OS bullseye [production]
15:40 <kamila@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1383.eqiad.wmnet with reason: host reimage [production]