2023-08-23
19:12 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2053.codfw.wmnet with reason: host reimage [production]
19:09 <pt1979@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2053.codfw.wmnet with reason: host reimage [production]
19:06 <eevans@cumin1001> START - Cookbook sre.hosts.reboot-single for host cassandra-dev2002.codfw.wmnet [production]
18:57 <eevans@cumin1001> END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cassandra-dev2001.codfw.wmnet [production]
18:56 <htriedman@deploy1002> Finished deploy [airflow-dags/platform_eng@33de526]: (no justification provided) (duration: 00m 20s) [production]
18:55 <htriedman@deploy1002> Started deploy [airflow-dags/platform_eng@33de526]: (no justification provided) [production]
18:45 <eevans@cumin1001> START - Cookbook sre.hosts.reboot-single for host cassandra-dev2001.codfw.wmnet [production]
18:45 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host kubernetes2053.codfw.wmnet with OS bullseye [production]
18:38 <pt1979@cumin2002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2053.codfw.wmnet with OS bullseye [production]
18:19 <dduvall@deploy1002> Synchronized php: group1 wikis to 1.41.0-wmf.23 refs T343725 (duration: 06m 01s) [production]
18:19 <herron> re-enabled icinga meta-monitoring on wikitech-static [production]
18:17 <denisse> alert hosts maintenance finished [production]
18:13 <denisse> making alert1001 the primary alert host [production]
18:09 <denisse> updating DNS to point to alert1001 [production]
18:03 <denisse> failing over from alert2001 to alert1001 [production]
17:51 <denisse@cumin1001> END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host alert1001.wikimedia.org [production]
17:51 <denisse@cumin1001> START - Cookbook sre.hosts.reboot-single for host alert1001.wikimedia.org [production]
17:47 <denisse> make alert2001 the active host [production]
17:31 <denisse> failing over alert1001 to alert2001 [production]
17:24 <brett@cumin2002> START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_codfw and A:cp [production]
17:24 <brett@cumin2002> START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_codfw and A:cp [production]
17:23 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host kubernetes2053.codfw.wmnet with OS bullseye [production]
17:23 <brett@cumin2002> END (ERROR) - Cookbook sre.cdn.roll-reboot (exit_code=97) rolling reboot on A:cp-upload_eqiad and A:cp [production]
17:23 <brett@cumin2002> END (ERROR) - Cookbook sre.cdn.roll-reboot (exit_code=97) rolling reboot on A:cp-text_eqiad and A:cp [production]
17:22 <brett@cumin2002> START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_eqiad and A:cp [production]
17:22 <brett@cumin2002> START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_eqiad and A:cp [production]
17:20 <pt1979@cumin2002> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
17:20 <pt1979@cumin2002> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for kubernetes2040-kubernetes2052 - pt1979@cumin2002" [production]
17:19 <pt1979@cumin2002> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for kubernetes2040-kubernetes2052 - pt1979@cumin2002" [production]
17:19 <hnowlan@deploy1002> helmfile [staging] DONE helmfile.d/services/geo-analytics: apply [production]
17:19 <hnowlan@deploy1002> helmfile [staging] START helmfile.d/services/geo-analytics: apply [production]
17:17 <pt1979@cumin2002> START - Cookbook sre.dns.netbox [production]
17:10 <pt1979@cumin2002> END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2053'] [production]
17:07 <denisse@cumin1001> END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host alert2001.wikimedia.org [production]
17:07 <denisse@cumin1001> START - Cookbook sre.hosts.reboot-single for host alert2001.wikimedia.org [production]
17:06 <denisse> reboot alert2001 for a kernel upgrade [production]
17:05 <herron> set icinga downtime on wikitech-static [production]
17:03 <bking@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs1009.eqiad.wmnet with reason: jnl export [production]
17:03 <bking@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs1009.eqiad.wmnet with reason: jnl export [production]
17:00 <pt1979@cumin2002> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2053'] [production]
16:56 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2053.mgmt.codfw.wmnet with reboot policy FORCED [production]
16:45 <pt1979@cumin2002> START - Cookbook sre.hosts.provision for host kubernetes2053.mgmt.codfw.wmnet with reboot policy FORCED [production]
16:45 <pt1979@cumin2002> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
16:43 <pt1979@cumin2002> START - Cookbook sre.dns.netbox [production]
16:43 <pt1979@cumin2002> END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) [production]
16:43 <pt1979@cumin2002> END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for kubernetes2053 - pt1979@cumin2002" [production]
16:37 <hnowlan@deploy1002> helmfile [staging] DONE helmfile.d/services/geo-analytics: apply [production]
16:35 <bblack> cp3067-81 - rolling restart of varnish frontends (one at a time, 30 minute sleep between, will run for ~7.5h), for experimental cache memory settings from https://gerrit.wikimedia.org/r/c/operations/puppet/+/951949 [production]
16:27 <hnowlan@deploy1002> helmfile [staging] START helmfile.d/services/geo-analytics: apply [production]
16:25 <hnowlan@deploy1002> helmfile [eqiad] DONE helmfile.d/admin 'apply'. [production]