2026-05-08
23:30 <vriley@cumin1003> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
23:30 <vriley@cumin1003> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" [production]
23:30 <vriley@cumin1003> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" [production]
23:26 <vriley@cumin1003> START - Cookbook sre.dns.netbox [production]
23:22 <vriley@cumin1003> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm [production]
23:22 <vriley@cumin1003> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" [production]
23:11 <vriley@cumin1003> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" [production]
22:54 <vriley@cumin1003> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage [production]
22:46 <vriley@cumin1003> START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage [production]
22:26 <vriley@cumin1003> START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm [production]
22:16 <vriley@cumin1003> END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED [production]
21:56 <vriley@cumin1003> START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED [production]
21:55 <vriley@cumin1003> END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 [production]
21:53 <vriley@cumin1003> START - Cookbook sre.network.configure-switch-interfaces for host db1266 [production]
21:52 <vriley@cumin1003> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
21:52 <vriley@cumin1003> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" [production]
21:51 <vriley@cumin1003> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" [production]
21:45 <vriley@cumin1003> START - Cookbook sre.dns.netbox [production]
21:42 <vriley@cumin1003> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm [production]
21:42 <vriley@cumin1003> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" [production]
21:41 <vriley@cumin1003> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" [production]
21:24 <vriley@cumin1003> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage [production]
21:19 <vriley@cumin1003> START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage [production]
20:56 <wmbot~jeanfred@tools-bastion-15> Deploy 71d7701 (Add mdformat pre-commit hook for Markdown formatting) [tools.integraality]
20:56 <wmbot~jeanfred@tools-bastion-15> Deploy a10fba1 (Add CONTRIBUTING.md) [tools.integraality]
20:56 <wmbot~jeanfred@tools-bastion-15> Deploy 1c1cec1 (Rewrite README) [tools.integraality]
20:54 <vriley@cumin1003> START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm [production]
20:44 <vriley@cumin1003> END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED [production]
20:32 <vriley@cumin1003> START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED [production]
20:31 <vriley@cumin1003> END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 [production]
20:30 <vriley@cumin1003> START - Cookbook sre.network.configure-switch-interfaces for host db1265 [production]
20:29 <vriley@cumin1003> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
20:29 <vriley@cumin1003> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" [production]
20:29 <vriley@cumin1003> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" [production]
20:24 <vriley@cumin1003> START - Cookbook sre.dns.netbox [production]
20:01 <ryankemper> [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. Unless we find an obvious smoking gun, expect noise to continue for the time being :/ [production]
19:42 <jelto@deploy1003> helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply [production]
19:41 <jelto@deploy1003> helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply [production]
19:41 <jelto@deploy1003> helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply [production]
19:40 <jelto@deploy1003> helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply [production]
18:07 <ryankemper> [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable, likely we'll need to identify more throttle-candidates to restore full health [production]
17:53 <ryankemper> [WDQS] Deployed 2 new requestctl rules; we'll see if it helps [production]
16:51 <topranks> enable bfd on system0.0 sub-interface ssw1-d1-eqiad [production]
15:45 <jynus@cumin1003> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart [production]
15:37 <jynus@cumin1003> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart [production]
14:53 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet [production]
14:47 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet [production]
14:07 <fceratto@deploy1003> helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . [production]
10:51 <btullis> re-pooled wdqs-main in eqiad for T425758 [production]
10:50 <btullis@cumin1003> conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad [production]