5451-5500 of 10000 results (85ms)
2023-07-10 §
06:55 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet [production]
06:45 <jmm@cumin2002> START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1026.eqiad.wmnet [production]
06:43 <godog> add 100G to prometheus/k8s in codfw [production]
01:06 <rzl@deploy1002> helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply [production]
01:06 <rzl@deploy1002> helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply [production]
2023-07-09 §
14:51 <apergos> swapped dumpsdata1003 in as the new nfs share for misc dumps; dumpsdata1002 is now a spare, to be decommissioned. 1003 is running bullseye. [production]
04:04 <apergos> rsync misc dumps output files from dumpsdata1002 to 1003, in ariel screen session on 1003, bwlimit to 1G [production]
2023-07-08 §
03:21 <rzl@deploy1002> helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply [production]
2023-07-07 §
22:55 <rzl@deploy1002> helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply [production]
22:55 <rzl@deploy1002> helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply [production]
22:41 <rzl@deploy1002> helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply [production]
22:21 <rzl@deploy1002> helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply [production]
22:04 <jhancock@cumin2002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1156.eqiad.wmnet with OS bullseye [production]
21:59 <rzl@deploy1002> helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply [production]
21:24 <bking@deploy1002> Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 57s) [production]
21:23 <bking@deploy1002> Started deploy [wdqs/wdqs@dff41b7]: 0.3.124 [production]
21:23 <bking@cumin1001> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [production]
20:53 <dwisehaupt@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
20:53 <dwisehaupt@cumin1001> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: * - dwisehaupt@cumin1001" [production]
20:52 <dwisehaupt@cumin1001> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: * - dwisehaupt@cumin1001" [production]
20:50 <dwisehaupt@cumin1001> START - Cookbook sre.dns.netbox [production]
20:44 <jhancock@cumin2002> START - Cookbook sre.hosts.reimage for host an-worker1156.eqiad.wmnet with OS bullseye [production]
19:33 <bking@cumin1001> START - Cookbook sre.wdqs.data-transfer [production]
19:33 <bking@cumin1001> END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) [production]
19:32 <bking@cumin1001> START - Cookbook sre.wdqs.data-transfer [production]
19:12 <bking@cumin1001> END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) [production]
18:11 <btullis@deploy1002> helmfile [staging] DONE helmfile.d/services/datahub: sync on main [production]
18:08 <btullis@deploy1002> helmfile [staging] START helmfile.d/services/datahub: apply on main [production]
17:58 <btullis@deploy1002> helmfile [staging] DONE helmfile.d/services/datahub: sync on main [production]
17:57 <pt1979@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb1001.eqiad.wmnet with OS bullseye [production]
17:56 <btullis@deploy1002> helmfile [staging] START helmfile.d/services/datahub: apply on main [production]
17:40 <bking@cumin1001> START - Cookbook sre.wdqs.data-transfer [production]
16:44 <pt1979@cumin1001> START - Cookbook sre.hosts.reimage for host cloudlb1001.eqiad.wmnet with OS bullseye [production]
16:38 <bking@cumin1001> conftool action : set/pooled=yes; selector: name=wdqs2020.codfw.wmnet [production]
16:23 <bking@cumin1001> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [production]
16:20 <hashar> Restarting CI Jenkins due to a confusion in the next build number leading to intermittent 404 when browsing console links | T341348 [production]
16:00 <bking@cumin1001> conftool action : set/pooled=no; selector: name=wdqs2020.codfw.wmnet [production]
15:53 <aborrero@cumin1001> END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb1001.mgmt.eqiad.wmnet with reboot policy FORCED [production]
15:51 <bking@cumin1001> conftool action : set/pooled=yes; selector: name=wdqs2020.codfw.wmnet [production]
15:50 <bking@cumin1001> conftool action : set/weight=10; selector: name=wdqs2020.codfw.wmnet [production]
15:49 <btullis@deploy1002> helmfile [staging] DONE helmfile.d/services/datahub: sync on main [production]
15:47 <aborrero@cumin1001> START - Cookbook sre.hosts.provision for host cloudlb1001.mgmt.eqiad.wmnet with reboot policy FORCED [production]
15:46 <bking@cumin1001> conftool action : set/pooled=yes; selector: service=(wdqs|wdqs-ssl|wdqs-heavy-queries),name=wdqs2020.codfw.wmnet [production]
15:45 <btullis@deploy1002> helmfile [staging] START helmfile.d/services/datahub: apply on main [production]
15:43 <aborrero@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb1001.eqiad.wmnet with OS bullseye [production]
15:33 <btullis@deploy1002> helmfile [staging] DONE helmfile.d/services/datahub: sync on main [production]
15:30 <btullis@deploy1002> helmfile [staging] START helmfile.d/services/datahub: apply on main [production]
15:05 <bking@deploy1002> Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 50s) [production]
15:04 <bking@deploy1002> Started deploy [wdqs/wdqs@dff41b7]: 0.3.124 [production]
14:58 <bking@deploy1002> Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 49s) [production]