2024-07-30 §
14:33 <elukey@cumin1002> START - Cookbook sre.hosts.provision for host pc1017.mgmt.eqiad.wmnet with reboot policy GRACEFUL [production]
13:30 <elukey> deprecate the sre-admins posix group fleetwide (replaced by ops-limited) - T360356 [production]
10:08 <elukey@cumin1002> END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED [production]
10:02 <elukey@cumin1002> START - Cookbook sre.hosts.provision for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED [production]
08:11 <elukey@cumin1002> END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED [production]
08:05 <elukey@cumin1002> START - Cookbook sre.hosts.provision for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED [production]
08:03 <elukey@cumin1002> END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED [production]
08:02 <elukey@cumin1002> START - Cookbook sre.hosts.provision for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED [production]
2024-07-26 §
13:42 <elukey> move dump_cloud_ip_ranges's write to /srv/private capabilities back to puppetmaster1001 - T368023 [production]
13:19 <elukey@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye [production]
13:02 <elukey@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage [production]
12:58 <elukey@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage [production]
12:42 <elukey@cumin1002> START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye [production]
10:03 <elukey@cumin1002> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS bullseye [production]
08:35 <elukey@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage [production]
08:32 <elukey@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage [production]
08:16 <elukey@cumin1002> START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye [production]
2024-07-25 §
15:15 <elukey> upgrade spicerack to 8.9.0 on cumin nodes [production]
14:53 <elukey> uploaded spicerack_8.9.0 to apt.wikimedia.org bullseye-wikimedia [production]
10:42 <elukey> upload docker-report 0.0.15 to bullseye-wikimedia and upgrade build2001 [production]
09:19 <elukey> move dump_cloud_ip_ranges from puppetmaster1001 to puppetserver1001 - T368023 [production]
2024-07-22 §
16:02 <elukey> remove /srv/kafka/data/eqiad.resource-purge-3 on kafka-main2001 to force a refetch of data from good replicas and circumvent data corruption - T370574 [production]
15:58 <elukey@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2001.codfw.wmnet with reason: attempt to remove a data dir on disk [production]
15:57 <elukey@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2001.codfw.wmnet with reason: attempt to remove a data dir on disk [production]
15:49 <elukey@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on kafka-test1006.eqiad.wmnet with reason: attempt to remove a data dir on disk [production]
15:49 <elukey@cumin1002> START - Cookbook sre.hosts.downtime for 0:30:00 on kafka-test1006.eqiad.wmnet with reason: attempt to remove a data dir on disk [production]
10:24 <elukey> kafka preferred-replica-election on kafka-main - T370574 [production]
08:32 <elukey> restart kafka on kafka-main2005 - T370574 [production]
08:31 <elukey@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on kafka-main2005.codfw.wmnet with reason: restart attempt [production]
08:30 <elukey@cumin1002> START - Cookbook sre.hosts.downtime for 0:30:00 on kafka-main2005.codfw.wmnet with reason: restart attempt [production]
08:07 <elukey> restart kafka on kafka-main2001 - T370574 [production]
08:06 <elukey> restart kafka on kafka-main2001 - sre.hosts.downtime [production]
08:06 <elukey@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on kafka-main2001.codfw.wmnet with reason: restart attempt [production]
08:05 <elukey@cumin1002> START - Cookbook sre.hosts.downtime for 0:30:00 on kafka-main2001.codfw.wmnet with reason: restart attempt [production]
2024-07-19 §
09:54 <elukey@cumin1002> END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host sretest2001.codfw.wmnet [production]
08:05 <elukey@cumin1002> START - Cookbook sre.hosts.dhcp for host sretest2001.codfw.wmnet [production]
2024-07-18 §
15:35 <elukey@cumin1002> END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest2001.codfw.wmnet [production]
15:13 <elukey@cumin1002> START - Cookbook sre.hosts.dhcp for host sretest2001.codfw.wmnet [production]
12:25 <elukey> update spicerack to 8.8.0 on cumin1002 [production]
09:44 <elukey> upgrade spicerack to 8.8.0 on cumin2002 - testing the new release [production]
09:26 <elukey> uploaded spicerack_8.8.0 to apt.wikimedia.org bullseye-wikimedia [production]
2024-07-17 §
09:02 <elukey@puppetserver1001> conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet [production]
08:57 <elukey@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4037.ulsfo.wmnet [production]
08:48 <elukey@cumin1002> START - Cookbook sre.hosts.reboot-single for host cp4037.ulsfo.wmnet [production]
08:47 <elukey@puppetserver1001> conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet [production]
07:49 <elukey> restart hadoop-mapreduce-historyserver.service on an-master1003 - failed for Java OOM [production]
07:38 <elukey@cumin1002> END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d1-codfw [production]
07:36 <elukey@cumin1002> START - Cookbook sre.network.tls for network device lsw1-d1-codfw [production]
2024-07-16 §
15:58 <elukey> uploaded spicerack_8.7.0 to apt.wikimedia.org bullseye-wikimedia [production]
09:12 <elukey> update docker-registry to 0.0.14-1 on build2001 [production]