2022-03-10 §
09:38 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2009.codfw.wmnet with reason: host reimage [production]
09:22 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host kubernetes2009.codfw.wmnet with OS bullseye [production]
2022-03-09 §
17:17 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2008.codfw.wmnet with OS bullseye [production]
17:04 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2008.codfw.wmnet with reason: host reimage [production]
17:01 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2008.codfw.wmnet with reason: host reimage [production]
16:45 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host kubernetes2008.codfw.wmnet with OS bullseye [production]
15:31 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2007.codfw.wmnet with OS bullseye [production]
15:19 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2007.codfw.wmnet with reason: host reimage [production]
15:16 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2007.codfw.wmnet with reason: host reimage [production]
15:01 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host kubernetes2007.codfw.wmnet with OS bullseye [production]
07:31 <elukey> manually sync pcc facts following https://wikitech.wikimedia.org/wiki/Help:Puppet-compiler#Manually_update_production [production]
2022-03-08 §
13:48 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet [production]
13:40 <elukey@cumin1001> START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet [production]
13:39 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet [production]
13:31 <elukey@cumin1001> START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet [production]
13:26 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet [production]
13:17 <elukey@cumin1001> START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet [production]
13:16 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet [production]
13:09 <elukey@cumin1001> START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet [production]
11:31 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2008.codfw.wmnet [production]
11:25 <elukey@cumin1001> START - Cookbook sre.hosts.reboot-single for host ml-serve2008.codfw.wmnet [production]
11:25 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2007.codfw.wmnet [production]
11:18 <elukey@cumin1001> START - Cookbook sre.hosts.reboot-single for host ml-serve2007.codfw.wmnet [production]
11:12 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2006.codfw.wmnet [production]
11:05 <elukey@cumin1001> START - Cookbook sre.hosts.reboot-single for host ml-serve2006.codfw.wmnet [production]
11:02 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2005.codfw.wmnet [production]
10:54 <elukey@cumin1001> START - Cookbook sre.hosts.reboot-single for host ml-serve2005.codfw.wmnet [production]
10:47 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet [production]
10:39 <elukey@cumin1001> START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet [production]
10:35 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet [production]
10:28 <elukey@cumin1001> START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet [production]
10:26 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet [production]
10:19 <elukey@cumin1001> START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet [production]
10:19 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet [production]
10:10 <elukey@cumin1001> START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet [production]
2022-03-07 §
08:46 <elukey> `kafka configs --alter --entity-type topics --entity-name udp_localhost-info --add-config retention.bytes=300000000000` on kafka-logging to reduce the size of the biggest topic partitions [production]
07:15 <elukey> `elukey@ml-staging-ctrl2002:~$ sudo systemctl reset-failed ifup@ens13.service` [production]
07:14 <elukey> kill tmux sessions of user 'zpapierski' on wdqs[1004,2002,2003] (puppet broken, offboarded user) [production]
2022-03-03 §
10:18 <elukey> kubectl cordon kubernetes200[1-4] to avoid scheduling pods on nodes that will be decommed during the next weeks - T302208 [production]
2022-03-01 §
16:11 <elukey@deploy1002> Finished deploy [ores/deploy@29de1cc]: ORES Winter deployment - T300195 (duration: 36m 13s) [production]
15:35 <elukey@deploy1002> Started deploy [ores/deploy@29de1cc]: ORES Winter deployment - T300195 [production]
14:52 <elukey> elukey@deploy1002:~$ sudo kill `pgrep -u zpapierski` (offboarded user, puppet broken on the node) [production]
09:48 <elukey> elukey@stat1004:~$ sudo kill `pgrep -u zpapierski` (offboarded user, puppet broken on the host) [production]
09:25 <elukey> restart varnishkafka-webrequest on cp6009 as attempt to clear a weird status of librdkafka (delivery errors to kafka) [production]
09:06 <elukey> restart purged on cp6005 [production]
08:57 <elukey> restart purged on cp6004 [production]
08:25 <elukey> restart purged on cp6003 [production]
07:59 <elukey> restart purged on cp6002 [production]
06:56 <elukey> restart purged on cp6001 to clear stale kafka TLS consumer state (or attempting to) [production]
2022-02-28 §
17:21 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2022.codfw.wmnet with OS bullseye [production]