2024-04-11
14:18 <elukey> drain and restart cassandra-b on aqs2007 - didn't pick up the new truststore during the past roll restart - T352647 [production]
14:10 <elukey> move cassandra instances on aqs1010 to PKI TLS certs - T352647 [production]
13:59 <elukey@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aqs1010.eqiad.wmnet with reason: Upgrade to PKI [production]
13:59 <elukey@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on aqs1010.eqiad.wmnet with reason: Upgrade to PKI [production]
2024-04-10
16:19 <elukey@deploy1002> helmfile [staging] DONE helmfile.d/services/sessionstore: sync [production]
16:19 <elukey@deploy1002> helmfile [staging] START helmfile.d/services/sessionstore: sync [production]
2024-04-09
14:18 <elukey@cumin1002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-eqiad: Deploy new Truststore - elukey@cumin1002 [production]
12:50 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . [production]
12:45 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . [production]
12:44 <elukey@cumin1002> START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-eqiad: Deploy new Truststore - elukey@cumin1002 [production]
2024-04-08
16:57 <elukey@cumin1002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs20[08-12]*: Deploy new Truststore - elukey@cumin1002 [production]
16:19 <elukey@cumin1002> START - Cookbook sre.cassandra.roll-restart for nodes matching aqs20[08-12]*: Deploy new Truststore - elukey@cumin1002 [production]
16:15 <elukey> manually drain + restart cassandra-a on aqs2007 - cookbook failed [production]
16:06 <elukey@cumin1002> END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching A:aqs-codfw: Deploy new Truststore - elukey@cumin1002 [production]
15:15 <elukey@cumin1002> START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-codfw: Deploy new Truststore - elukey@cumin1002 [production]
15:10 <elukey> drain and restart cassandra-a on aqs1011 to test the new truststore [production]
12:56 <elukey> nodetool-a drain + restart of cassandra instances on aqs1010 to pick up the new truststore [production]
12:55 <elukey@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aqs1010.eqiad.wmnet with reason: Replace Java Truststore [production]
12:55 <elukey@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on aqs1010.eqiad.wmnet with reason: Replace Java Truststore [production]
2024-03-27
11:44 <elukey@puppetmaster1001> conftool action : set/pooled=yes; selector: name=registry2004.codfw.wmnet [production]
11:41 <elukey> run `apt-get clean` on registry2004 to free some space on the root partition [production]
11:39 <elukey@cumin1002> END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM registry2004.codfw.wmnet [production]
11:16 <elukey@cumin1002> START - Cookbook sre.ganeti.reboot-vm for VM registry2003.codfw.wmnet [production]
11:15 <elukey> expand vram for registry200[3,4] from 4G to 6G - T360637 [production]
11:12 <elukey@puppetmaster1001> conftool action : set/pooled=no; selector: name=registry2003.codfw.wmnet [production]
11:11 <elukey@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on registry2004.codfw.wmnet with reason: Increase tmpfs for nginx [production]
11:11 <elukey@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on registry2004.codfw.wmnet with reason: Increase tmpfs for nginx [production]
11:11 <elukey@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on registry2003.codfw.wmnet with reason: Increase tmpfs for nginx [production]
11:10 <elukey@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on registry2003.codfw.wmnet with reason: Increase tmpfs for nginx [production]
2024-03-25
14:57 <elukey> increase tmpfs for /var/lib/nginx on registry100[3,4] and restart nginx - T360637 [production]
14:52 <elukey@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on registry1004.eqiad.wmnet with reason: Increase tmpfs for nginx [production]
14:52 <elukey@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on registry1004.eqiad.wmnet with reason: Increase tmpfs for nginx [production]
14:51 <elukey@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on registry1003.eqiad.wmnet with reason: Increase tmpfs for nginx [production]
14:51 <elukey@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on registry1003.eqiad.wmnet with reason: Increase tmpfs for nginx [production]
2024-03-22
13:17 <elukey> `elukey@cumin1002:~$ sudo cumin 'stat100[4,5,8,9]*' 'kill `pgrep -u kcv-wikimf`'` to unblock puppet on various stat nodes [production]
2024-03-21
16:46 <elukey@puppetmaster1001> conftool action : set/pooled=yes; selector: name=registry1004.eqiad.wmnet [production]
16:46 <elukey@cumin1002> END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM registry1004.eqiad.wmnet [production]
16:44 <elukey> edit /etc/network/interfaces on registry1004 (ens5 => ens13) - T360637 [production]
16:39 <elukey@cumin1002> START - Cookbook sre.ganeti.reboot-vm for VM registry1004.eqiad.wmnet [production]
16:38 <elukey@puppetmaster1001> conftool action : set/pooled=no; selector: name=registry1004.eqiad.wmnet [production]
16:38 <elukey@puppetmaster1001> conftool action : set/pooled=yes; selector: name=registry1003.eqiad.wmnet [production]
16:38 <elukey@cumin1002> END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM registry1003.eqiad.wmnet [production]
16:35 <elukey> edit /etc/network/interfaces on registry1003 (ens5 => ens13) - T360637 [production]
16:27 <elukey@cumin1002> START - Cookbook sre.ganeti.reboot-vm for VM registry1003.eqiad.wmnet [production]
16:25 <elukey> expand vram for registry100[3,4] from 4G to 6G - T360637 [production]
16:25 <elukey@puppetmaster1001> conftool action : set/pooled=no; selector: name=registry1003.eqiad.wmnet [production]
2024-03-15
17:25 <elukey@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' . [production]
2024-03-13
17:00 <elukey@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' . [production]
2024-03-05
14:36 <elukey@deploy2002> helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. [production]
14:36 <elukey@deploy2002> helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. [production]