2024-08-30
08:50 <elukey@deploy1003> helmfile [staging] DONE helmfile.d/services/thumbor: sync [production]
08:50 <elukey@deploy1003> helmfile [staging] START helmfile.d/services/thumbor: sync [production]
2024-08-28
15:23 <elukey@deploy1003> helmfile [eqiad] DONE helmfile.d/services/toolhub: sync [production]
15:23 <elukey@deploy1003> helmfile [eqiad] START helmfile.d/services/toolhub: sync [production]
15:22 <elukey@deploy1003> helmfile [codfw] DONE helmfile.d/services/toolhub: sync [production]
15:22 <elukey@deploy1003> helmfile [codfw] START helmfile.d/services/toolhub: sync [production]
13:45 <elukey@deploy1003> helmfile [eqiad] DONE helmfile.d/services/thumbor: sync [production]
13:40 <elukey@deploy1003> helmfile [eqiad] START helmfile.d/services/thumbor: sync [production]
13:36 <elukey@deploy1003> helmfile [codfw] DONE helmfile.d/services/thumbor: sync [production]
13:31 <elukey@deploy1003> helmfile [codfw] START helmfile.d/services/thumbor: sync [production]
13:10 <elukey@deploy1003> helmfile [staging] DONE helmfile.d/services/thumbor: sync [production]
13:10 <elukey@deploy1003> helmfile [staging] START helmfile.d/services/thumbor: sync [production]
2024-08-27
15:11 <elukey> restart httpd and librenms-syslog.service on netmon1003 for libaom upgrades [production]
15:11 <elukey> restart httpd on crm2001 for libaom upgrades [production]
15:02 <elukey@puppetserver1001> conftool action : set/pooled=yes; selector: name=wikikube-ctrl2003.codfw.wmnet [production]
15:01 <elukey@cumin1002> END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy GRACEFUL [production]
14:44 <elukey@cumin1002> START - Cookbook sre.hosts.provision for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy GRACEFUL [production]
14:41 <elukey@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on wikikube-ctrl2003.codfw.wmnet with reason: running provision again [production]
14:41 <elukey@cumin1002> START - Cookbook sre.hosts.downtime for 0:30:00 on wikikube-ctrl2003.codfw.wmnet with reason: running provision again [production]
14:40 <elukey@puppetserver1001> conftool action : set/pooled=no; selector: name=wikikube-ctrl2003.codfw.wmnet [production]
2024-08-14
14:32 <elukey@deploy1003> helmfile [eqiad] DONE helmfile.d/services/thumbor: sync [production]
14:27 <elukey@deploy1003> helmfile [eqiad] START helmfile.d/services/thumbor: sync [production]
14:22 <elukey@deploy1003> helmfile [codfw] DONE helmfile.d/services/thumbor: sync [production]
14:17 <elukey@deploy1003> helmfile [codfw] START helmfile.d/services/thumbor: sync [production]
13:55 <elukey@deploy1003> helmfile [staging] DONE helmfile.d/services/thumbor: sync [production]
13:55 <elukey@deploy1003> helmfile [staging] START helmfile.d/services/thumbor: sync [production]
2024-08-13
15:38 <elukey@cumin1002> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-main2006.codfw.wmnet with OS bookworm [production]
14:56 <elukey@cumin1002> START - Cookbook sre.hosts.reimage for host kafka-main2006.codfw.wmnet with OS bookworm [production]
13:05 <elukey> `apt-get install python3-conftool python3-conftool-requestctl` on all puppetserver nodes - upgrade to 3.2.2 [production]
09:23 <elukey> manual run of dump_cloud_ip_ranges.service on puppetserver1001 (failed earlier on) [production]
08:52 <elukey> upgrade conftool python packages on puppetserver1001 to 3.2.2 [production]
2024-08-12
14:42 <elukey> powercycle ms-be1078 - causing frontend errors in swift-eqiad, network link is down (interface down/up didn't work, nothing in dmesg/syslog) [production]
12:37 <elukey> restart exim4 on list2001 to pick up the new TLS material [production]
12:35 <elukey> restart exim4 on list1004 to pick up the new TLS material [production]
12:11 <elukey@cumin1002> START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Openjdk upgrade - elukey@cumin1002 [production]
2024-08-08
16:29 <elukey> debmonitor-client 0.4.0 rolled out to all bullseye nodes [production]
16:07 <elukey> on cumin1002 "sudo cumin -b 20 -p 95 'P{F:lsbdistcodename="bullseye"} and A:codfw' 'run-puppet-agent -q --failed-only'" [production]
09:38 <elukey@cumin1002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-codfw: Openjdk upgrade - elukey@cumin1002 [production]
09:24 <elukey> powercycle ml-serve2004 - host frozen, no ssh access, get sel shows "Multi-bit memory errors detected on a memory device at location(s) DIMM_A2." [production]
08:19 <elukey> restart dump_ip_reputation.service on puppetserver1001 [production]
08:13 <elukey> restart tomcat on idp[1,2]003 to pick up the new openjdk [production]
08:09 <elukey@cumin1002> START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-codfw: Openjdk upgrade - elukey@cumin1002 [production]
2024-08-07
16:01 <elukey@cumin1002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-eqiad: Openjdk upgrade - elukey@cumin1002 [production]
14:33 <elukey@cumin1002> START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-eqiad: Openjdk upgrade - elukey@cumin1002 [production]
14:01 <elukey> import Jenkins 2.462.1 on bullseye-wikimedia:thirdparty/ci [production]
13:24 <elukey@cumin1002> END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-flink-codfw cluster: Roll restart of jvm daemons. [production]
13:17 <elukey@cumin1002> START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-flink-codfw cluster: Roll restart of jvm daemons. [production]
13:15 <elukey@cumin1002> END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-flink-eqiad cluster: Roll restart of jvm daemons. [production]
08:31 <elukey> openjdk-11 upgrades for bullseye rolled out to prod [production]
2024-08-06
15:25 <elukey@cumin1002> END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2035.mgmt.codfw.wmnet with reboot policy GRACEFUL [production]