2021-06-26 §
13:43 <elukey> depool mw1384 for investigation [production]
13:43 <elukey@puppetmaster1001> conftool action : set/pooled=no; selector: name=mw1384.eqiad.wmnet [production]
13:33 <elukey> restart phpfpm on mw1353 mw1365 mw1371 [production]
13:30 <elukey> restart php-fpm on mw1351 mw1373 mw1352 mw1349 [production]
13:23 <elukey> restart-phpfpm on mw1350 (0 idle php workers) [production]
13:20 <elukey> restart-phpfpm on mw1333 (0 idle php workers) [production]
10:08 <elukey> restart php-fpm on mw1372 - T285593 [production]
10:07 <elukey> restart php-fpm on mw1372 - T285593 [production]
09:45 <elukey> restart php-fpm on mw135[4-5] [production]
09:44 <elukey> restart php-fpm on mw1354 [production]
09:38 <elukey> reboot mw1414 (not reachable via ssh, nor via mgmt console) [production]
09:33 <elukey> restart php-fpm on mw1367 (php fatal memory errors, php7adm /apcu-frag returns errors) [production]
2021-06-25 §
08:01 <elukey> reboot an-worker1101 to unblock stuck GPU [production]
2021-06-21 §
13:12 <elukey> upload istioctl 1.9.5 to {buster,stretch}-wikimedia [production]
2021-06-17 §
08:28 <elukey> upload istioctl 1.6.14-1 to buster-wikimedia [production]
2021-06-11 §
05:56 <elukey> rm -rf empty dir /etc/apache2/sites-enabled/.links2 on webperf1001 to avoid puppet changes at every run [production]
05:47 <elukey> run systemctl reset-failed ifup@en5.service on doh1001 - T273026 [production]
2021-06-08 §
17:10 <elukey> fix dbstore1007's ip address in analytics-in4 on cr{1,2}-eqiad [production]
08:19 <elukey@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
08:13 <elukey@cumin1001> START - Cookbook sre.dns.netbox [production]
06:27 <elukey> clean some airflow logs on an-airflow1001 as one off to free space (had a chat with the Search team first) [production]
2021-05-29 §
14:44 <elukey> execute apt-get clean on an-airflow1001 to free space [production]
14:40 <elukey@puppetmaster1001> conftool action : set/pooled=inactive; selector: name=cp1087.eqiad.wmnet [production]
2021-05-28 §
08:02 <elukey> restart blazegraph on wdqs1011 [production]
2021-05-26 §
09:13 <elukey> deploy https://gerrit.wikimedia.org/r/c/operations/homer/public/+/695192 on {cr1|cr2}-eqiad - T225005 [production]
2021-05-20 §
06:08 <elukey> powercycle ms-be2035 - no ssh available, no metrics for hours, I/O errors registered in the main tty on serial console [production]
2021-05-17 §
15:26 <elukey@deploy1002> Finished deploy [ores/deploy@3e1ff5f]: Update editquality submodule after Turkish Wikipedia's labelling campaign - T257359 (duration: 19m 48s) [production]
15:06 <elukey@deploy1002> Started deploy [ores/deploy@3e1ff5f]: Update editquality submodule after Turkish Wikipedia's labelling campaign - T257359 [production]
2021-05-10 §
15:20 <elukey> restart rsyslog on rpki1001 [production]
06:37 <elukey> apt-get clean on rpki1001 to free some space [production]
2021-05-06 §
09:03 <elukey> sudo apt-get remove linux-image-4.19.0-11-amd64 linux-image-4.19.0-9-amd64 linux-image-4.19.0-13-amd64 on ping[123]001 hosts to free some space (tiny root partition, these are old kernels) [production]
06:20 <elukey> apt-get clean on ping[1,2,3]001 to free some space [production]
2021-05-01 §
07:22 <elukey> powercycle elastic2033 - no ssh, no tty available via mgmt [production]
2021-04-28 §
07:19 <elukey@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
07:14 <elukey@cumin1001> START - Cookbook sre.dns.netbox [production]
07:12 <elukey> add AAAA record for kafka-main200[3,4,5].codfw.wmnet [production]
07:10 <elukey@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
07:05 <elukey@cumin1001> START - Cookbook sre.dns.netbox [production]
07:04 <elukey> add AAAA record for kafka-main2002.codfw.wmnet [production]
2021-04-27 §
11:36 <elukey@cumin1001> END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) [production]
11:30 <elukey@cumin1001> START - Cookbook sre.kafka.roll-restart-mirror-maker [production]
06:55 <elukey> upgrade mariadb to 10.4.18-1 + reboot on db1108 - T279281 [production]
06:11 <elukey> powercycle elastic2043 - no ssh, no tty remote console available [production]
2021-04-26 §
15:21 <elukey> restart zookeeper on conf2004 to pick up the -javaagent setting for the prometheus exporter [production]
06:24 <elukey> reboot an-coord1001 to pick up kernel security settings (after reimage) [production]
2021-04-23 §
17:02 <elukey@puppetmaster1001> conftool action : set/pooled=yes; selector: name=cp1087.eqiad.wmnet [production]
13:33 <elukey> roll restart of all thanos-swift proxies to pick up new ML account - T280773 [production]
2021-04-21 §
09:08 <elukey> upgrade hue on an-tool1009 to 4.9 [production]
07:52 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-coord1001.eqiad.wmnet with reason: REIMAGE [production]
07:50 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-coord1001.eqiad.wmnet with reason: REIMAGE [production]