2020-01-08
18:07 <elukey@cumin1001> END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) [production]
18:04 <elukey@cumin1001> START - Cookbook sre.aqs.roll-restart [production]
18:03 <elukey@cumin1001> END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) [production]
18:03 <elukey@cumin1001> START - Cookbook sre.aqs.roll-restart [production]
2020-01-07
16:10 <elukey> cr1/cr2-eqiad: set port 443 (was 8190) for term schema in analytics-in4 [production]
2019-12-31
10:49 <marostegui> Upgrade db1108 with elukey [production]
10:23 <elukey> execute 'clear bfd session address fe80::5e5e:ab00:d3d:85ce' on cr3-knams - T240659 [production]
2019-12-29
10:07 <elukey> powercycle cp3061 - mgmt serial console not showing a working tty, no ssh [production]
10:06 <elukey@puppetmaster1001> conftool action : set/pooled=no; selector: name=cp3061.esams.wmnet [production]
2019-12-18
09:24 <elukey> execute 'megacli -LDSetProp WT -LAll -aAll' on analytics1057 - T239045 [production]
2019-12-16
17:14 <elukey@cumin1001> END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) [production]
16:45 <elukey@cumin1001> START - Cookbook sre.druid.roll-restart-workers [production]
16:42 <elukey@cumin1001> END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) [production]
16:12 <elukey@cumin1001> START - Cookbook sre.druid.roll-restart-workers [production]
15:41 <elukey@cumin1001> END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) [production]
15:15 <elukey@cumin1001> START - Cookbook sre.druid.roll-restart-workers [production]
13:50 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
13:50 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
12:52 <elukey> shutdown of the Analytics Hadoop cluster to enable Kerberos [production]
12:16 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
12:15 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
10:57 <elukey> disable puppet on labstore100[6,7] and stop analytics-related systemd timers - prep step for Kerberos [production]
2019-12-13
08:10 <elukey> rm /var/log user.log.1 messages.1 daemon.log.1 kafkatee.log.1 syslog.1 on netflow2001 to free space (logs spammed with the same error message over and over) [production]
08:07 <elukey> restart kafkatee-webrequest.service on netflow1001 (spamming logs about not being able to bind to address:port) [production]
08:07 <elukey> restart fastmon on netflow2001 as an attempt to stop spamming logs (failed) [production]
08:06 <elukey> restart kafkatee-webrequest.service on netflow2001 (spamming logs about not being able to bind to address:port) [production]
07:55 <elukey> execute clear bfd session address fe80::ee38:7300:17e8:a04e on cr3-knams to restore BFD session with eqdfw (OSPF3 status ok on cr3-knams) [production]
2019-12-12
14:18 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) [production]
12:58 <elukey@cumin1001> START - Cookbook sre.hadoop.roll-restart-workers [production]
2019-12-11
08:04 <elukey> powercycle cp3055 - down for hours, no ssh, no usable mgmt serial console [production]
08:02 <elukey@puppetmaster1001> conftool action : set/pooled=no; selector: name=cp3055.esams.wmnet [production]
2019-12-09
15:09 <elukey> upload prometheus-memcached-exporter 0.6.0+git20191209.bac8a8c-1 to buster-wikimedia [production]
08:33 <elukey> powercycle mw1280, mgmt console stuck, dimm errors in getsel [production]
2019-12-07
13:29 <elukey> restart php-fpm on mw1293 (jobrunner) as test [production]
13:26 <elukey> restart php-fpm on mw1299 (jobrunner) as test [production]
2019-12-05
16:33 <elukey> execute clear bfd session address fe80::5e5e:ab00:d3d:85ce on cr3-knams [production]
16:32 <elukey> execute clear bfd session address fe80::7a4f:9b00:d4e:8004 on cr1-eqiad [production]
16:20 <elukey> execute clear bfd session address 208.80.154.208 on cr2-eqord [production]
16:20 <elukey> elukey@cr2-eqord> clear bfd session 208.80.154.208 [production]
08:03 <elukey> remove logstash_cleanup_indices_apifeatureusage-search.svc.codfw.wmnet and logstash_cleanup_indices_apifeatureusage-search.svc.eqiad.wmnet from logstash1025,logstash1024,logstash1023,logstash2024,logstash2025 to reduce cronspam - T234854 [production]
2019-12-03
15:20 <elukey> executing sudo cumin -b6 -s 20 -p 95 'A:mw-api-eqiad' 'restart-php7.2-fpm' on cumin1001 [production]
08:48 <elukey@cumin1001> END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) [production]
08:45 <elukey@cumin1001> START - Cookbook sre.aqs.roll-restart [production]
08:45 <elukey@cumin1001> END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) [production]
08:45 <elukey@cumin1001> START - Cookbook sre.aqs.roll-restart [production]
2019-12-02
15:38 <elukey@cumin1001> END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) [production]
15:35 <elukey@cumin1001> START - Cookbook sre.aqs.roll-restart [production]
2019-11-29
10:47 <elukey@deploy1001> Finished deploy [analytics/refinery@97015e4] (thin): Deploy thin Analytics Refinery (no jars/git-fat-obj) to notebook and labstore hosts (duration: 00m 08s) [production]
10:47 <elukey@deploy1001> Started deploy [analytics/refinery@97015e4] (thin): Deploy thin Analytics Refinery (no jars/git-fat-obj) to notebook and labstore hosts [production]
2019-11-28
06:56 <elukey> remove log files on an-tool1007 to free root partition space [production]