2019-03-06 §
17:10 <elukey@deploy1001> Finished deploy [analytics/superset/deploy@911ad13]: First deploy to new host (duration: 00m 27s) [production]
17:10 <elukey@deploy1001> Started deploy [analytics/superset/deploy@911ad13]: First deploy to new host [production]
07:09 <elukey> raised analytics user's max_user_connection from 10 to 100 on labsdb1012 - T215231 [production]
2019-03-04 §
14:20 <elukey> update puppet compiler's facts [production]
2019-03-03 §
10:44 <elukey> restart pdfrender on scb1003 [production]
2019-02-28 §
16:39 <elukey> clean up old/stale zookeeper znodes from conf100[4-6] - T216979 [production]
13:56 <elukey> re-start cleanup of 20k+ zookeeper nodes on conf100[4-6] (old Hadoop Yarn state) - T216952 [production]
11:28 <elukey> pause cleanup of 20k+ zookeeper nodes on conf100[4-6] (old Hadoop Yarn state) - T216952 [production]
09:30 <elukey> start cleanup of 20k+ zookeeper nodes on conf100[4-6] (old Hadoop Yarn state) - T216952 [production]
08:31 <elukey> roll restart of Yarn Resource Managers on an-master100[1,2] to pick up new settings [production]
2019-02-27 §
17:56 <elukey> roll restart hadoop hdfs namenodes on an-master100[1,2] to pick up the new rack config of analytics1071 [production]
17:22 <elukey> drain + shutdown of analytics1071 to allow its move to A5 - T212348 [production]
2019-02-26 §
07:54 <elukey> removed /rmstore-analytics-test-hadoop from zookeeper main-eqiad - T216952 [production]
2019-02-24 §
18:20 <elukey> clean up 2017/2018 log files in /var/log/jmxtrans on kafka1013-22 - root partitions filling up [production]
18:15 <elukey> clean up 2017/2018 log files in /var/log/jmxtrans - root partition almost filled up [production]
10:22 <elukey> force remount of /mnt/hdfs on an-coord1001 (fuse-hdfs stuck) [production]
2019-02-22 §
07:28 <elukey> manually delete WANCache:v:metawiki:translate-groups from memcache on mc1022 to test fix for T203786 [production]
2019-02-13 §
16:30 <elukey> reimage stat1005 to Debian Buster (again) [production]
14:25 <elukey> reimage stat1005 back to stretch to test GPU drivers [production]
2019-02-12 §
07:26 <elukey> update analytics-in4 term mysql-dbstore on cr1/cr2 eqiad [production]
2019-02-08 §
13:37 <elukey> roll restart of aqs on aqs1* to pick up new druid backend changes [production]
12:44 <elukey@deploy1001> Synchronized wmf-config/db-eqiad.php: depooling db1114, host down (duration: 00m 47s) [production]
2019-02-06 §
14:29 <elukey> add term mysql-dbstore to analytics-in4/6 on cr1/2-eqiad to allow tcp connections to dbstore100[3-5] - T210478 [production]
2019-02-03 §
20:25 <elukey> powercycle mw1272 - no ssh, no tty available via com2 - DIMM correctable errors + OEM errors registered in getsel [production]
18:56 <elukey> started a tmux session on dbstore1002 to migrate all the tokudb tables of mediawikiwiki to InnoDB - (s3 replication broken) [production]
17:53 <elukey> start all slaves on dbstore1002 (After a crash + recovery) + moved mediawikiwiki.revision_actor_temp to Innodb to unblock s3 slave replication (still broken though) [production]
01:10 <elukey> powercycle mw1299 - can't ssh nor get a tty via console - racadm getsel shows "An OEM diagnostic event occurred." [production]
2019-01-25 §
07:51 <elukey> restart yarn/hdfs daemons on analytics1056 to pick up new disk settings - T214057 [production]
07:40 <elukey> drain + reboot analytics1054 after disk swap (verify reboot + restore correct fstab mountpoints) - T213038 [production]
2019-01-21 §
10:51 <elukey> disable puppet fleetwide to ease the merge/deploy of a puppet admin module change - T212949 [production]
2019-01-19 §
12:08 <elukey> run 'start all slaves' on dbstore1002 after crash [production]
07:36 <elukey> restart pdfrender on scb1004 [production]
2019-01-17 §
17:52 <elukey> re-enable eventlogging mysql clients and db1108's el replication after db1107 maintenance [production]
12:08 <elukey> stop mariadb and shutdown db1107 to ease rack a3 maintenance [production]
11:09 <elukey> stop eventlogging on eventlog1002 and eventlogging replication on db1108 as prep step for db1107 maintenance [production]
2019-01-16 §
10:19 <elukey> executed kafka preferred-replica-election on the logging Kafka cluster as attempt to spread load more uniformly [production]
08:19 <elukey> convert aria tables to innodb on dbstore1002 - T213706 [production]
08:11 <elukey> drop unneeded tables from the staging db on dbstore1002 according to T212493#4883535 [production]
2019-01-15 §
13:00 <elukey> restart memcached on mc1024 to pick up new settings (-R 200) - T208844 [production]
11:01 <elukey> run 'apt-get purge tmpreaper' on mw1297,1298,2150,2151,2244,2245 (all role spare) to avoid daily cronspam [production]
2019-01-14 §
07:48 <elukey> executed bmc-device --debug --cold-reset on dbstore1002 - "No more sessions available" for mgmt [production]
2019-01-09 §
14:39 <elukey> restart Hadoop HDFS namenodes on an-master100[1,2] to complete decom of analytics1028->41 [production]
2019-01-08 §
17:24 <elukey> roll restart of aqs on aqs100* to pick up new Druid settings [production]
2019-01-07 §
16:03 <elukey> stop eventlogging mysql consumers on eventlog1002 and eventlogging replication on db1108 due to issues with db1107 [production]
07:24 <elukey> restart pdfrender on scb1002 [production]
2019-01-05 §
20:23 <elukey> manually clean up of big logs under /var/log/.. [production]
2019-01-04 §
08:16 <elukey> restart eventlogging daemons on eventlog1002 to pick up openssl updates [production]
2019-01-03 §
09:51 <elukey> restart memcached on mc1023 to apply -R 200 - T208844 [production]
2018-12-30 §
07:17 <elukey> restart pdfrender on scb1002 (alarms flapping) [production]
2018-12-29 §
09:21 <elukey> restart pdfrender on scb1004 [production]