production SAL

2019-03-06 §
17:10	<elukey@deploy1001>	Finished deploy [analytics/superset/deploy@911ad13]: First deploy to new host (duration: 00m 27s)	[production]
17:10	<elukey@deploy1001>	Started deploy [analytics/superset/deploy@911ad13]: First deploy to new host	[production]
07:09	<elukey>	raised analytics user's max_user_connections from 10 to 100 on labsdb1012 - T215231	[production]
2019-03-04 §
14:20	<elukey>	update puppet compiler's facts	[production]
2019-03-03 §
10:44	<elukey>	restart pdfrender on scb1003	[production]
2019-02-28 §
16:39	<elukey>	clean up old/stale zookeeper znodes from conf100[4-6] - T216979	[production]
13:56	<elukey>	re-start cleanup of 20k+ zookeeper nodes on conf100[4-6] (old Hadoop Yarn state) - T216952	[production]
11:28	<elukey>	pause cleanup of 20k+ zookeeper nodes on conf100[4-6] (old Hadoop Yarn state) - T216952	[production]
09:30	<elukey>	start cleanup of 20k+ zookeeper nodes on conf100[4-6] (old Hadoop Yarn state) - T216952	[production]
08:31	<elukey>	roll restart of Yarn Resource Managers on an-master100[1,2] to pick up new settings	[production]
2019-02-27 §
17:56	<elukey>	roll restart hadoop hdfs namenodes on an-master100[1,2] to pick up the new rack config of analytics1071	[production]
17:22	<elukey>	drain + shutdown of analytics1071 to allow its move to A5 - T212348	[production]
2019-02-26 §
07:54	<elukey>	removed /rmstore-analytics-test-hadoop from zookeeper main-eqiad - T216952	[production]
2019-02-24 §
18:20	<elukey>	clean up 2017/2018 log files in /var/log/jmxtrans on kafka1013-22 - root partitions filling up	[production]
18:15	<elukey>	clean up 2017/2018 log files in /var/log/jmxtrans - root partition almost filled up	[production]
10:22	<elukey>	force remount of /mnt/hdfs on an-coord1001 (fuse-hdfs stuck)	[production]
2019-02-22 §
07:28	<elukey>	manually delete WANCache:v:metawiki:translate-groups from memcache on mc1022 to test fix for T203786	[production]
2019-02-13 §
16:30	<elukey>	reimage stat1005 to Debian Buster (again)	[production]
14:25	<elukey>	reimage stat1005 back to stretch to test GPU drivers	[production]
2019-02-12 §
07:26	<elukey>	update analytics-in4 term mysql-dbstore on cr1/cr2 eqiad	[production]
2019-02-08 §
13:37	<elukey>	roll restart of aqs on aqs1* to pick up new druid backend changes	[production]
12:44	<elukey@deploy1001>	Synchronized wmf-config/db-eqiad.php: depooling db1114, host down (duration: 00m 47s)	[production]
2019-02-06 §
14:29	<elukey>	add term mysql-dbstore to analytics-in4/6 on cr1/2-eqiad to allow tcp connections to dbstore100[3-5] - T210478	[production]
2019-02-03 §
20:25	<elukey>	powercycle mw1272 - no ssh, no tty available via com2 - DIMM correctable errors + OEM errors registered in getsel	[production]
18:56	<elukey>	started a tmux session on dbstore1002 to migrate all the tokudb tables of mediawikiwiki to InnoDB - (s3 replication broken)	[production]
17:53	<elukey>	start all slaves on dbstore1002 (after a crash + recovery) + moved mediawikiwiki.revision_actor_temp to InnoDB to unblock s3 slave replication (still broken though)	[production]
01:10	<elukey>	powercycle mw1299 - can't ssh or get a tty via console - racadm getsel shows "An OEM diagnostic event occurred."	[production]
2019-01-25 §
07:51	<elukey>	restart yarn/hdfs daemons on analytics1056 to pick up new disk settings - T214057	[production]
07:40	<elukey>	drain + reboot analytics1054 after disk swap (verify reboot + restore correct fstab mountpoints) - T213038	[production]
2019-01-21 §
10:51	<elukey>	disable puppet fleetwide to ease the merge/deploy of a puppet admin module change - T212949	[production]
2019-01-19 §
12:08	<elukey>	run 'start all slaves' on dbstore1002 after crash	[production]
07:36	<elukey>	restart pdfrender on scb1004	[production]
2019-01-17 §
17:52	<elukey>	re-enable eventlogging mysql clients and db1108's el replication after db1107 maintenance	[production]
12:08	<elukey>	stop mariadb and shutdown db1107 to ease rack A3 maintenance	[production]
11:09	<elukey>	stop eventlogging on eventlog1002 and eventlogging replication on db1108 as prep step for db1107 maintenance	[production]
2019-01-16 §
10:19	<elukey>	executed kafka preferred-replica-election on the logging Kafka cluster as an attempt to spread load more uniformly	[production]
08:19	<elukey>	convert aria tables to innodb on dbstore1002 - T213706	[production]
08:11	<elukey>	drop unneeded tables from the staging db on dbstore1002 according to T212493#4883535	[production]
2019-01-15 §
13:00	<elukey>	restart memcached on mc1024 to pick up new settings (-R 200) - T208844	[production]
11:01	<elukey>	run 'apt-get purge tmpreaper' on mw1297,1298,2150,2151,2244,2245 (all role spare) to avoid daily cronspam	[production]
2019-01-14 §
07:48	<elukey>	executed bmc-device --debug --cold-reset on dbstore1002 - "No more sessions available" for mgmt	[production]
2019-01-09 §
14:39	<elukey>	restart Hadoop HDFS namenodes on an-master100[1,2] to complete decom of analytics1028->41	[production]
2019-01-08 §
17:24	<elukey>	roll restart of aqs on aqs100* to pick up new Druid settings	[production]
2019-01-07 §
16:03	<elukey>	stop eventlogging mysql consumers on eventlog1002 and eventlogging replication on db1108 due to issues with db1107	[production]
07:24	<elukey>	restart pdfrender on scb1002	[production]
2019-01-05 §
20:23	<elukey>	manual cleanup of big logs under /var/log/..	[production]
2019-01-04 §
08:16	<elukey>	restart eventlogging daemons on eventlog1002 to pick up openssl updates	[production]
2019-01-03 §
09:51	<elukey>	restart memcached on mc1023 to apply -R 200 - T208844	[production]
2018-12-30 §
07:17	<elukey>	restart pdfrender on scb1002 (alarms flapping)	[production]
2018-12-29 §
09:21	<elukey>	restart pdfrender on scb1004	[production]