production SAL


2018-12-22 §
18:45	<elukey>	manually cleaned up old log files on an-coord1001 (disk space issues)	[production]
2018-12-20 §
19:03	<elukey>	restart hdfs namenode on an-master1002 with new heap settings (currently standby, 8->12G)	[production]
18:30	<elukey>	remove hdfs journalnode config+packages from analytics10(28|35) - not used anymore - T209929	[production]
18:29	<elukey>	restart hdfs namenode on an-master1001 with new heap settings (currently standby, 8->12G)	[production]
16:31	<elukey>	remove two journal nodes from the Analytics Hadoop cluster - T209929	[production]
14:39	<elukey>	add two journal nodes to the Analytics Hadoop cluster - T209929	[production]
08:05	<elukey>	roll restart of druid middlemanagers on druid* to pick up new port settings	[production]
07:11	<elukey>	restart pdfrender on scb1002	[production]
07:10	<elukey>	restart rsyslog on lithium - in:imtcp stuck in recvfrom ms-be2047.codfw.wmnet - T199406	[production]
2018-12-19 §
08:53	<elukey>	roll restart of cassandra on aqs1005-1009 for openjdk upgrades	[production]
2018-12-18 §
07:57	<elukey>	restart cassandra-{a,b} on aqs1004 for openjdk upgrades	[production]
2018-12-17 §
09:01	<elukey>	stop kafkatee on oxygen and rsync /srv/log data to weblog1001	[production]
2018-12-16 §
09:52	<elukey>	mask + reset-failed kafkatee default instance on sulfur (kafkatee-webrequest works fine)	[production]
2018-12-15 §
09:22	<elukey>	mask + reset-failed kafkatee default instance on weblog1001	[production]
2018-12-14 §
08:50	<elukey>	swap oxygen with weblog1001	[production]
08:47	<elukey>	disabled kafkatee-webrequest logstash output on oxygen (prep step before weblog1001)	[production]
2018-12-13 §
12:59	<elukey>	superset on analytics-tool1003 upgraded to 0.28.1	[production]
11:51	<elukey@deploy1001>	Finished deploy [analytics/superset/deploy@35841a7]: (no justification provided) (duration: 00m 38s)	[production]
11:51	<elukey@deploy1001>	Started deploy [analytics/superset/deploy@35841a7]: (no justification provided)	[production]
10:00	<elukey>	upgrade nodejs on aqs100[5-9]	[production]
2018-12-12 §
15:35	<elukey>	upload matomo 3.7.0 to stretch-wikimedia, removed 3.5.1 from jessie-wikimedia	[production]
2018-12-11 §
12:37	<elukey>	updated nodejs and nodejs-legacy on aqs1004 (security upgrades)	[production]
2018-12-10 §
08:39	<elukey>	roll restart of aqs on aqs100* to pick up new Druid backend settings	[production]
2018-12-06 §
14:59	<elukey@deploy1001>	Finished deploy [analytics/turnilo/deploy@6bd6e2f]: upgrade deps to nodejs 10 (duration: 00m 09s)	[production]
14:59	<elukey@deploy1001>	Started deploy [analytics/turnilo/deploy@6bd6e2f]: upgrade deps to nodejs 10	[production]
2018-12-05 §
14:54	<elukey>	restart HDFS namenode and Yarn resource manager on an-master100[1,2] to update rack topology config - T209929	[production]
09:07	<elukey>	matomo read only + upgrade to matomo 3.7.0 on matomo1001 - T209808	[production]
2018-12-04 §
14:04	<elukey>	upgrade turnilo on analytics-tool1002 to nodejs-10 - T210705	[production]
11:12	<elukey@deploy1001>	Finished deploy [analytics/aqs/deploy@e9a63cc]: Expose offset and underestimate numbers on unique devices - T164201 (duration: 09m 06s)	[production]
11:03	<elukey@deploy1001>	Started deploy [analytics/aqs/deploy@e9a63cc]: Expose offset and underestimate numbers on unique devices - T164201	[production]
2018-11-29 §
10:17	<elukey>	remove zookeeper's crontabs from conf100[1-3] to fix cronspam	[production]
2018-11-28 §
17:01	<chasemp>	stat1004:~# aptitude install exfat-fuse exfat-utils (elukey fyi)	[production]
08:12	<elukey>	apply -R 200 to memcached on mc1022 (cache wipe) - T208844	[production]
2018-11-27 §
08:43	<elukey>	roll restart of all druid daemons on druid100[1-6] for openjdk-8 upgrades	[production]
2018-11-26 §
07:36	<elukey>	restart memcached on mc1021 (cache wipe) to add -R 200 - T208844	[production]
2018-11-21 §
17:21	<elukey>	manually started systemd-journald.service on scb1001 after OOM	[production]
2018-11-19 §
11:20	<elukey>	restart memcached on mc1020 to apply -R 200 settings (shard wiped) - T208844	[production]
2018-11-18 §
09:00	<elukey>	cleaned up analytics1039 and restarted Yarn	[production]
2018-11-16 §
07:32	<elukey>	forced reboot + fsck + removal of /var/lib/hadoop/data/l from fstab on analytics1029	[production]
2018-11-15 §
07:08	<elukey>	memcached on mc1019 restarted to apply -R 200 - T208844	[production]
2018-11-13 §
09:20	<elukey>	rollout new prometheus-mcrouter-exporter to mw* - previous rollout didn't work as expected	[production]
07:05	<elukey>	powercycle lvs2006 - mgmt/serial console blank, unresponsive for hours	[production]
2018-11-12 §
18:03	<elukey>	rolling restart of aqs on aqs* to pick up new druid datasource settings	[production]
16:14	<elukey>	upgrade prometheus-mcrouter-exporter on all the mw* hosts to the new version	[production]
13:12	<elukey>	upgrade the Hadoop Analytics cluster to CDH 5.15 (downtime required)	[production]
10:32	<elukey>	upload mcrouter exporter 0.0.0+git20181106 to stretch-wikimedia	[production]
09:57	<elukey>	upgraded cdh packages (cdh 5.10 -> 5.15) for thirdparty/cloudera in jessie/stretch-wikimedia	[production]
2018-11-08 §
10:26	<elukey>	restart memcached on mc2029 (was depooled yesterday for network maintenance)	[production]
2018-11-05 §
09:17	<elukey@deploy1001>	Finished deploy [analytics/refinery@9d39efa]: fixing stat1004 (duration: 00m 04s)	[production]
09:17	<elukey@deploy1001>	Started deploy [analytics/refinery@9d39efa]: fixing stat1004	[production]