2018-12-22 §
18:45 <elukey> manual cleanup of old log files on an-coord1001 (disk space issues) [production]
2018-12-20 §
19:03 <elukey> restart hdfs namenode on an-master1002 with new heap settings (currently standby, 8->12G) [production]
18:30 <elukey> remove hdfs journalnode config+packages from analytics10(28|35) - not used anymore - T209929 [production]
18:29 <elukey> restart hdfs namenode on an-master1001 with new heap settings (currently standby, 8->12G) [production]
16:31 <elukey> remove two journal nodes from the Analytics hadoop cluster - T209929 [production]
14:39 <elukey> add two journal nodes to the Analytics Hadoop cluster - T209929 [production]
08:05 <elukey> roll restart of druid middlemanagers on druid* to pick up new port settings [production]
07:11 <elukey> restart pdfrender on scb1002 [production]
07:10 <elukey> restart rsyslog on lithium - in:imtcp stuck in recvfrom ms-be2047.codfw.wmnet - T199406 [production]
2018-12-19 §
08:53 <elukey> roll restart of cassandra on aqs1005-1009 for openjdk upgrades [production]
2018-12-18 §
07:57 <elukey> restart cassandra-{a,b} on aqs1004 for openjdk upgrades [production]
2018-12-17 §
09:01 <elukey> stop kafkatee on oxygen and rsync /srv/log data to weblog1001 [production]
2018-12-16 §
09:52 <elukey> mask + reset-failed kafkatee default instance on sulfur (kafkatee-webrequest works fine) [production]
2018-12-15 §
09:22 <elukey> mask + reset-failed kafkatee default instance on weblog1001 [production]
2018-12-14 §
08:50 <elukey> swap oxygen with weblog1001 [production]
08:47 <elukey> disabled kafkatee-webrequest logstash output on oxygen (prep step before weblog1001) [production]
2018-12-13 §
12:59 <elukey> superset on analytics-tool1003 upgraded to 0.28.1 [production]
11:51 <elukey@deploy1001> Finished deploy [analytics/superset/deploy@35841a7]: (no justification provided) (duration: 00m 38s) [production]
11:51 <elukey@deploy1001> Started deploy [analytics/superset/deploy@35841a7]: (no justification provided) [production]
10:00 <elukey> upgrade nodejs on aqs100[5-9] [production]
2018-12-12 §
15:35 <elukey> upload matomo 3.7.0 to stretch-wikimedia, removed 3.5.1 from jessie-wikimedia [production]
2018-12-11 §
12:37 <elukey> updated nodejs nodejs-legacy on aqs1004 (security upgrades) [production]
2018-12-10 §
08:39 <elukey> roll restart of aqs on aqs100* to pick up new Druid backend settings [production]
2018-12-06 §
14:59 <elukey@deploy1001> Finished deploy [analytics/turnilo/deploy@6bd6e2f]: upgrade deps to nodejs 10 (duration: 00m 09s) [production]
14:59 <elukey@deploy1001> Started deploy [analytics/turnilo/deploy@6bd6e2f]: upgrade deps to nodejs 10 [production]
2018-12-05 §
14:54 <elukey> restart HDFS namenode and Yarn resource manager on an-master100[1,2] to update rack topology config - T209929 [production]
09:07 <elukey> matomo read only + upgrade to matomo 3.7.0 on matomo1001 - T209808 [production]
2018-12-04 §
14:04 <elukey> upgrade turnilo on analytics-tools1002 to nodejs-10 - T210705 [production]
11:12 <elukey@deploy1001> Finished deploy [analytics/aqs/deploy@e9a63cc]: Expose offset and underestimate numbers on unique devices - T164201 (duration: 09m 06s) [production]
11:03 <elukey@deploy1001> Started deploy [analytics/aqs/deploy@e9a63cc]: Expose offset and underestimate numbers on unique devices - T164201 [production]
2018-11-29 §
10:17 <elukey> remove zookeeper's crontabs from conf100[1-3] to fix cronspam [production]
2018-11-28 §
17:01 <chasemp> stat1004:~# aptitude install exfat-fuse exfat-utils (elukey fyi) [production]
08:12 <elukey> apply -R 200 to memcached on mc1022 (cache wipe) - T208844 [production]
2018-11-27 §
08:43 <elukey> roll restart of all druid daemons on druid100[1-6] for openjdk-8 upgrades [production]
2018-11-26 §
07:36 <elukey> restart memcached on mc1021 (cache wipe) to add -R 200 - T208844 [production]
2018-11-21 §
17:21 <elukey> manually started systemd-journald.service on scb1001 after OOM [production]
2018-11-19 §
11:20 <elukey> restart memcached on mc1020 to apply -R 200 settings (shard wiped) - T208844 [production]
2018-11-18 §
09:00 <elukey> cleaned up analytics1039 and restarted Yarn [production]
2018-11-16 §
07:32 <elukey> forced reboot + fsck + removal of /var/lib/hadoop/data/l from fstab on analytics1029 [production]
2018-11-15 §
07:08 <elukey> memcached on mc1019 restarted to apply -R 200 - T208844 [production]
2018-11-13 §
09:20 <elukey> rollout new prometheus-mcrouter-exporter to mw* - previous rollout didn't work as expected [production]
07:05 <elukey> powercycle lvs2006 - mgmt/serial console blank, unresponsive for hours [production]
2018-11-12 §
18:03 <elukey> rolling restart of aqs on aqs* to pick up new druid datasource settings [production]
16:14 <elukey> upgrade prometheus-mcrouter-exporter on all the mw* hosts to the new version [production]
13:12 <elukey> upgrade the Hadoop Analytics cluster to CDH 5.15 (downtime required) [production]
10:32 <elukey> upload mcrouter exporter 0.0.0+git20181106 to stretch-wikimedia [production]
09:57 <elukey> upgraded cdh packages (cdh 5.10 -> 5.15) for thirdparty/cloudera in jessie/stretch-wikimedia [production]
2018-11-08 §
10:26 <elukey> restart memcached on mc2029 (was depooled yesterday for network maintenance) [production]
2018-11-05 §
09:17 <elukey@deploy1001> Finished deploy [analytics/refinery@9d39efa]: fixing stat1004 (duration: 00m 04s) [production]
09:17 <elukey@deploy1001> Started deploy [analytics/refinery@9d39efa]: fixing stat1004 [production]