2018-08-29 §
06:03 <elukey> migrate archiva.wikimedia.org to archiva1001 (upgrading archiva to its latest upstream version + Debian Stretch + Java 8) [production]
2018-08-27 §
07:44 <elukey> force remount of /mnt/hdfs on stat1005 (transport not connected errors) [production]
2018-08-24 §
11:58 <elukey> upgrade nodejs packages on aqs* for security upgrade (rolling restart of aqs daemon included) [production]
2018-08-23 §
10:15 <elukey> restart druid-broker on druid100[4-5] due to unresponsiveness (still unclear why) [production]
2018-08-22 §
14:57 <elukey> upload archiva 2.2.3-2 to stretch-wikimedia [production]
14:56 <elukey> update pcc's facts [production]
10:43 <elukey> upload archiva 2.2.3-1 to stretch-wikimedia/main - T192639 [production]
2018-08-01 §
14:52 <elukey> created user 'research' on db1108 (eventlogging slave - only select grant for the log database) [production]
08:38 <elukey> restart hadoop-yarn-nodemanager on analytics10[31-77] to apply the new memory settings [production]
07:59 <elukey> restart hadoop-yarn-nodemanager on analytics10[28-30] to test new memory settings [production]
06:59 <elukey> restart eventlogging on eventlog1002 to pick up new logging settings [production]
06:58 <elukey@deploy1001> Finished deploy [eventlogging/analytics@762ca2b]: Deploy https://gerrit.wikimedia.org/r/#/c/eventlogging/+/449422/ (duration: 00m 07s) [production]
06:58 <elukey@deploy1001> Started deploy [eventlogging/analytics@762ca2b]: Deploy https://gerrit.wikimedia.org/r/#/c/eventlogging/+/449422/ [production]
2018-07-31 §
16:40 <elukey> repool druid1005 after network maintenance [production]
16:25 <elukey> pool eventbus on kafka1002 after network maintenance [production]
16:16 <elukey> precautionary restart of eventbus on kafka1002 after network downtime (DNS name resolution errors, Kafka broker connection issues, etc.) [production]
2018-07-30 §
08:07 <elukey@deploy1001> Finished deploy [eventlogging/analytics@54d43e4]: Band aid for T200630 (duration: 00m 05s) [production]
08:07 <elukey@deploy1001> Started deploy [eventlogging/analytics@54d43e4]: Band aid for T200630 [production]
2018-07-28 §
17:35 <elukey> restart eventlogging on eventlog1002 after tons of kafka disconnects (still not clear what happened) [production]
2018-07-27 §
14:18 <elukey> execute echo 'https://wikimania.wikimedia.org' | mwscript purgeList.php on mwmain1001 [production]
2018-07-24 §
17:17 <elukey> restart eventstreams on scb2* nodes (hopefully the last time before deploying the fix) to avoid memory leak issues during the EU night [production]
08:21 <elukey> rolling restart of kafka jumbo/main-(eqiad|codfw) clusters to pick up the new max open files limit (infinity -> 128k) [production]
2018-07-23 §
14:53 <elukey> delete empty/not-used/wrongly-created topics in Kafka main-eqiad - T199510 [production]
2018-07-22 §
16:15 <elukey> rolling restart of eventstreams on scb2* nodes to reduce the memory pressure before the weekend (still waiting for a permanent fix) [production]
2018-07-21 §
17:35 <elukey> rolling restart of eventstreams on scb2* nodes to reduce the memory pressure before the weekend (still waiting for a permanent fix) [production]
2018-07-20 §
15:33 <elukey> rolling restart of eventstreams on scb2* nodes to reduce the memory pressure before the weekend (still waiting for a permanent fix) [production]
15:19 <elukey> powercycle ms-be1016 - RAID errors, no ssh available, I/O errors in com2 console [production]
2018-07-19 §
19:04 <elukey> roll restart eventstreams on scb2* hosts to prevent OOM issues over the EU night - T199813 [production]
14:17 <elukey> roll restart kafka on kafka-jumbo* and kafka main-codfw (kafka2*) to pick up new Xmx/Xms settings (1g -> 2g) [production]
13:57 <elukey> restart kafka on kafka-jumbo1001 to raise Xmx/Xms jvm settings (1g -> 2g) [production]
08:08 <elukey> restart eventstreams on scb2* hosts to pick up new Kafka settings (pointing it to main-codfw) - T199813 [production]
2018-07-18 §
12:57 <elukey> restart eventstreams on scb200[5,6] as a precaution after memory consumption grew too high [production]
12:57 <elukey> restart eventstreams on scb2003 as a precaution after memory consumption grew too high [production]
12:56 <elukey> restart eventstreams on scb2001 as a precaution after memory consumption grew too high (still investigating a fix) [production]
12:24 <elukey> manually added queued.max.messages.kbytes: 65535 to eventstreams on scb2002 as test for T199813 [production]
08:31 <elukey> drain + reboot analytics1030 for kernel updates [production]
2018-07-17 §
18:33 <elukey> rolling restart eventstreams on scb2* nodes to avoid OOMs during the EU night [production]
2018-07-13 §
06:50 <elukey> powercycle ms-be1041 after diagnostic tests [production]
06:42 <elukey> unblocked stuck dpkg processes on an107[2,5] that broke puppet [production]
2018-07-12 §
06:26 <elukey> restart rsyslog on wezen - T199406 [production]
2018-07-11 §
21:53 <elukey> restart rsyslog on lithium - in:imtcp stuck in EAGAIN (Resource temporarily unavailable) due to an old socket to tegmen.wikimedia.org [production]
21:47 <elukey> re-enable kafka mirror maker on kafka100[1-3] [production]
21:16 <elukey> starting kafka on kafka100[1-3] after zk cleanup [production]
19:31 <elukey> cleaned up *change-prop.retry.change-prop.retry* in /srv/kafka/data on kafka100[1-3] [production]
18:34 <elukey> restarted topic nuke script for kafka main [production]
18:14 <elukey> start kafka on kafka1002 [production]
18:14 <elukey> stop mirror makers on kafka100[1-3] [production]
17:31 <elukey> restart kafka on kafka1001 (OOM registered) [production]
17:12 <elukey> restart kafka on kafka1003 with 2G heap settings [production]
17:10 <elukey> restart kafka on kafka1002 with 2G heap settings [production]