2016-10-20
17:42 <urandom> T133395, T113805: Starting a primary-range, incremental repair of local_group_wiktionary_T_parsoid_html.data on restbase2001.codfw.wmnet [production]
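(For context, a primary-range repair of a single Cassandra table is usually driven through nodetool. The sketch below is a hypothetical illustration of that kind of invocation, reusing the keyspace and table named in the entry above; it is not the exact command or options that were run on restbase2001.)

```python
# Hypothetical sketch: primary-range repair of one Cassandra table, wrapped
# in Python. Keyspace/table names come from the log entry above; the exact
# options used on restbase2001 are not recorded here.
import subprocess

keyspace = "local_group_wiktionary_T_parsoid_html"
table = "data"

# "-pr" restricts the repair to this node's primary token ranges; on
# Cassandra 2.2+ repairs are incremental by default.
subprocess.run(["nodetool", "repair", "-pr", keyspace, table], check=True)
```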
17:38 <mutante> rebooting kraz - short downtime of irc.wikimedia.org; please prepare to reconnect your clients if they don't do it automatically [production]
17:35 <apergos> reboot of last few stragglers for mw* hosts in codfw/eqiad: mw2152 mw2079 mw1239 [production]
17:29 <mutante> rebooting install2001 [production]
17:14 <paladox> gerrit-test using just -XX:+UseG1GC failed but doing -Xmx50m -Xms50m -XX:+UseG1GC -XX:MaxGCPauseMillis=200 worked. [git]
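(The flags in the entry above are standard HotSpot options. As a hedged illustration only, they might be assembled as below when launching a Gerrit test instance; the paths and the launch wrapper are assumptions, not the actual gerrit-test setup.)

```python
# Illustrative only: the JVM flags from the entry above, applied to a
# hypothetical Gerrit test launch. GERRIT_WAR and SITE_DIR are assumed paths.
import subprocess

GERRIT_WAR = "/srv/gerrit/gerrit.war"   # assumption
SITE_DIR = "/srv/gerrit/site"           # assumption

jvm_flags = [
    "-Xmx50m",                   # cap the heap at 50 MB
    "-Xms50m",                   # pre-allocate the full heap up front
    "-XX:+UseG1GC",              # use the garbage-first (G1) collector
    "-XX:MaxGCPauseMillis=200",  # target GC pauses of at most 200 ms
]

subprocess.run(
    ["java", *jvm_flags, "-jar", GERRIT_WAR, "daemon", "-d", SITE_DIR],
    check=True,
)
```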
17:07 <yuvipanda> killed bzip2 on tools-bastion; it was causing the instance to freeze up again [tools.replacer]
17:06 <elukey> created 0000294-161020124223818-oozie-oozi-C to re-run webrequest-load-check_sequence_statistics-wf-upload-2016-10-20-13 (oozie errors) [analytics]
17:00 <apergos> rolling reboot of video scalers in codfw/eqiad: mw1259 mw1260 mw2152 mw2246 [production]
16:57 <paladox> testing -XX:+UseG1GC on the jvm on gerrit-test (gerrit); if successful, will hopefully roll it out to production [git]
16:55 <yuvipanda> killed bzip2 taking 100% CPU on tools-bastion-03 [tools]
16:48 <apergos> rolling reboot of testservers in codfw/eqiad: mw1017 mw1099 mw2017 mw2099 [production]
16:45 <mutante> rebooting install1001 [production]
16:44 <gehel@puppetmaster1001> conftool action : set/pooled=yes; selector: dc=eqiad,cluster=logstash,service=kibana [production]
16:34 <godog> reboot graphite1001 for kernel upgrade [production]
16:30 <apergos> rolling reboots for jobrunners in eqiad: mw1161-1169, mw1299-1306 [production]
16:26 <gehel> deploying new LVS service for kibana - T132458 [production]
16:25 <godog> reboot graphite1003 for kernel upgrade [production]
16:17 <ottomata> restarting eventlogging after rebooting kafka brokers [analytics]
16:08 <moritzm> bounced ntp on mw2089/mw2241 (XFAC state) [production]
15:59 <mutante> short downtime of ganglia web ui [production]
15:59 <mutante> rebooting uranium [production]
15:43 <mforns> restarted EventLogging after throughput drop [analytics]
15:36 <apergos> rolling reboots for jobrunners in codfw: mw2080-2085, mw2153-mw2162, mw2247-2250 [production]
15:14 <apergos> rolling reboot of image scalers for codfw, eqiad: mw2086-2089, mw2148-2151, mw1293-1298 [production]
15:10 <ottomata> restarted statsv on hafnium [production]
15:07 <madhuvishy> Symlinks for /home and /data/project set up from the labstore1003 mount; Puppet enabled everywhere (T147657) [maps]
14:55 <moritzm> bounced ntp on mw2196/mw2197 (XFAC state) [production]
14:35 <madhuvishy> Sync to labstore1003 complete (T147657) [maps]
14:34 <moritzm> rebooting rutherfordium for kernel update [production]
14:27 <filippo@puppetmaster1001> conftool action : set/pooled=no; selector: name=prometheus1001.eqiad.wmnet [production]
14:26 <filippo@puppetmaster1001> conftool action : set/pooled=yes; selector: name=prometheus1002.eqiad.wmnet [production]
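(The two conftool entries above, read bottom-up since the log is reverse-chronological, pool the standby prometheus host before depooling the one being taken down. A minimal sketch of that ordering follows; the confctl invocation form is an assumption for illustration, not a verified reproduction of the actual commands.)

```python
# Hedged sketch of the pool-swap pattern in the conftool entries above:
# pool the spare first, then depool the host about to be rebooted, so the
# service keeps at least one pooled backend throughout.
import subprocess

def set_pooled(fqdn: str, pooled: str) -> None:
    """Set the 'pooled' state of a single host (assumed confctl syntax)."""
    subprocess.run(
        ["confctl", "select", f"name={fqdn}", f"set/pooled={pooled}"],
        check=True,
    )

set_pooled("prometheus1002.eqiad.wmnet", "yes")  # bring the spare in first
set_pooled("prometheus1001.eqiad.wmnet", "no")   # then remove the primary
```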
14:24 <akosiaris> bounce ntpd on bast4001 [production]
14:20 <moritzm> rebooting auth* servers [production]
14:20 <ottomata> starting rolling restart of analytics-eqiad kafka brokers to apply kernel update [analytics]
14:20 <ottomata> starting rolling restart of analytics-eqiad kafka brokers to apply kernel update [production]
14:18 <filippo@puppetmaster1001> conftool action : set/pooled=no; selector: name=prometheus2001.codfw.wmnet [production]
14:18 <filippo@puppetmaster1001> conftool action : set/pooled=yes; selector: name=prometheus2002.codfw.wmnet [production]
14:17 <apergos> rolling reboot of remaining app servers in codfw: mw2221-2245, and in eqiad: mw1261-1275 [production]
14:11 <jmm@puppetmaster1001> conftool action : set/pooled=inactive; selector: mw2098.codfw.wmnet [production]
14:09 <jynus@mira> Synchronized wmf-config/db-eqiad.php: mariadb: move db1053 from s1 to s4 (duration: 02m 06s) [production]
13:38 <moritzm> restarting mx1001 for kernel update [production]
13:22 <moritzm> restarting francium for kernel update [production]
13:15 <godog> rolling reboot of prometheus machines for kernel update [production]
13:14 <moritzm> restarting ms1001 for kernel update [production]
13:13 <elukey> re-enabling oozie and camus after cluster reboot [analytics]
13:10 <elukey> force failover from temporary Hadoop Master node (an1002) to its standby (an1001) to restore the standard configuration [production]
13:05 <elukey> correction: force failover for Hadoop Master node (an1001) to its standby (an1002) and rebooting an1001 for kernel upgrades [production]
12:59 <elukey> force failover for Hadoop Master node (an1002) to its standby (an1002) and rebooting an1001 for kernel upgrades [production]
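(The three entries above describe moving the active Hadoop master role off a node before rebooting it for the kernel update, then restoring the standard layout afterwards. A hypothetical sketch of that sequence follows, assuming the HDFS NameNode HA admin tool; the serviceIds and the ssh/reboot step are illustrative, not the actual cluster identifiers or tooling.)

```python
# Hypothetical "fail over, reboot, fail back" sequence for a Hadoop master.
# ServiceIds ("an1001", "an1002") and the ssh reboot are assumptions for
# illustration only.
import subprocess

def hdfs_failover(from_nn: str, to_nn: str) -> None:
    """Move the active HDFS NameNode role from one serviceId to the other."""
    subprocess.run(["hdfs", "haadmin", "-failover", from_nn, to_nn], check=True)

hdfs_failover("an1001", "an1002")  # move the active role off the reboot target
subprocess.run(["ssh", "an1001.eqiad.wmnet", "sudo", "reboot"], check=True)
# ...wait for an1001 to come back up and rejoin the cluster...
hdfs_failover("an1002", "an1001")  # restore the standard configuration
```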
12:59 <moritzm> started ferm on baham (it had failed to start due to failing DNS resolution in early boot) [production]
12:52 <moritzm> restarting mx2001 for kernel update [production]