2020-07-03
19:20 <joal> restart failed webrequest-load job webrequest-load-wf-text-2020-7-3-17 with higher thresholds - error due to burst of requests in ulsfo [analytics]
19:13 <joal> restart mediawiki-history-denormalize oozie job using 0.0.115 refinery-job jar [analytics]
19:05 <joal> kill manual execution of mediawiki-history to save an-coord1001 (too big of a spark-driver) [analytics]
18:53 <joal> restart webrequest-load-wf-text-2020-7-3-17 after hive server failure [analytics]
18:52 <joal> restart data_quality_stats-wf-event.navigationtiming-useragent_entropy-hourly-2020-7-3-15 after hive server failure [analytics]
18:51 <joal> restart virtualpageview-hourly-wf-2020-7-3-15 after hive-server failure [analytics]
18:47 <cdanis> ✔️ cdanis@an-coord1001.eqiad.wmnet ~ 🕒☕ sudo systemctl restart hive-server2.service [production]
16:51 <krinkle@deploy1001> Synchronized wmf-config/CommonSettings.php: Ifa929b2ad4 (duration: 00m 57s) [production]
16:41 <joal> Rerun mediawiki-history-check_denormalize-wf-2020-06 after having cleaned up wrong files and restarted a job without deterministic skewed join [analytics]
16:02 <reedy@deploy1001> Synchronized wmf-config/CommonSettings.php: Rename wgRestrictionMethod to wgShellRestrictionMethod (duration: 00m 58s) [production]
15:46 <jayme@cumin1001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [production]
15:43 <jayme@cumin1001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [production]
15:43 <jynus@cumin1001> dbctl commit (dc=all): 'Reduce db1118 weight to spread load more evenly', diff saved to https://phabricator.wikimedia.org/P11730 and previous config saved to /var/cache/conftool/dbconfig/20200703-154337-jynus.json [production]
15:40 <jayme@cumin1001> START - Cookbook sre.ganeti.makevm [production]
15:38 <jayme@cumin1001> START - Cookbook sre.ganeti.makevm [production]
15:09 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) [production]
15:02 <elukey@cumin1001> START - Cookbook sre.hadoop.stop-cluster [production]
14:11 <elukey@cumin1001> END (FAIL) - Cookbook sre.hadoop.stop-cluster (exit_code=99) [production]
14:11 <_joe_> restarted php-fpm on wtp1033, stuck in SIGILL [production]
13:59 <elukey@cumin1001> START - Cookbook sre.hadoop.stop-cluster [production]
12:51 <arturo> [codfw1dev] galera cluster should be up and running, openstack happy (T256283) [admin]
12:41 <hashar> Restarting Zuul / CI [production]
11:44 <arturo> [codfw1dev] restoring glance database backup from bacula into cloudcontrol2001-dev (T256283) [admin]
11:39 <jmm@cumin2001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [production]
11:39 <arturo> [codfw1dev] stopped mysql database in the galera cluster T256283 [admin]
11:36 <jmm@cumin2001> START - Cookbook sre.hosts.reboot-single [production]
11:36 <arturo> [codfw1dev] dropped glance database in the galera cluster T256283 [admin]
11:32 <jmm@cumin2001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [production]
11:29 <jmm@cumin2001> START - Cookbook sre.hosts.reboot-single [production]
11:29 <moritzm> rebooting urldownloader standby hosts for kernel updates (1002/2002) [production]
10:59 <moritzm> installing json-c security updates on jessie [production]
10:51 <moritzm> installing ruby-json security updates [production]
10:25 <moritzm> installing nss security updates on jessie [production]
10:18 <jmm@cumin2001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [production]
10:15 <elukey> notebook1004 renamed to an-scheduler1001 [production]
10:15 <jmm@cumin2001> START - Cookbook sre.hosts.reboot-single [production]
10:09 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
10:07 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
09:06 <jmm@cumin2001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [production]
09:04 <jmm@cumin2001> START - Cookbook sre.hosts.reboot-single [production]
09:00 <jmm@cumin2001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [production]
08:58 <jmm@cumin2001> START - Cookbook sre.hosts.reboot-single [production]
08:58 <jmm@cumin2001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [production]
08:56 <jmm@cumin2001> START - Cookbook sre.hosts.reboot-single [production]
08:55 <jmm@cumin2001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [production]
08:51 <jmm@cumin2001> START - Cookbook sre.hosts.reboot-single [production]
08:47 <jmm@cumin2001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [production]
08:43 <jmm@cumin2001> START - Cookbook sre.hosts.reboot-single [production]
08:43 <moritzm> rebooting netflow* hosts for kernel security update [production]
08:16 <jmm@cumin2001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [production]