2020-05-11 §
07:54 <moritzm> installing squid security updates [production]
07:21 <moritzm> updated buster netboot images to 10.4 (updated to latest point release) [production]
07:08 <_joe_> dropping requests to mc1020 via a firewall rule T251378 [production]
06:04 <elukey> restart wikimedia-discovery-golden on stat1007 - apparently killed because the system had no memory left to allocate [production]
2020-05-10 §
12:18 <marostegui> Start event scheduler on db1115 after a massive delete - T252324 [production]
11:05 <marostegui> Stop event scheduler on db1115 to perform a massive delete - T252324 [production]
10:27 <dcausse> restarting blazegraph on wdqs1004: T242453 [production]
09:56 <marostegui> Change scaling_governor from powersave to performance on db1115 - T252324 [production]
09:25 <marostegui> Stop MySQL and restart db1115 - T252324 [production]
08:50 <marostegui> Restart mysql on db1115 to change buffer pool size from 20GB to 40GB T252324 [production]
08:44 <elukey> Power cycle analytics1052 after eno1 issue [production]
08:01 <marostegui> Disable unused events like %_schema T252324 T231185 [production]
07:11 <marostegui> Restart mysql on db1115 T231185 [production]
07:11 <marostegui> Truncate tendril.processlist_query_log T231185 [production]
2020-05-08 §
21:45 <bstorm_> cleaned up wb_terms_no_longer_updated view for testwikidatawiki and testcommonswiki on labsdb1010 T251598 [production]
21:45 <bstorm_> cleaned up wb_terms_no_longer_updated view on labsdb1012 T251598 [production]
21:33 <bstorm_> cleaning up wb_terms_no_longer_updated view on labsdb1009 T251598 [production]
21:06 <ottomata> running preferred replica election for kafka-jumbo to get preferred leaders back after reboot of broker earlier today - T252203 [production]
19:16 <jhuneidi@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'blubberoid' for release 'production' . [production]
19:12 <jhuneidi@deploy1001> helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production' . [production]
19:07 <jhuneidi@deploy1001> helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' . [production]
18:12 <andrewbogott> reprepro copy buster-wikimedia stretch-wikimedia prometheus-openstack-exporter for T252121 [production]
17:59 <marostegui> Extend /srv by 500G on labsdb1011 T249188 [production]
16:55 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
16:53 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime [production]
16:51 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
16:48 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime [production]
16:39 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
16:37 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime [production]
16:14 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
16:12 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime [production]
15:43 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
15:41 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime [production]
15:36 <ottomata> starting kafka broker on kafka-jumbo1006, same issue on other brokers when they are leaders of offending partitions - T252203 [production]
15:31 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
15:28 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime [production]
15:27 <ottomata> stopping kafka broker on kafka-jumbo1006 to investigate camus import failures - T252203 [production]
14:50 <otto@deploy1001> Finished deploy [analytics/refinery@4a2c530]: fix for camus wrapper, deploy to an-launcher1001 only (duration: 00m 03s) [production]
14:50 <otto@deploy1001> Started deploy [analytics/refinery@4a2c530]: fix for camus wrapper, deploy to an-launcher1001 only [production]
14:05 <akosiaris> T243106 undo experiment with DROP iptables rules this time around. Use mw1331, mw1348 [production]
13:22 <vgutierrez> rolling restart of ats-tls on eqiad, codfw, ulsfo and eqsin - T249335 [production]
13:20 <akosiaris> T243106 redo experiment with DROP iptables rules this time around. Use mw1331, mw1348 [production]
13:16 <akosiaris> T243106 undo experiment with REJECT, DROP iptables rules now that we have envoy in the middle. Use mw1331, mw1348. Experiment done successfully, no issues to the infrastructure. [production]
12:49 <akosiaris> T243106 redo experiment with REJECT, DROP iptables rules now that we have envoy in the middle. Use mw1331, mw1348 [production]
12:49 <akosiaris> T243106 redo experiment with REJECT, DROP iptables rules now that we have envoy in the middle [production]
11:49 <hnowlan> restarting cassandra on restbase2009 for java updates [production]
11:28 <cmjohnson@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
11:25 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime [production]
11:08 <akosiaris> repool eqiad eventgate-analytics. Test concluded [production]
11:08 <akosiaris@cumin1001> conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics [production]