2020-05-10
12:18 <marostegui> Start event scheduler on db1115 after a massive delete - T252324 [production]
11:05 <marostegui> Stop event scheduler on db1115 to perform a massive delete - T252324 [production]
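The two entries above bracket a bulk delete with the event scheduler stopped and restarted. A minimal sketch of how that toggle is typically done on MariaDB/MySQL (the exact procedure used on db1115 is not recorded in the log):

```sql
-- Before the massive delete: stop scheduled events from firing mid-operation.
SET GLOBAL event_scheduler = OFF;

-- ... run the delete, usually in batches to limit replication lag ...

-- After the delete has finished: resume scheduled events.
SET GLOBAL event_scheduler = ON;

-- Verify the current state.
SHOW GLOBAL VARIABLES LIKE 'event_scheduler';
```

Toggling the scheduler this way requires the SUPER (or equivalent) privilege and does not survive a server restart unless also set in the config file.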
10:27 <dcausse> restarting blazegraph on wdqs1004: T242453 [production]
09:56 <marostegui> Change scaling_governor from powersave to performance on db1115 - T252324 [production]
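A sketch of the governor change above, assuming the standard Linux cpufreq sysfs interface (the actual tooling used on db1115 is not shown in the log):

```shell
# Inspect the current governor on one core.
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

# Switch every core from powersave to performance.
for g in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
    echo performance > "$g"
done
```

The `performance` governor pins cores at their highest frequency, trading power for consistent latency, which is why it is preferred on busy database hosts.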
09:25 <marostegui> Stop MySQL and restart db1115 - T252324 [production]
08:50 <marostegui> Restart mysql on db1115 to change buffer pool size from 20GB to 40GB T252324 [production]
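The entry above grew the InnoDB buffer pool via a restart. For reference, since MySQL 5.7 / MariaDB 10.2 the buffer pool can also be resized online; this is only the general shape of the change, not what was run on db1115:

```sql
-- Online resize (no restart needed on recent versions):
SET GLOBAL innodb_buffer_pool_size = 40 * 1024 * 1024 * 1024;  -- 40GB

-- Persistent equivalent in my.cnf, picked up by the restart:
--   [mysqld]
--   innodb_buffer_pool_size = 40G
```

An online resize happens in chunks and can briefly stall queries, so a restart during a quiet window is sometimes the simpler option.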
08:44 <elukey> Power cycle analytics1052 after eno1 issue [production]
08:01 <marostegui> Disable unused events like %_schema T252324 T231185 [production]
07:11 <marostegui> Restart mysql on db1115 T231185 [production]
07:11 <marostegui> Truncate tendril.processlist_query_log T231185 [production]
2020-05-08
21:45 <bstorm_> cleaned up wb_terms_no_longer_updated view for testwikidatawiki and testcommonswiki on labsdb1010 T251598 [production]
21:45 <bstorm_> cleaned up wb_terms_no_longer_updated view on labsdb1012 T251598 [production]
21:33 <bstorm_> cleaning up wb_terms_no_longer_updated view on labsdb1009 T251598 [production]
21:06 <ottomata> running preferred replica election for kafka-jumbo to get preferred leaders back after reboot of broker earlier today - T252203 [production]
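A sketch of a preferred replica election with the stock Kafka CLI tools; the hostnames and ZooKeeper path here are placeholders, and WMF may drive this through its own automation rather than these commands directly:

```shell
# Classic tool (pre-2.4 Kafka): trigger an election via ZooKeeper.
kafka-preferred-replica-election.sh --zookeeper zk1001:2181/kafka/jumbo

# Kafka >= 2.4 equivalent, via the brokers themselves:
kafka-leader-election.sh --bootstrap-server kafka-jumbo1001:9092 \
    --election-type PREFERRED --all-topic-partitions
```

After a broker restart it rejoins as a follower, so an election like this is what moves partition leadership back to the configured preferred replicas and rebalances load.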
19:16 <jhuneidi@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'blubberoid' for release 'production'. [production]
19:12 <jhuneidi@deploy1001> helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production'. [production]
19:07 <jhuneidi@deploy1001> helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging'. [production]
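The three entries above follow the usual staging-then-production rollout order. A sketch of the underlying helmfile invocations; the working directory and environment names are assumptions, not taken from the log:

```shell
# Hypothetical chart path for the blubberoid service.
cd /srv/deployment-charts/helmfile.d/services/blubberoid

helmfile -e staging sync   # staging first, to catch problems early
helmfile -e codfw sync     # then each production datacenter in turn
helmfile -e eqiad sync
```

`helmfile sync` reconciles every release in the helmfile against the cluster for the selected environment, applying any chart or values changes.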
18:12 <andrewbogott> reprepro copy buster-wikimedia stretch-wikimedia prometheus-openstack-exporter for T252121 [production]
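For readers unfamiliar with reprepro, the command logged above uses its `copy` subcommand, whose general shape is `reprepro copy <dest-codename> <source-codename> <package>...`:

```shell
# Make the package already present in the stretch-wikimedia distribution
# available in buster-wikimedia as well, within the same apt repository.
reprepro copy buster-wikimedia stretch-wikimedia prometheus-openstack-exporter
```

This copies the existing `.deb` between distributions without rebuilding, which works as long as the binary is compatible with both releases.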
17:59 <marostegui> Extend /srv by 500G on labsdb1011 T249188 [production]
16:55 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
16:53 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime [production]
16:51 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
16:48 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime [production]
16:39 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
16:37 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime [production]
16:14 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
16:12 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime [production]
15:43 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
15:41 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime [production]
15:36 <ottomata> starting kafka broker on kafka-jumbo1006, same issue on other brokers when they are leaders of offending partitions - T252203 [production]
15:31 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
15:28 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime [production]
15:27 <ottomata> stopping kafka broker on kafka-jumbo1006 to investigate camus import failures - T252203 [production]
14:50 <otto@deploy1001> Finished deploy [analytics/refinery@4a2c530]: fix for camus wrapper, deploy to an-launcher1001 only (duration: 00m 03s) [production]
14:50 <otto@deploy1001> Started deploy [analytics/refinery@4a2c530]: fix for camus wrapper, deploy to an-launcher1001 only [production]
14:05 <akosiaris> T243106 undo experiment with DROP iptable rules this time around. Use mw1331, mw1348 [production]
13:22 <vgutierrez> rolling restart of ats-tls on eqiad, codfw, ulsfo and eqsin - T249335 [production]
13:20 <akosiaris> T243106 redo experiment with DROP iptable rules this time around. Use mw1331, mw1348 [production]
13:16 <akosiaris> T243106 undo experiment with REJECT, DROP iptable rules now that we have envoy in the middle. Use mw1331, mw1348. Experiment done successfully, no issues to the infrastructure. [production]
12:49 <akosiaris> T243106 redo experiment with REJECT, DROP iptable rules now that we have envoy in the middle. Use mw1331, mw1348 [production]
12:49 <akosiaris> T243106 redo experiment with REJECT, DROP iptable rules now that we have envoy in the middle [production]
11:49 <hnowlan> restarting cassandra on restbase2009 for java updates [production]
11:28 <cmjohnson@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
11:25 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime [production]
11:08 <akosiaris> repool eqiad eventgate-analytics. Test concluded [production]
11:08 <akosiaris@cumin1001> conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics [production]
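The conftool action logged above (selector plus `set/pooled=true`) corresponds to an invocation of conftool's `confctl` CLI roughly like the following; this is a sketch from the logged action, and the exact local flags may differ:

```shell
# Repool the eqiad datacenter for the eventgate-analytics discovery record.
confctl select 'name=eqiad,dnsdisc=eventgate-analytics' set/pooled=true
```

For `dnsdisc` objects, pooling an entry makes the datacenter eligible again in DNS-based service discovery, which is how traffic was returned to eqiad once the test concluded.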