production SAL

2019-07-23 §
19:09	<elukey>	depool mw1261 for investigation	[production]
2019-07-22 §
18:59	<elukey>	repool scb1001 after pdu maintenance	[production]
18:00	<elukey>	arm keyholder on netmon1002 after power loss	[production]
17:35	<elukey>	depool scb1001 for PDU work T227140	[production]
09:47	<elukey>	failover + restart of Hadoop HDFS namenode on an-master1001 to apply GC settings - T228620	[production]
09:32	<elukey>	restart hadoop hdfs namenode on an-master1002 to apply new GC settings - T228620	[production]
07:54	<elukey>	sudo -i depool on elastic1046 - broken disk (srv partition not available) - T228606	[production]
07:40	<elukey>	systemctl reset-failed restbase on restbase1007->15 (decommed nodes)	[production]
06:23	<elukey>	restart hadoop-hdfs-namenode on an-master1002 to verify if out-of-the-ordinary GC activity	[production]
2019-07-19 §
07:03	<elukey>	restart php-fpm on mw1330 - op-cache hit ratio low	[production]
07:01	<elukey>	depool wdqs2004 from all services (waiting for maintenance)	[production]
06:15	<elukey>	clear opcache on mwdebug*	[production]
2019-07-17 §
16:40	<elukey>	execute reprepro clearvanished on install1002 to clear buster-wikimedia|thirdparty/amd-rocm (not used anymore)	[production]
06:59	<elukey>	apply mcrouter async replication to mw2224 - T225642	[production]
06:25	<elukey>	reboot analytics1072 as attempt to clear the megacli's config (and add a new disk)	[production]
06:20	<elukey>	sudo -i /usr/local/sbin/restart-php7.2-fpm on mwdebug* to reset opcache	[production]
2019-07-16 §
15:37	<elukey>	reboot analytics1072 as attempt to force the raid controller to set a drive failed - T226467	[production]
15:12	<elukey>	start mariadb on db1107 and re-enable mysql consumers on eventlog1002 and replication on db1108	[production]
14:53	<elukey>	stop mariadb on db1107 to allow maintenance	[production]
14:53	<elukey>	stop eventlogging mysql consumers on eventlog1002 and eventlogging_sync on db1108 to allow db1107 maintenance	[production]
09:24	<elukey>	apply mcrouter async replication settings to mw1276 - T225642	[production]
09:23	<elukey>	pool mw1261 back with mcrouter async replication settings - T225642	[production]
07:45	<elukey>	depool mw1261 to test mcrouter changes	[production]
2019-07-15 §
13:55	<elukey>	enable profile::base::firewall on notebook100[3,4]	[production]
2019-07-12 §
05:45	<elukey>	sudo -i /usr/local/sbin/restart-php7.2-fpm on mwdebug* to clear opcache	[production]
2019-07-09 §
13:26	<elukey>	enable base::firewall on stat1007	[production]
10:39	<elukey>	update wikimedia-buster thirdparty/amd-rocm component with upstream packages - T224723	[production]
09:13	<elukey>	enable per-server metrics on all prometheus-mcrouter-exporter(s) via puppet - T225059	[production]
08:49	<elukey>	upgrade prometheus-mcrouter-exporter to 0.0.0+git20190709-1 on mw-eqiad (cumin alias) via debdeploy - T225059	[production]
08:36	<elukey>	upgrade prometheus-mcrouter-exporter to 0.0.0+git20190709-1 on mw-codfw (cumin alias) via debdeploy - T225059	[production]
07:26	<elukey>	upload prometheus-mcrouter-exporter 0.0.0+git20190709-1 to stretch-wikimedia - T225059	[production]
2019-07-08 §
13:52	<elukey>	import AMD ROCm's Debian repo key (9386B48A1A693C5C) manually on install1002 - T224723	[production]
09:51	<elukey@cumin1001>	END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)	[production]
09:51	<elukey@cumin1001>	START - Cookbook sre.ganeti.makevm	[production]
07:00	<elukey>	add base::firewall to stat1004 - T170826	[production]
2019-07-05 §
13:44	<elukey>	roll restart of aqs on aqs100* to pick up new druid settings	[production]
2019-07-04 §
06:42	<elukey>	update puppet compiler's facts	[production]
2019-07-03 §
06:00	<elukey>	move the zookeeper puppet submodule into operations/puppet - T226466	[production]
2019-07-02 §
10:05	<elukey>	powercycle analytics1056 (soft lockups logged in the serial console, no ssh, no net connectivity)	[production]
2019-07-01 §
18:35	<elukey@cumin1001>	END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)	[production]
17:27	<elukey@cumin1001>	START - Cookbook sre.ganeti.makevm	[production]
15:37	<elukey@cumin1001>	END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)	[production]
15:36	<elukey@cumin1001>	START - Cookbook sre.ganeti.makevm	[production]
15:27	<elukey@cumin1001>	END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)	[production]
15:27	<elukey@cumin1001>	START - Cookbook sre.ganeti.makevm	[production]
10:04	<elukey>	remove burrow-analytics.service from kafkamon1001 (the analytics cluster has been decommed)	[production]
09:55	<elukey>	reboot kafkamon1001 with 4g of dedicated ram (was 8g) - T224988	[production]
09:54	<elukey>	reboot kafkamon2001 with 4g of dedicated ram (was 8g) - T224988	[production]
08:39	<elukey>	restart hadoop-yarn-nodemanager on all hadoop workers to pick up new jvm settings - T225296	[production]
2019-06-28 §
18:12	<elukey>	systemctl reset-failed kafka* units on kafka2001 (in decom phase)	[production]