2020-11-16
11:49 <hnowlan@cumin1001> START - Cookbook sre.cassandra.roll-restart [production]
11:44 <dcaro> etcd5 member added, creating instance toolsbeta-test-k8s-etcd6 and adding to the etcd cluster (T267140) [toolsbeta]
11:27 <dcaro> Creating instance toolsbeta-test-k8s-etcd5 and adding to the etcd cluster (T267140) [toolsbeta]
11:13 <moritzm> installing poppler security updates [production]
10:46 <klausman@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
10:46 <klausman@cumin1001> START - Cookbook sre.hosts.downtime [production]
10:45 <dcaro@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
10:45 <dcaro@cumin1001> START - Cookbook sre.hosts.downtime [production]
10:44 <dcaro@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
10:44 <dcaro@cumin1001> START - Cookbook sre.hosts.downtime [production]
10:41 <klausman> about to update stat1008 to new kernel and rocm [analytics]
09:31 <gehel@cumin2001> END (FAIL) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=99) [production]
09:31 <gehel@cumin2001> START - Cookbook sre.elasticsearch.force-shard-allocation [production]
09:13 <joal> Rerun webrequest-refine for hours 0 to 6 of day 2020-11-16 - this will prevent webrequest-druid-daily from being loaded with incoherent data due to the bucketing change [analytics]
08:45 <joal> Correct webrequest job directly on HDFS and restart webrequest bundle oozie job [analytics]
08:43 <joal> Kill webrequest bundle to correct typo [analytics]
08:39 <godog> centrallog1001 move invalid config /etc/logrotate.d/logrotate-debug to /etc [production]
08:35 <moritzm> installing codemirror-js security updates [production]
08:32 <XioNoX> asw-c-codfw> request system power-off member 7 - T267865 [production]
08:31 <joal> Restart webrequest bundle oozie job with update [analytics]
08:31 <joal> Restart webrequest bun [analytics]
08:25 <joal> Deploying refinery onto HDFS [analytics]
08:24 <joal@deploy1001> Finished deploy [analytics/refinery@3df51cb] (thin): Analytics special train for webrequest table update THIN [analytics/refinery@3df51cb] (duration: 00m 07s) [production]
08:23 <joal@deploy1001> Started deploy [analytics/refinery@3df51cb] (thin): Analytics special train for webrequest table update THIN [analytics/refinery@3df51cb] [production]
08:23 <joal@deploy1001> Finished deploy [analytics/refinery@3df51cb]: Analytics special train for webrequest table update [analytics/refinery@3df51cb] (duration: 10m 09s) [production]
08:13 <joal> Deploying refinery with scap [analytics]
08:13 <joal@deploy1001> Started deploy [analytics/refinery@3df51cb]: Analytics special train for webrequest table update [analytics/refinery@3df51cb] [production]
08:08 <XioNoX> asw-c-codfw> request system power-off member 7 - T267865 [production]
08:01 <joal> Repair wmf.webrequest hive table partitions [analytics]
08:01 <joal> Recreate wmf.webrequest hive table with new partitioning [analytics]
08:00 <joal> Drop webrequest table [analytics]
07:55 <joal> Kill webrequest-bundle oozie job for table update [analytics]
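The 08:00-08:01 entries above follow the usual Hive pattern for changing a table's partitioning: drop the table definition, recreate it with the new PARTITIONED BY clause, then rebuild the partition metadata. A minimal sketch of that pattern; the column list, storage format and HDFS location below are illustrative placeholders, since the actual wmf.webrequest DDL is not part of this log:

    # drop the old definition; for an external table the HDFS data is left in place
    hive -e "DROP TABLE IF EXISTS wmf.webrequest;"
    # recreate with the new partition layout (columns and partition keys illustrative only)
    hive -e "CREATE EXTERNAL TABLE wmf.webrequest (uri_host STRING, http_status STRING)
             PARTITIONED BY (webrequest_source STRING, year INT, month INT, day INT, hour INT)
             STORED AS PARQUET LOCATION '/wmf/data/wmf/webrequest';"
    # re-register the existing partition directories under the new definition
    hive -e "MSCK REPAIR TABLE wmf.webrequest;"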
06:35 <marostegui> Stop replication on s3 codfw master (db2105) for MCR schema change deployment T238966 [production]
06:14 <marostegui> Stop MySQL on es1018, es1015, es1019 to clone es1032, es1033, es1034 - T261717 [production]
06:06 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool es1018, es1015, es1019 - T261717', diff saved to https://phabricator.wikimedia.org/P13262 and previous config saved to /var/cache/conftool/dbconfig/20201116-060624-marostegui.json [production]
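The dbctl commit logged at 06:06 is the final step of the usual depool flow on the cumin host; a minimal sketch of what likely preceded it (the exact invocations are not in this log):

    # mark each external-storage host as depooled in dbctl's config
    sudo dbctl instance es1018 depool
    sudo dbctl instance es1015 depool
    sudo dbctl instance es1019 depool
    # review the pending change, then commit it; the commit produces the logged diff and paste
    sudo dbctl config diff
    sudo dbctl config commit -m 'Depool es1018, es1015, es1019 - T261717'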
06:02 <marostegui> Restart mysql on db1115 (tendril/dbtree) due to memory usage [production]
00:55 <shdubsh> re-applied mask to kafka and kafka-mirror-main-eqiad_to_main-codfw@0 on kafka-main2003 and disabled puppet to prevent restart - T267865 [production]
00:19 <elukey> run 'systemctl mask kafka' and 'systemctl mask kafka-mirror-main-eqiad_to_main-codfw@0' on kafka-main2003 (for the brief moment when it was up) to avoid purged issues - T267865 [production]
00:09 <elukey> sudo cumin 'cp2028* or cp2036* or cp2039* or cp4022* or cp4025* or cp4028* or cp4031*' 'systemctl restart purged' -b 3 - T267865 [production]
2020-11-15
22:10 <cdanis> restart some purgeds in ulsfo as well T267865 T267867 [production]
22:03 <cdanis> T267867 T267865 ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕔🍺 sudo cumin -b2 -s10 'A:cp and A:codfw' 'systemctl restart purged' [production]
19:45 <bstorm> restarting the import to clouddb-toolsdb-03 with --max-allowed-packet=1G to rule that out as a problem entirely T266587 [clouddb-services]
19:36 <bstorm> set max_allowed_packet to 64MB on clouddb-toolsdb-03 T266587 [clouddb-services]
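max_allowed_packet caps the size of a single packet (and thus of any single row or statement) the server and client will accept, so an undersized value can abort a large import; a sketch of how the 19:36/19:45 values might be applied, with the target database and dump path as placeholders:

    # raise the server-side limit to 64 MiB at runtime (not persistent across a restart)
    sudo mysql -e "SET GLOBAL max_allowed_packet = 67108864;"
    # rerun the import with a matching client-side limit
    mysql --max-allowed-packet=1G some_database < /srv/dump.sql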
14:00 <cdanis> powercycling ms-be1022 via mgmt [production]
11:21 <arturo> icinga downtime cloudbackup2002 for 48h (T267865) [admin]
11:21 <aborrero@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
11:21 <aborrero@cumin1001> START - Cookbook sre.hosts.downtime [production]
11:12 <vgutierrez> depooling lvs2007, lvs2010 taking over text traffic on codfw - T267865 [production]
10:00 <elukey> cumin 'cp2042* or cp2036* or cp2039*' 'systemctl restart purged' -b 1 [production]
09:57 <elukey> restart purged on cp4028 (consumer stuck due to kafka-main2003 down) [production]