__all__ SAL

2021-05-25
21:58	<razzi@cumin1001>	END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)	[production]
21:58	<razzi@cumin1001>	START - Cookbook sre.hadoop.roll-restart-masters	[production]
21:13	<razzi@cumin1001>	END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)	[production]
21:13	<razzi@cumin1001>	START - Cookbook sre.hadoop.roll-restart-masters	[production]
21:13	<razzi@cumin1001>	END (ERROR) - Cookbook sre.hadoop.roll-restart-workers (exit_code=97)	[production]
21:13	<razzi@cumin1001>	START - Cookbook sre.hadoop.roll-restart-workers	[production]
20:40	<razzi@cumin1001>	END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)	[production]
20:28	<razzi@cumin1001>	START - Cookbook sre.hadoop.roll-restart-workers	[production]
20:00	<twentyafterfour@deploy1002>	rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.7	[production]
19:20	<cmjohnson@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
19:17	<cmjohnson@cumin1001>	START - Cookbook sre.dns.netbox	[production]
19:17	<cmjohnson@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
19:12	<twentyafterfour@deploy1002>	Finished scap: testwikis wikis to 1.37.0-wmf.7 (duration: 33m 29s)	[production]
19:12	<cmjohnson@cumin1001>	START - Cookbook sre.dns.netbox	[production]
18:38	<twentyafterfour@deploy1002>	Started scap: testwikis wikis to 1.37.0-wmf.7	[production]
18:16	<razzi>	sudo systemctl start all failed units from `systemctl list-units --state=failed` on an-launcher1002	[analytics]
18:14	<razzi>	sudo systemctl start eventlogging_to_druid_navigationtiming_hourly.service	[analytics]
18:08	<krinkle@deploy1002>	Synchronized wmf-config/CommonSettings.php: I2ebe9674fb109f (duration: 00m 56s)	[production]
18:01	<razzi>	manually edit /etc/hadoop/conf/capacity-scheduler.xml to make queues running and sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues	[analytics]
17:52	<razzi>	sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues on an-master1001 and an-master1002	[analytics]
17:34	<Krinkle>	mwmaint1002: Running purge-parsercache-now.php on server 2/4 (pc1007, depooled spare). Ref P16060, T280605, T282761.	[production]
17:30	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1164 (re)pooling @ 100%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16207 and previous config saved to /var/cache/conftool/dbconfig/20210525-173031-root.json	[production]
17:28	<razzi>	sudo systemctl restart refine_eventlogging_legacy	[analytics]
17:28	<razzi>	sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues to enable submitting jobs once again	[analytics]
17:22	<effie>	disable puppet on mc2019 (for tests)	[production]
17:15	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1164 (re)pooling @ 75%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16206 and previous config saved to /var/cache/conftool/dbconfig/20210525-171527-root.json	[production]
17:14	<andrewbogott>	deleting old ingress controllers toolsbeta-test-k8s-ingress-1 and toolsbeta-test-k8s-ingress-2	[toolsbeta]
17:13	<andrewbogott>	created two new ingress nodes, toolsbeta-test-k8s-ingress-4 and toolsbeta-test-k8s-ingress-5	[toolsbeta]
17:07	<razzi>	re-enabled puppet on an-masters and an-launcher	[analytics]
17:04	<razzi>	sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode leave	[analytics]
17:03	<razzi>	sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet	[analytics]
17:00	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1164 (re)pooling @ 50%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16205 and previous config saved to /var/cache/conftool/dbconfig/20210525-170024-root.json	[production]
16:45	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1164 (re)pooling @ 25%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16203 and previous config saved to /var/cache/conftool/dbconfig/20210525-164520-root.json	[production]
16:43	<razzi>	sudo systemctl restart hadoop-hdfs-namenode on an-master1001	[analytics]
16:38	<razzi>	sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -saveNamespace	[analytics]
16:35	<razzi>	sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode enter	[analytics]
16:28	<razzi>	sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet	[analytics]
16:23	<razzi>	sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode leave	[analytics]
16:14	<bd808>	Closed #wikimedia-cloud-admin on f***node	[admin]
16:11	<bd808>	Closed #wikimedia-cloud-feed on f***node	[admin]
16:06	<razzi>	sudo systemctl restart hadoop-hdfs-namenode	[analytics]
15:52	<razzi>	checkpoint hdfs with sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -saveNamespace	[analytics]
15:51	<razzi>	enable safe mode on an-master1001 with sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode enter	[analytics]
15:36	<razzi>	disable puppet on an-master1001.eqiad.wmnet and an-master1002.eqiad.wmnet again	[analytics]
15:35	<razzi>	re-enable puppet on an-masters, run puppet, and sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues	[analytics]
15:32	<razzi>	disable puppet on an-master1001.eqiad.wmnet and an-master1002.eqiad.wmnet	[analytics]
15:19	<dcaro>	rebooted cloudvirt1020, starting VMs (T275893)	[admin]
15:13	<dcaro>	rebooting cloudvirt1020 (T275893)	[admin]
15:09	<dcaro>	turning off VM toolsbeta-test-k8s-etcd-14 to be able to reboot cloudvirt1020	[toolsbeta]
14:42	<dcaro>	taking cloudvirt1020 out for maintenance (openstack wise) so no new VMs are scheduled on it (T275893)	[admin]