__all__ SAL

2021-05-25
19:20	<cmjohnson@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
19:17	<cmjohnson@cumin1001>	START - Cookbook sre.dns.netbox	[production]
19:17	<cmjohnson@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
19:12	<twentyafterfour@deploy1002>	Finished scap: testwikis wikis to 1.37.0-wmf.7 (duration: 33m 29s)	[production]
19:12	<cmjohnson@cumin1001>	START - Cookbook sre.dns.netbox	[production]
18:38	<twentyafterfour@deploy1002>	Started scap: testwikis wikis to 1.37.0-wmf.7	[production]
18:16	<razzi>	sudo systemctl start all failed units from `systemctl list-units --state=failed` on an-launcher1002	[analytics]
18:14	<razzi>	sudo systemctl start eventlogging_to_druid_navigationtiming_hourly.service	[analytics]
18:08	<krinkle@deploy1002>	Synchronized wmf-config/CommonSettings.php: I2ebe9674fb109f (duration: 00m 56s)	[production]
18:01	<razzi>	manually edit /etc/hadoop/conf/capacity-scheduler.xml to make queues running and sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues	[analytics]
17:52	<razzi>	sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues on an-master1001 and an-master1002	[analytics]
17:34	<Krinkle>	mwmaint1002: Running purge-parsercache-now.php on server 2/4 (pc1007, depooled spare). Ref P16060, T280605, T282761.	[production]
17:30	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1164 (re)pooling @ 100%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16207 and previous config saved to /var/cache/conftool/dbconfig/20210525-173031-root.json	[production]
17:28	<razzi>	sudo systemctl restart refine_eventlogging_legacy	[analytics]
17:28	<razzi>	sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues to enable submitting jobs once again	[analytics]
17:22	<effie>	disable puppet on mc2019 (for tests)	[production]
17:15	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1164 (re)pooling @ 75%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16206 and previous config saved to /var/cache/conftool/dbconfig/20210525-171527-root.json	[production]
17:14	<andrewbogott>	deleting old ingress controllers toolsbeta-test-k8s-ingress-1 and toolsbeta-test-k8s-ingress-2	[toolsbeta]
17:13	<andrewbogott>	created two new ingress nodes, toolsbeta-test-k8s-ingress-4 and toolsbeta-test-k8s-ingress-5	[toolsbeta]
17:07	<razzi>	re-enabled puppet on an-masters and an-launcher	[analytics]
17:04	<razzi>	sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode leave	[analytics]
17:03	<razzi>	sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet	[analytics]
17:00	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1164 (re)pooling @ 50%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16205 and previous config saved to /var/cache/conftool/dbconfig/20210525-170024-root.json	[production]
16:45	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1164 (re)pooling @ 25%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16203 and previous config saved to /var/cache/conftool/dbconfig/20210525-164520-root.json	[production]
16:43	<razzi>	sudo systemctl restart hadoop-hdfs-namenode on an-master1001	[analytics]
16:38	<razzi>	sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -saveNamespace	[analytics]
16:35	<razzi>	sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode enter	[analytics]
16:28	<razzi>	sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet	[analytics]
16:23	<razzi>	sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode leave	[analytics]
16:14	<bd808>	Closed #wikimedia-cloud-admin on f***node	[admin]
16:11	<bd808>	Closed #wikimedia-cloud-feed on f***node	[admin]
16:06	<razzi>	sudo systemctl restart hadoop-hdfs-namenode	[analytics]
15:52	<razzi>	checkpoint hdfs with sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -saveNamespace	[analytics]
15:51	<razzi>	enable safe mode on an-master1001 with sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode enter	[analytics]
15:36	<razzi>	disable puppet on an-master1001.eqiad.wmnet and an-master1002.eqiad.wmnet again	[analytics]
15:35	<razzi>	re-enable puppet on an-masters, run puppet, and sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues	[analytics]
15:32	<razzi>	disable puppet on an-master1001.eqiad.wmnet and an-master1002.eqiad.wmnet	[analytics]
15:19	<dcaro>	rebooted cloudvirt1020, starting VMs (T275893)	[admin]
15:13	<dcaro>	rebooting cloudvirt1020 (T275893)	[admin]
15:09	<dcaro>	turning off VM toolsbeta-test-k8s-etcd-14 to be able to reboot cloudvirt1020	[toolsbeta]
14:42	<dcaro>	taking cloudvirt1020 out for maintenance (openstack wise) so no new VMs are scheduled on it (T275893)	[admin]
14:39	<razzi>	stop puppet on an-launcher and stop hadoop-related timers	[analytics]
14:38	<wm-bot>	<bd808> Restart to fix irc connections. This is getting really boring.	[tools.bridgebot]
14:35	<dcaro>	taking down clouddb1002 replica for reboot of cloudvirt1020 (T275893)	[clouddb-services]
12:55	<urbanecm@deploy1002>	Synchronized static/images/project-logos/: 63ad5fda: Revert "Add svwiki 20th anniversary logos" (T282389) (duration: 00m 56s)	[production]
12:52	<urbanecm@deploy1002>	Synchronized wmf-config/logos.php: 94ede526: Revert "Use svwiki 20th anniversary logos" (T282389) (duration: 00m 56s)	[production]
12:21	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db1164', diff saved to https://phabricator.wikimedia.org/P16200 and previous config saved to /var/cache/conftool/dbconfig/20210525-122127-marostegui.json	[production]
12:07	<marostegui@cumin1001>	dbctl commit (dc=all): 'remove db1124 from dbctl', diff saved to https://phabricator.wikimedia.org/P16199 and previous config saved to /var/cache/conftool/dbconfig/20210525-120718-marostegui.json	[production]
11:35	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db1124 will be moved to the test cluster', diff saved to https://phabricator.wikimedia.org/P16198 and previous config saved to /var/cache/conftool/dbconfig/20210525-113521-marostegui.json	[production]
11:26	<hnowlan@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport	[production]