production SAL

2021-02-05
11:50	<vgutierrez@cumin1001>	START - Cookbook sre.hosts.reboot-single for host acmechief2001.codfw.wmnet	[production]
11:44	<jayme@deploy1001>	Finished deploy [docker-pkg/deploy@7257244]: (no justification provided) (duration: 05m 50s)	[production]
11:44	<vgutierrez@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test1001.eqiad.wmnet	[production]
11:39	<ayounsi@cumin1001>	END (PASS) - Cookbook sre.network.cf (exit_code=0)	[production]
11:39	<ayounsi@cumin1001>	START - Cookbook sre.network.cf	[production]
11:39	<jayme@deploy1001>	Started deploy [docker-pkg/deploy@7257244]: (no justification provided)	[production]
11:38	<vgutierrez@cumin1001>	START - Cookbook sre.hosts.reboot-single for host acmechief-test1001.eqiad.wmnet	[production]
11:34	<vgutierrez@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test2001.codfw.wmnet	[production]
11:30	<vgutierrez@cumin1001>	START - Cookbook sre.hosts.reboot-single for host acmechief-test2001.codfw.wmnet	[production]
11:29	<vgutierrez>	restart acme-chief instances to catch up on kernel upgrades	[production]
11:27	<vgutierrez@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir3001.esams.wmnet	[production]
11:23	<vgutierrez@cumin1001>	START - Cookbook sre.hosts.reboot-single for host ncredir3001.esams.wmnet	[production]
11:22	<vgutierrez@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir3002.esams.wmnet	[production]
11:16	<vgutierrez@cumin1001>	START - Cookbook sre.hosts.reboot-single for host ncredir3002.esams.wmnet	[production]
11:14	<vgutierrez@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir1001.eqiad.wmnet	[production]
11:08	<vgutierrez@cumin1001>	START - Cookbook sre.hosts.reboot-single for host ncredir1001.eqiad.wmnet	[production]
11:06	<vgutierrez@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir1002.eqiad.wmnet	[production]
10:56	<vgutierrez@cumin1001>	START - Cookbook sre.hosts.reboot-single for host ncredir1002.eqiad.wmnet	[production]
10:53	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1075 (re)pooling @ 100%: Slowly pooling db1075 after cloning db1157', diff saved to https://phabricator.wikimedia.org/P14222 and previous config saved to /var/cache/conftool/dbconfig/20210205-105345-root.json	[production]
10:38	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1075 (re)pooling @ 75%: Slowly pooling db1075 after cloning db1157', diff saved to https://phabricator.wikimedia.org/P14221 and previous config saved to /var/cache/conftool/dbconfig/20210205-103841-root.json	[production]
10:32	<godog>	swift codfw-prod decrease HDD weight for ms-be20[16-27] - T272837	[production]
10:27	<vgutierrez@cumin1001>	END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)	[production]
10:27	<vgutierrez@cumin1001>	START - Cookbook sre.hosts.reboot-cluster	[production]
10:23	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1075 (re)pooling @ 50%: Slowly pooling db1075 after cloning db1157', diff saved to https://phabricator.wikimedia.org/P14220 and previous config saved to /var/cache/conftool/dbconfig/20210205-102338-root.json	[production]
10:08	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1075 (re)pooling @ 25%: Slowly pooling db1075 after cloning db1157', diff saved to https://phabricator.wikimedia.org/P14219 and previous config saved to /var/cache/conftool/dbconfig/20210205-100834-root.json	[production]
10:06	<gehel>	repooling wdqs1013 - caught up on lag	[production]
09:53	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1075 (re)pooling @ 10%: Slowly pooling db1075 after cloning db1157', diff saved to https://phabricator.wikimedia.org/P14218 and previous config saved to /var/cache/conftool/dbconfig/20210205-095331-root.json	[production]
09:45	<dcausse>	reloading categories from scratch on wdqs1010	[production]
09:38	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1075 (re)pooling @ 5%: Slowly pooling db1075 after cloning db1157', diff saved to https://phabricator.wikimedia.org/P14217 and previous config saved to /var/cache/conftool/dbconfig/20210205-093827-root.json	[production]
08:46	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db1094 T273710', diff saved to https://phabricator.wikimedia.org/P14214 and previous config saved to /var/cache/conftool/dbconfig/20210205-084625-marostegui.json	[production]
08:29	<dcausse>	reloading categories from scratch on wdqs1009	[production]
07:55	<gehel>	cleanup of left over ttl dumps on wdqs1009 and wdqs1010	[production]
07:47	<gehel>	depooling wdqs1013 and restarting blazegraph	[production]
07:28	<oblivian@cumin1001>	END (PASS) - Cookbook sre.network.cf (exit_code=0)	[production]
07:28	<oblivian@cumin1001>	START - Cookbook sre.network.cf	[production]
06:36	<marostegui>	Stop MySQL on db1075 to clone db1157 T258361	[production]
06:35	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db1075 T258361', diff saved to https://phabricator.wikimedia.org/P14212 and previous config saved to /var/cache/conftool/dbconfig/20210205-063554-marostegui.json	[production]
03:42	<aaron@deploy1001>	Synchronized wmf-config/mc.php: af5b0effb5e88ac4ca4a06c2c409d303ec405305 (duration: 01m 06s)	[production]
03:34	<aaron@deploy1001>	Synchronized php-1.36.0-wmf.27/includes/libs/rdbms: 4b386661a9820a002b43bfcef3e18241ea883870 (duration: 01m 12s)	[production]
02:03	<Krinkle>	krinkle@mwmaint1002 Prune globalimagelinks references on s4 database for the deleted ukwikimedia wiki, ref T218170.	[production]
01:01	<ebernhardson@deploy1001>	Finished deploy [wikimedia/discovery/analytics@85713c1]: restore data range specifier in extract job partition spec (duration: 01m 12s)	[production]
00:59	<ebernhardson@deploy1001>	Started deploy [wikimedia/discovery/analytics@85713c1]: restore data range specifier in extract job partition spec	[production]
00:36	<legoktm@cumin1001>	conftool action : set/pooled=no; selector: name=mw1278.eqiad.wmnet	[production]
00:35	<legoktm>	enabled remote IPMI access on mw1349.mgmt.eqiad.wmnet and mw1380.mgmt.eqiad.wmnet	[production]
00:24	<ebernhardson@deploy1001>	Finished deploy [wikimedia/discovery/analytics@9858513]: transfer_to_es: Wait for link reco, and write to weighted_tags as well (duration: 02m 43s)	[production]
00:21	<ebernhardson@deploy1001>	Started deploy [wikimedia/discovery/analytics@9858513]: transfer_to_es: Wait for link reco, and write to weighted_tags as well	[production]
2021-02-04
23:59	<ebernhardson@deploy1001>	Finished deploy [wikimedia/discovery/analytics@93bf374]: correct hql in ores_predictions_init_v3 (duration: 01m 06s)	[production]
23:58	<ebernhardson@deploy1001>	Started deploy [wikimedia/discovery/analytics@93bf374]: correct hql in ores_predictions_init_v3	[production]
23:26	<legoktm@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1278.eqiad.wmnet with reason: REIMAGE	[production]
23:24	<legoktm@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw1278.eqiad.wmnet with reason: REIMAGE	[production]