production SAL

2020-12-17
13:01	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db1087 to clone db1154:3318 add db1092 as vslow,dump service for s8 T268742 ', diff saved to https://phabricator.wikimedia.org/P13571 and previous config saved to /var/cache/conftool/dbconfig/20201217-130101-marostegui.json	[production]
12:56	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1089 (re)pooling @ 25%: Repool db1089 after helping out on db1106', diff saved to https://phabricator.wikimedia.org/P13570 and previous config saved to /var/cache/conftool/dbconfig/20201217-125624-root.json	[production]
12:55	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1082 (re)pooling @ 50%: Repooling after cloning db1154:3315 as sanitarium T268742', diff saved to https://phabricator.wikimedia.org/P13569 and previous config saved to /var/cache/conftool/dbconfig/20201217-125556-root.json	[production]
12:55	<marostegui@cumin1001>	dbctl commit (dc=all): 'Change db1089 weights', diff saved to https://phabricator.wikimedia.org/P13568 and previous config saved to /var/cache/conftool/dbconfig/20201217-125535-marostegui.json	[production]
12:54	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repool db1106 after cloning db1154:3311 as sanitarium T268742', diff saved to https://phabricator.wikimedia.org/P13567 and previous config saved to /var/cache/conftool/dbconfig/20201217-125446-marostegui.json	[production]
12:40	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1082 (re)pooling @ 25%: Repooling after cloning db1154:3315 as sanitarium T268742', diff saved to https://phabricator.wikimedia.org/P13566 and previous config saved to /var/cache/conftool/dbconfig/20201217-124052-root.json	[production]
12:36	<jbond42>	disable puppet fleet wide for config master vhost change	[production]
12:23	<matthiasmullie>	EU backport+config window done	[production]
12:23	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: REIMAGE	[production]
12:22	<mlitn@deploy1001>	Synchronized wmf-config/InitialiseSettings.php: f3a50cb06: Enable ContentTranslation as default tool for ceb, km, mg, tg and yi WPs (duration: 01m 02s)	[production]
12:21	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: REIMAGE	[production]
12:17	<mlitn@deploy1001>	Synchronized wmf-config/InitialiseSettings.php: a29fec312: Add Wikidocumentaries campaign for ContentTranslation (duration: 01m 02s)	[production]
12:07	<mlitn@deploy1001>	Synchronized wmf-config/SearchSettingsForSDC.php: 68ac6fa61: Media Search: Remove license map from config (duration: 01m 04s)	[production]
11:38	<kart_>	Updated cxserver to 2020-12-17-111820-production (T262192)	[production]
11:36	<kartik@deploy1001>	helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .	[production]
11:34	<kartik@deploy1001>	helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .	[production]
11:32	<kartik@deploy1001>	helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .	[production]
11:27	<godog>	bounce apache2 on grafana1002	[production]
11:26	<elukey@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-test-worker1003.eqiad.wmnet with reason: REIMAGE	[production]
11:24	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: REIMAGE	[production]
11:22	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1002.eqiad.wmnet with reason: REIMAGE	[production]
11:21	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: REIMAGE	[production]
11:21	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1003.eqiad.wmnet with reason: REIMAGE	[production]
11:20	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1002.eqiad.wmnet with reason: REIMAGE	[production]
11:20	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: REIMAGE	[production]
11:18	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: REIMAGE	[production]
11:16	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: REIMAGE	[production]
11:16	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: REIMAGE	[production]
11:10	<jbond@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)	[production]
11:08	<jbond@cumin1001>	START - Cookbook sre.hosts.reboot-single	[production]
10:50	<elukey@cumin1001>	END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001	[production]
10:45	<elukey@cumin1001>	START - Cookbook sre.hadoop.stop-cluster for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001	[production]
10:21	<jbond42>	updating RemoteIP on phabricator https://gerrit.wikimedia.org/r/c/operations/puppet/+/649872	[production]
09:57	<vgutierrez>	repool ats-tls on cp5011	[production]
09:00	<marostegui>	Sanitize s1 and s5 on db1154 T268742	[production]
08:30	<godog>	swift codfw-prod: more weight to ms-be20[58-61] - T269337	[production]
07:49	<ryankemper>	[wdqs deploy] (wdqs deploy complete)	[production]
07:19	<marostegui>	Stop mysql on db1082 to clone db1154	[production]
07:19	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db1082 for cloning db1154:3315 T268742 ', diff saved to https://phabricator.wikimedia.org/P13563 and previous config saved to /var/cache/conftool/dbconfig/20201217-071903-marostegui.json	[production]
07:18	<elukey>	reboot an-airflow1001 for kernel upgrades	[production]
07:08	<elukey>	update analytics-in4 filter on cr1/cr2-eqiad for https://gerrit.wikimedia.org/r/c/operations/homer/public/+/649706	[production]
07:08	<ryankemper>	[wdqs] depooled `wdqs1013` while it catches up on lag	[production]
07:06	<ryankemper>	[wdqs deploy] Restarting `wdqs-categories` across all wdqs instances, one host at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`	[production]
07:05	<ryankemper>	[wdqs deploy] Restarting `wdqs-categories` across all test instances: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`	[production]
07:05	<ryankemper>	[wdqs-deploy] Restarting `wdqs-updater` across all instances, 4 instances at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`	[production]
07:04	<ryankemper@deploy1001>	Finished deploy [wdqs/wdqs@90f9bdd]: 0.3.56 (duration: 10m 39s)	[production]
06:54	<ryankemper>	[wdqs deploy] Tests passing on canary instance `wdqs1003` following canary deploy, proceeding to rest of fleet	[production]
06:53	<ryankemper@deploy1001>	Started deploy [wdqs/wdqs@90f9bdd]: 0.3.56	[production]
06:53	<ryankemper>	[wdqs deploy] All tests passing on canary instance `wdqs1003` prior to deploy	[production]
06:52	<kart_>	Updated cxserver to 2020-12-16-164911-production (T234220, T269437)	[production]