production SAL

2020-12-17
11:27	<godog>	bounce apache2 on grafana1002	[production]
11:26	<elukey@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-test-worker1003.eqiad.wmnet with reason: REIMAGE	[production]
11:24	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: REIMAGE	[production]
11:22	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1002.eqiad.wmnet with reason: REIMAGE	[production]
11:21	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: REIMAGE	[production]
11:21	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1003.eqiad.wmnet with reason: REIMAGE	[production]
11:20	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1002.eqiad.wmnet with reason: REIMAGE	[production]
11:20	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: REIMAGE	[production]
11:18	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: REIMAGE	[production]
11:16	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: REIMAGE	[production]
11:16	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: REIMAGE	[production]
11:10	<jbond@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)	[production]
11:08	<jbond@cumin1001>	START - Cookbook sre.hosts.reboot-single	[production]
10:50	<elukey@cumin1001>	END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001	[production]
10:45	<elukey@cumin1001>	START - Cookbook sre.hadoop.stop-cluster for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001	[production]
10:21	<jbond42>	updating RemoteIP on phabricator https://gerrit.wikimedia.org/r/c/operations/puppet/+/649872	[production]
09:57	<vgutierrez>	repool ats-tls on cp5011	[production]
09:00	<marostegui>	Sanitize s1 and s5 on db1154 T268742	[production]
08:30	<godog>	swift codfw-prod: more weight to ms-be20[58-61] - T269337	[production]
07:49	<ryankemper>	[wdqs deploy] (wdqs deploy complete)	[production]
07:19	<marostegui>	Stop mysql on db1082 to clone db1154	[production]
07:19	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db1082 for cloning db1154:3315 T268742 ', diff saved to https://phabricator.wikimedia.org/P13563 and previous config saved to /var/cache/conftool/dbconfig/20201217-071903-marostegui.json	[production]
07:18	<elukey>	reboot an-airflow1001 for kernel upgrades	[production]
07:08	<elukey>	update analytics-in4 filter on cr1/cr2-eqiad for https://gerrit.wikimedia.org/r/c/operations/homer/public/+/649706	[production]
07:08	<ryankemper>	[wdqs] depooled `wdqs1013` while it catches up on lag	[production]
07:06	<ryankemper>	[wdqs deploy] Restarting `wdqs-categories` across all wdqs instances, one host at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`	[production]
07:05	<ryankemper>	[wdqs deploy] Restarting `wdqs-categories` across all test instances: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`	[production]
07:05	<ryankemper>	[wdqs-deploy] Restarting `wdqs-updater` across all instances, 4 instances at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`	[production]
07:04	<ryankemper@deploy1001>	Finished deploy [wdqs/wdqs@90f9bdd]: 0.3.56 (duration: 10m 39s)	[production]
06:54	<ryankemper>	[wdqs deploy] Tests passing on canary instance `wdqs1003` following canary deploy, proceeding to rest of fleet	[production]
06:53	<ryankemper@deploy1001>	Started deploy [wdqs/wdqs@90f9bdd]: 0.3.56	[production]
06:53	<ryankemper>	[wdqs deploy] All tests passing on canary instance `wdqs1003` prior to deploy	[production]
06:52	<kart_>	Updated cxserver to 2020-12-16-164911-production (T234220, T269437)	[production]
06:52	<kart_>	Updated cxserver to 2020-12-16-164911-production (T234220, T234220)	[production]
06:22	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool es1013 for decommissioning T268436', diff saved to https://phabricator.wikimedia.org/P13562 and previous config saved to /var/cache/conftool/dbconfig/20201217-062249-marostegui.json	[production]
06:22	<kartik@deploy1001>	helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .	[production]
06:19	<kartik@deploy1001>	helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .	[production]
06:17	<kartik@deploy1001>	helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .	[production]
06:13	<marostegui@cumin1001>	END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)	[production]
06:05	<marostegui@cumin1001>	START - Cookbook sre.hosts.decommission	[production]
05:56	<marostegui>	Stop mysql on db1106 to clone db1154	[production]
05:55	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db1106 for cloning db1154:3311 T268742 ', diff saved to https://phabricator.wikimedia.org/P13560 and previous config saved to /var/cache/conftool/dbconfig/20201217-055556-marostegui.json	[production]
01:35	<andrew@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1019.eqiad.wmnet with reason: REIMAGE	[production]
01:33	<andrew@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1019.eqiad.wmnet with reason: REIMAGE	[production]
01:01	<twentyafterfour>	preparing to update phabricator translations	[production]
00:22	<mutante>	running puppet on mw2266, mw2370, mw2354	[production]
2020-12-16
23:56	<bstorm>	bootstrapped meta_p database for the new s7 replicas T269427	[production]
20:12	<marxarelli>	group1 to 1.36.0-wmf.22 complete. no new errors or concerning rates (refs T267415)	[production]
20:06	<dduvall@deploy1001>	Synchronized php: group1 wikis to 1.36.0-wmf.22 (duration: 01m 01s)	[production]
20:05	<legoktm>	added myself to the ops LDAP group	[production]