production SAL

2021-04-29
05:00	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15630 and previous config saved to /var/cache/conftool/dbconfig/20210429-050045-root.json	[production]
04:55	<marostegui@cumin1001>	dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P15629 and previous config saved to /var/cache/conftool/dbconfig/20210429-045557-marostegui.json	[production]
04:50	<marostegui@cumin1001>	dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P15627 and previous config saved to /var/cache/conftool/dbconfig/20210429-045015-marostegui.json	[production]
04:44	<marostegui@cumin1001>	dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P15626 and previous config saved to /var/cache/conftool/dbconfig/20210429-044458-marostegui.json	[production]
04:44	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1118.eqiad.wmnet with reason: REIMAGE	[production]
04:41	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on db1118.eqiad.wmnet with reason: REIMAGE	[production]
04:38	<marostegui@cumin1001>	dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P15625 and previous config saved to /var/cache/conftool/dbconfig/20210429-043857-marostegui.json	[production]
04:38	<marostegui@cumin1001>	dbctl commit (dc=all): 'Add db1156 to dbctl T258361', diff saved to https://phabricator.wikimedia.org/P15624 and previous config saved to /var/cache/conftool/dbconfig/20210429-043812-marostegui.json	[production]
04:27	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db1118 for reimage', diff saved to https://phabricator.wikimedia.org/P15623 and previous config saved to /var/cache/conftool/dbconfig/20210429-042757-marostegui.json	[production]
02:59	<milimetric@deploy1002>	Finished deploy [analytics/refinery@740226b] (thin): Hotfix for referrer job (duration: 00m 06s)	[production]
02:59	<milimetric@deploy1002>	Started deploy [analytics/refinery@740226b] (thin): Hotfix for referrer job	[production]
02:58	<milimetric@deploy1002>	Finished deploy [analytics/refinery@740226b]: Hotfix for referrer job (duration: 14m 40s)	[production]
02:44	<milimetric@deploy1002>	Started deploy [analytics/refinery@740226b]: Hotfix for referrer job	[production]
01:44	<krinkle@deploy1002>	Synchronized wmf-config/mc.php: I5869b3c3ba4a (duration: 01m 08s)	[production]
01:23	<ryankemper>	T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`	[production]
01:21	<ryankemper@cumin1001>	END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)	[production]
01:21	<ryankemper@cumin1001>	START - Cookbook sre.wdqs.data-transfer	[production]
01:20	<ryankemper@cumin1001>	END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)	[production]
01:20	<ryankemper@cumin1001>	START - Cookbook sre.wdqs.data-transfer	[production]
01:19	<ryankemper@cumin1001>	END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)	[production]
01:19	<ryankemper@cumin1001>	START - Cookbook sre.wdqs.data-transfer	[production]
01:19	<ryankemper>	T280382 Aborted data transfer; `wdqs2007` is hosed (see https://phabricator.wikimedia.org/T281437)	[production]
01:18	<ryankemper@cumin1001>	END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)	[production]
00:40	<tstarling@deploy1002>	Synchronized php-1.37.0-wmf.3/includes/specials/pagers/ImageListPager.php: T281405 (duration: 01m 08s)	[production]
00:11	<ryankemper>	T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`	[production]
00:06	<ryankemper>	T280382 `wdqs1013.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv 2.7T 998G 1.6T 39% /srv`	[production]
2021-04-28
23:42	<ryankemper@cumin1001>	END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)	[production]
23:38	<robh@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE	[production]
23:36	<robh@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE	[production]
23:36	<robh@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE	[production]
23:34	<robh@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE	[production]
23:33	<robh@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE	[production]
23:32	<robh@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE	[production]
23:06	<dpifke@deploy1002>	Finished deploy [performance/navtiming@cf8b2e9]: Deploying https://gerrit.wikimedia.org/r/c/performance/navtiming/+/682886 (duration: 00m 05s)	[production]
23:06	<dpifke@deploy1002>	Started deploy [performance/navtiming@cf8b2e9]: Deploying https://gerrit.wikimedia.org/r/c/performance/navtiming/+/682886	[production]
22:44	<dwisehaupt>	civiproxy revision changed to 99cecb924a - initial rollout of code for testing	[production]
22:26	<ryankemper>	T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1013.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`	[production]
22:26	<ryankemper@cumin1001>	START - Cookbook sre.wdqs.data-transfer	[production]
22:23	<ryankemper@cumin1001>	END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)	[production]
22:18	<ryankemper>	T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1013.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`	[production]
22:18	<ryankemper@cumin1001>	START - Cookbook sre.wdqs.data-transfer	[production]
22:18	<robh@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE	[production]
22:15	<robh@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE	[production]
21:49	<legoktm@deploy1002>	helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.	[production]
21:49	<legoktm@deploy1002>	helmfile [staging-eqiad] START helmfile.d/admin 'apply'.	[production]
21:47	<legoktm@deploy1002>	helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.	[production]
21:46	<robh@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE	[production]
21:44	<legoktm@deploy1002>	helmfile [staging-codfw] START helmfile.d/admin 'apply'.	[production]
21:44	<robh@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE	[production]
21:41	<ryankemper@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1013.eqiad.wmnet with reason: REIMAGE	[production]