production SAL

2021-06-03
23:41	<reedy@deploy1002>	Synchronized wmf-config/CommonSettings.php: T280886 (duration: 00m 56s)	[production]
23:40	<reedy@deploy1002>	Synchronized wmf-config/InitialiseSettings.php: T280886 (duration: 00m 57s)	[production]
23:33	<mutante>	installing OS on fresh VM doh5001	[production]
23:30	<ryankemper@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2001.codfw.wmnet with reason: REIMAGE	[production]
23:28	<ryankemper@cumin2002>	START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2001.codfw.wmnet with reason: REIMAGE	[production]
23:09	<thcipriani@deploy1002>	Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:694686|Restrict changetags to sysops and bots on meta]] T283625 (duration: 00m 58s)	[production]
22:41	<ryankemper>	T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2001.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`	[production]
22:39	<ryankemper>	T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1008.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`	[production]
22:39	<ryankemper@cumin1001>	START - Cookbook sre.wdqs.data-transfer	[production]
22:36	<ryankemper>	T280382 Cancelled transfer to `wdqs1005`; the source host `wdqs1013` has a `wikidata.jnl` that is 80% too big; will transfer from different node -> `wdqs1005` and then fix the journal on `wdqs1013` after	[production]
22:36	<ryankemper@cumin1001>	END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)	[production]
22:35	<ryankemper>	T280382 `wdqs2005.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 998G 1.5T 40% /srv`	[production]
22:28	<robh@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
22:15	<robh@cumin1001>	START - Cookbook sre.dns.netbox	[production]
21:55	<ryankemper@cumin2002>	END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)	[production]
20:54	<shdubsh>	restart kafka on kafka-logging to take new retention config	[production]
20:47	<sbassett>	Deployed security patch for T282932	[production]
20:37	<ebernhardson>	restart mjolnir-kafka-bulk-daemon on search-loader[12]001	[production]
20:35	<ebernhardson@deploy1002>	Finished deploy [search/mjolnir/deploy@1c40c83]: bulk daemon: accept events for search_updates swift container (duration: 01m 00s)	[production]
20:34	<ryankemper>	T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1013.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`	[production]
20:34	<ryankemper@cumin1001>	START - Cookbook sre.wdqs.data-transfer	[production]
20:34	<ebernhardson@deploy1002>	Started deploy [search/mjolnir/deploy@1c40c83]: bulk daemon: accept events for search_updates swift container	[production]
20:34	<ryankemper>	T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2005.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`	[production]
20:34	<ryankemper@cumin2002>	START - Cookbook sre.wdqs.data-transfer	[production]
19:58	<mutante>	[mwmaint1002:~] $ /usr/local/bin/systemd-timer-mail-wrapper -T root@mwmaint1002.eqiad.wmnet --only-on-error /usr/local/bin/cross-validate-accounts	[production]
19:56	<mutante>	[mwmaint1002:~] $ sudo systemctl start daily_account_consistency_check.service	[production]
19:41	<dzahn@cumin1001>	END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh5002.wikimedia.org	[production]
19:41	<dzahn@cumin1001>	START - Cookbook sre.ganeti.makevm for new host doh5002.wikimedia.org	[production]
19:39	<ebernhardson@deploy1002>	Finished deploy [wikimedia/discovery/analytics@339d402]: ship pip and wheel packages for virtualenvs (duration: 04m 27s)	[production]
19:37	<dzahn@cumin1001>	END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh5001.wikimedia.org	[production]
19:34	<ebernhardson@deploy1002>	Started deploy [wikimedia/discovery/analytics@339d402]: ship pip and wheel packages for virtualenvs	[production]
19:33	<mutante>	[deneb:~] $ sudo systemctl start docker-reporter-releng-images - T251918 - icinga-wm> RECOVERY - Check systemd state on deneb is OK	[production]
19:33	<ryankemper@cumin2002>	END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)	[production]
19:32	<ryankemper@cumin1001>	END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)	[production]
19:32	<mutante>	[deneb:~] $ sudo systemctl start docker-reporter-releng-images	[production]
19:28	<ryankemper>	T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2005.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`	[production]
19:27	<ryankemper@cumin2002>	START - Cookbook sre.wdqs.data-transfer	[production]
19:27	<ryankemper>	T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1013.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`	[production]
19:27	<ryankemper@cumin1001>	START - Cookbook sre.wdqs.data-transfer	[production]
19:23	<dzahn@cumin1001>	START - Cookbook sre.ganeti.makevm for new host doh5001.wikimedia.org	[production]
19:14	<mutante>	install1003 - restarting nginx after we switched from nginx-full to nginx-light package, same on other install servers T164456	[production]
19:05	<ryankemper@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2005.codfw.wmnet with reason: REIMAGE	[production]
19:03	<ryankemper@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1005.eqiad.wmnet with reason: REIMAGE	[production]
19:03	<ryankemper@cumin2002>	START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2005.codfw.wmnet with reason: REIMAGE	[production]
19:01	<ryankemper@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1005.eqiad.wmnet with reason: REIMAGE	[production]
18:52	<ebernhardson@deploy1002>	Finished deploy [wikimedia/discovery/analytics@f40d41a]: resolve npe in datawriter (duration: 00m 31s)	[production]
18:51	<ebernhardson@deploy1002>	Started deploy [wikimedia/discovery/analytics@f40d41a]: resolve npe in datawriter	[production]
18:46	<ryankemper>	T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2005.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`	[production]
18:46	<ryankemper>	T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1005.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage`	[production]
18:39	<ryankemper>	[WDQS] depooled `wdqs1012` (has ~15 hours of lag to catch up on)	[production]