2021-06-03
19:32 <ryankemper@cumin1001> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [production]
19:32 <mutante> [deneb:~] $ sudo systemctl start docker-reporter-releng-images [production]
19:28 <ryankemper> T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2005.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage` [production]
19:27 <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer [production]
19:27 <ryankemper> T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1013.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage` [production]
19:27 <ryankemper@cumin1001> START - Cookbook sre.wdqs.data-transfer [production]
19:23 <dzahn@cumin1001> START - Cookbook sre.ganeti.makevm for new host doh5001.wikimedia.org [production]
19:14 <mutante> install1003 - restarting nginx after we switched from nginx-full to nginx-light package, same on other install servers T164456 [production]
19:05 <ryankemper@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2005.codfw.wmnet with reason: REIMAGE [production]
19:03 <ryankemper@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1005.eqiad.wmnet with reason: REIMAGE [production]
19:03 <ryankemper@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2005.codfw.wmnet with reason: REIMAGE [production]
19:01 <ryankemper@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1005.eqiad.wmnet with reason: REIMAGE [production]
18:52 <ebernhardson@deploy1002> Finished deploy [wikimedia/discovery/analytics@f40d41a]: resolve npe in datawriter (duration: 00m 31s) [production]
18:51 <ebernhardson@deploy1002> Started deploy [wikimedia/discovery/analytics@f40d41a]: resolve npe in datawriter [production]
18:46 <ryankemper> T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2005.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage` [production]
18:46 <ryankemper> T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1005.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage` [production]
18:39 <ryankemper> [WDQS] depooled `wdqs1012` (has ~15 hours of lag to catch up on) [production]
18:37 <ryankemper> [WDQS] `ryankemper@wdqs1012:~$ sudo systemctl restart wdqs-blazegraph` (blazegraph on the host has been locked up for ~16 hours based off of https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1622683465757&to=1622745461547) [production]
18:37 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp1087.eqiad.wmnet with reason: replaced DIMM https://phabricator.wikimedia.org/T278729 [production]
18:37 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp1087.eqiad.wmnet with reason: replaced DIMM https://phabricator.wikimedia.org/T278729 [production]
18:28 <mutante> temp. disabling puppet on install* servers. switching nginx to light variant (T164456) [production]
18:16 <ebernhardson@deploy1002> Finished deploy [wikimedia/discovery/analytics@659a8e4]: resolve npe in datawriter (duration: 00m 15s) [production]
18:16 <ebernhardson@deploy1002> Started deploy [wikimedia/discovery/analytics@659a8e4]: resolve npe in datawriter [production]
17:49 <robh@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1002.eqiad.wmnet with reason: REIMAGE [production]
17:47 <robh@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1001.eqiad.wmnet with reason: REIMAGE [production]
17:47 <robh@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1002.eqiad.wmnet with reason: REIMAGE [production]
17:45 <robh@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1001.eqiad.wmnet with reason: REIMAGE [production]
17:37 <brennen> gitlab1001: re-running install-gitlab-server.sh [production]
17:16 <urandom> remove dropped Cassandra keyspace snapshots -- T258414 [production]
16:55 <ejegg> updated payments-wiki from 6fac77f60e to 7be0534b91 [production]
16:23 <ayounsi@cumin1001> START - Cookbook sre.dns.netbox [production]
15:49 <topranks> Gerrit 697993: Change BGP peer IP for doh3002 on esams CRs. [production]
15:27 <papaul> pdu replacement complete [production]
15:25 <moritzm> upgrading gitlab to 13.11.5 [production]
15:08 <papaul> disconnect ps2-d8-codfw for replacement [production]
14:55 <oblivian@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
14:54 <topranks> Gerrit 697970: Add Wikidough BGP peerings on esams CRs for doh3001 and doh3002. [production]
14:23 <moritzm> installing nginx security updates on buster [production]
14:12 <moritzm> installing postgresql-9.6 security updates [production]
13:55 <oblivian@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
13:25 <oblivian@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
13:18 <oblivian@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
13:17 <oblivian@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
13:01 <marostegui@cumin1001> dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16285 and previous config saved to /var/cache/conftool/dbconfig/20210603-130059-root.json [production]
12:45 <marostegui@cumin1001> dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16284 and previous config saved to /var/cache/conftool/dbconfig/20210603-124556-root.json [production]
12:32 <marostegui@cumin1001> dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16283 and previous config saved to /var/cache/conftool/dbconfig/20210603-123243-root.json [production]
12:30 <marostegui@cumin1001> dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16282 and previous config saved to /var/cache/conftool/dbconfig/20210603-123052-root.json [production]
12:17 <marostegui@cumin1001> dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16281 and previous config saved to /var/cache/conftool/dbconfig/20210603-121739-root.json [production]
12:15 <marostegui@cumin1001> dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16280 and previous config saved to /var/cache/conftool/dbconfig/20210603-121548-root.json [production]
12:12 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P16279 and previous config saved to /var/cache/conftool/dbconfig/20210603-121205-marostegui.json [production]