production SAL

2020-06-27 §
19:05	<mutante>	rebooting gerrit1001	[production]
18:58	<mutante>	rebooting gerrit2001	[production]
18:49	<hashar>	Enabling beta cluster update job (gerrit maintenance) https://integration.wikimedia.org/ci/view/Beta/job/beta-code-update-eqiad/	[production]
18:35	<qchris@deploy1001>	Finished deploy [gerrit/gerrit@da40615]: Gerrit to v3.2.2-98-g98d827eaa3 on gerrit2001 (duration: 00m 10s)	[production]
18:34	<qchris@deploy1001>	Started deploy [gerrit/gerrit@da40615]: Gerrit to v3.2.2-98-g98d827eaa3 on gerrit2001	[production]
18:27	<qchris@deploy1001>	Finished deploy [gerrit/gerrit@da40615]: Gerrit to v3.2.2-98-g98d827eaa3 on gerrit1001 (duration: 00m 08s)	[production]
18:27	<qchris@deploy1001>	Started deploy [gerrit/gerrit@da40615]: Gerrit to v3.2.2-98-g98d827eaa3 on gerrit1001	[production]
17:25	<hashar>	Disabled beta cluster update job (gerrit maintenance) https://integration.wikimedia.org/ci/view/Beta/job/beta-code-update-eqiad/	[production]
17:19	<qchris>	Stopping gerrit on gerrit1001 for the Gerrit upgrade	[production]
17:14	<qchris>	Duplicating reviewdb changes so we get a cheap and quick rollback	[production]
17:11	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)	[production]
17:11	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime	[production]
17:11	<qchris>	Disabling puppet on gerrit1001 for Gerrit upgrades + data migrations	[production]
17:11	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)	[production]
17:11	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime	[production]
17:07	<qchris>	Starting Gerrit upgrade to v3.2.2-98-g98d827eaa3	[production]
15:44	<qchris@deploy1001>	Finished deploy [gerrit/gerrit@da40615]: Gerrit to v3.2.2-98-g98d827eaa3 on gerrit1002 (gerrit-test) (duration: 00m 08s)	[production]
15:44	<qchris@deploy1001>	Started deploy [gerrit/gerrit@da40615]: Gerrit to v3.2.2-98-g98d827eaa3 on gerrit1002 (gerrit-test)	[production]
13:03	<qchris@deploy1001>	Finished deploy [gerrit/gerrit@460e439]: Gerrit to v3.2.2-97-gcaf5020db1 on gerrit1002 (gerrit-test) (duration: 00m 08s)	[production]
13:03	<qchris@deploy1001>	Started deploy [gerrit/gerrit@460e439]: Gerrit to v3.2.2-97-gcaf5020db1 on gerrit1002 (gerrit-test)	[production]
2020-06-26 §
18:42	<robh>	all ulsfo onsite work completed as of 30 minutes ago	[production]
17:52	<robh>	msw2-ulsfo work done, all mgmt items confirmed back online and icinga alerts cleared, moving onto msw1-ulsfo (rack 22) and will lose all mgmt in that rack for next 10-20 minutes T256300	[production]
17:52	<robh>	msw2-ulsfo work done, all mgmt items confirmed back online and icinga alerts cleared, moving onto msw1-ulsfo (rack 22) and will lose all mgmt in that rack for next 10-20 minutes	[production]
17:11	<robh>	msw work in ulsfo via T256300	[production]
10:24	<ema>	pool 5006 T256449	[production]
10:22	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repool db1085', diff saved to https://phabricator.wikimedia.org/P11677 and previous config saved to /var/cache/conftool/dbconfig/20200626-102248-marostegui.json	[production]
10:22	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repool db1093', diff saved to https://phabricator.wikimedia.org/P11676 and previous config saved to /var/cache/conftool/dbconfig/20200626-102201-marostegui.json	[production]
10:03	<ema>	cp2039: restart purged T256444	[production]
09:57	<ema>	cp2037: restart purged T256444	[production]
09:55	<ema>	cp1087: restart purged T256444	[production]
09:46	<ema>	cp2033: restart purged T256444	[production]
09:38	<akosiaris>	move the sessionstore eqiad pods back to the dedicated sessionstore nodes	[production]
09:37	<akosiaris@deploy1001>	helmfile [EQIAD] Ran 'sync' command on namespace 'sessionstore' for release 'production' .	[production]
09:35	<akosiaris>	move the sessionstore codfw pods back to the dedicated sessionstore nodes	[production]
09:35	<akosiaris@deploy1001>	helmfile [CODFW] Ran 'sync' command on namespace 'sessionstore' for release 'production' .	[production]
09:08	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db1093 for schema change', diff saved to https://phabricator.wikimedia.org/P11675 and previous config saved to /var/cache/conftool/dbconfig/20200626-090813-marostegui.json	[production]
08:58	<jynus@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)	[production]
08:56	<jynus@cumin1001>	START - Cookbook sre.hosts.downtime	[production]
08:33	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repool db1088', diff saved to https://phabricator.wikimedia.org/P11674 and previous config saved to /var/cache/conftool/dbconfig/20200626-083319-marostegui.json	[production]
08:25	<ayounsi@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
08:22	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db1088 for schema change', diff saved to https://phabricator.wikimedia.org/P11673 and previous config saved to /var/cache/conftool/dbconfig/20200626-082242-marostegui.json	[production]
08:20	<ayounsi@cumin1001>	START - Cookbook sre.dns.netbox	[production]
08:20	<ayounsi@cumin1001>	END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)	[production]
08:05	<akosiaris@cumin1001>	conftool action : set/pooled=yes; selector: name=kubernetes.*.wmnet	[production]
08:04	<akosiaris@cumin1001>	conftool action : set/weight=10; selector: name=kubernetes.*.wmnet	[production]
08:04	<akosiaris>	pool all new kubernetes nodes in LVS T252185 T256236	[production]
07:57	<ayounsi@cumin1001>	START - Cookbook sre.dns.netbox	[production]
07:44	<volans>	force rebooted cp5006 that is unresponsive (after having depooled it) - T256449	[production]
07:42	<volans@cumin1001>	conftool action : set/pooled=no; selector: name=cp5006.eqsin.wmnet	[production]
06:40	<tstarling@deploy1001>	Synchronized wmf-config/InitialiseSettings.php: add cache-cookies log channel (duration: 00m 59s)	[production]