production SAL

2022-09-15
06:47	<marostegui@cumin1001>	dbctl commit (dc=all): 'Give some weight to db2096 T317842', diff saved to https://phabricator.wikimedia.org/P34747 and previous config saved to /var/cache/conftool/dbconfig/20220915-064750-marostegui.json	[production]
06:46	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db2115 T317842', diff saved to https://phabricator.wikimedia.org/P34746 and previous config saved to /var/cache/conftool/dbconfig/20220915-064635-marostegui.json	[production]
06:45	<marostegui@cumin1001>	dbctl commit (dc=all): 'Promote db2096 to x1 primary and set section read-write T317842', diff saved to https://phabricator.wikimedia.org/P34745 and previous config saved to /var/cache/conftool/dbconfig/20220915-064525-root.json	[production]
06:44	<marostegui>	Starting x1 codfw failover from db2115 to db2096 - T317842	[production]
06:40	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Primary switchover x1 T317842	[production]
06:40	<marostegui@cumin1001>	dbctl commit (dc=all): 'Set db2096 with weight 0 T317842', diff saved to https://phabricator.wikimedia.org/P34744 and previous config saved to /var/cache/conftool/dbconfig/20220915-064014-root.json	[production]
06:40	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Primary switchover x1 T317842	[production]
06:35	<marostegui@cumin1001>	dbctl commit (dc=all): 'db2105 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34743 and previous config saved to /var/cache/conftool/dbconfig/20220915-063538-root.json	[production]
06:14	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db2105 T317839', diff saved to https://phabricator.wikimedia.org/P34742 and previous config saved to /var/cache/conftool/dbconfig/20220915-061421-root.json	[production]
06:13	<marostegui@cumin1001>	dbctl commit (dc=all): 'Promote db2127 to s3 codfw T317839', diff saved to https://phabricator.wikimedia.org/P34741 and previous config saved to /var/cache/conftool/dbconfig/20220915-061317-marostegui.json	[production]
06:12	<marostegui>	Starting s3 codfw failover from db2105 to db2127 - T317839	[production]
06:03	<marostegui@cumin1001>	dbctl commit (dc=all): 'Set db2127 with weight 0 T317839', diff saved to https://phabricator.wikimedia.org/P34740 and previous config saved to /var/cache/conftool/dbconfig/20220915-060307-root.json	[production]
06:02	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Codfw switchover s3 T317839	[production]
06:02	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 1:00:00 on 23 hosts with reason: Codfw switchover s3 T317839	[production]
05:32	<marostegui@cumin1001>	END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down T317662	[production]
05:32	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down T317662	[production]
05:12	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down T317662	[production]
05:12	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down T317662	[production]
2022-09-14
22:08	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Depooling db1190 (T314041)', diff saved to https://phabricator.wikimedia.org/P34739 and previous config saved to /var/cache/conftool/dbconfig/20220914-220822-ladsgroup.json	[production]
22:08	<ladsgroup@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance	[production]
22:08	<ladsgroup@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance	[production]
22:08	<ladsgroup@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance	[production]
22:07	<ladsgroup@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance	[production]
22:07	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1149 (T314041)', diff saved to https://phabricator.wikimedia.org/P34738 and previous config saved to /var/cache/conftool/dbconfig/20220914-220744-ladsgroup.json	[production]
21:52	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P34737 and previous config saved to /var/cache/conftool/dbconfig/20220914-215238-ladsgroup.json	[production]
21:38	<dduvall@deploy1002>	Finished deploy [phabricator/deployment@3137c92]: testing phabricator deployment to phab2002 (duration: 01m 48s)	[production]
21:37	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P34736 and previous config saved to /var/cache/conftool/dbconfig/20220914-213732-ladsgroup.json	[production]
21:37	<dduvall@deploy1002>	Started deploy [phabricator/deployment@3137c92]: testing phabricator deployment to phab2002	[production]
21:36	<dduvall>	testing phabricator deployment to phab2002. should have no production impact (not serving traffic, no access to r/w db)	[production]
21:35	<dduvall@deploy1002>	Installation of scap version "4.19.1" completed for 561 hosts	[production]
21:35	<dduvall@deploy1002>	Installing scap version "4.19.1" for 561 hosts	[production]
21:34	<dduvall>	Deploying scap 4.19.1 (https://gerrit.wikimedia.org/r/c/mediawiki/tools/scap/+/832297/1/changelog)	[production]
21:22	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1149 (T314041)', diff saved to https://phabricator.wikimedia.org/P34735 and previous config saved to /var/cache/conftool/dbconfig/20220914-212225-ladsgroup.json	[production]
20:47	<mwdebug-deploy@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mwdebug: apply	[production]
20:47	<mwdebug-deploy@deploy1002>	helmfile [codfw] START helmfile.d/services/mwdebug: apply	[production]
20:47	<mwdebug-deploy@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply	[production]
20:47	<mwdebug-deploy@deploy1002>	helmfile [eqiad] START helmfile.d/services/mwdebug: apply	[production]
20:44	<dancy@deploy1002>	Sync cancelled.	[production]
20:44	<dancy@deploy1002>	dancy: testing synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet	[production]
20:44	<dancy@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply	[production]
20:44	<dancy@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mwdebug: apply	[production]
20:40	<dancy@deploy1002>	helmfile [eqiad] START helmfile.d/services/mwdebug: apply	[production]
20:40	<dancy@deploy1002>	helmfile [codfw] START helmfile.d/services/mwdebug: apply	[production]
20:39	<dancy@deploy1002>	Started scap: testing	[production]
20:38	<dancy@deploy1002>	Synchronized php: group1 wikis to 1.40.0-wmf.1 refs T314190 (duration: 05m 49s)	[production]
20:34	<dancy@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mwdebug: apply	[production]
20:34	<dancy@deploy1002>	helmfile [codfw] START helmfile.d/services/mwdebug: apply	[production]
20:34	<dancy@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply	[production]
20:33	<dancy@deploy1002>	helmfile [eqiad] START helmfile.d/services/mwdebug: apply	[production]
20:32	<dancy@deploy1002>	rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.1 refs T314190	[production]