2024-08-20
12:20 <brouberol@deploy1003> helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply [production]
12:19 <dreamyjazz@deploy1003> Finished scap sync-world: Backport for [[gerrit:1063926|Fix DeletedContributions for user names containing spaces (T372444)]], [[gerrit:1064004|Allow ContributionsSpecialPage to accept usemodwiki IP addresses (T370413)]], [[gerrit:1064002|Allow ContributionsSpecialPage to accept usemodwiki IP addresses (T370413)]] (duration: 11m 38s) [production]
12:15 <dreamyjazz@deploy1003> dreamyjazz, samtar: Continuing with sync [production]
12:12 <dreamyjazz@deploy1003> dreamyjazz, samtar: Backport for [[gerrit:1063926|Fix DeletedContributions for user names containing spaces (T372444)]], [[gerrit:1064004|Allow ContributionsSpecialPage to accept usemodwiki IP addresses (T370413)]], [[gerrit:1064002|Allow ContributionsSpecialPage to accept usemodwiki IP addresses (T370413)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
12:08 <dreamyjazz@deploy1003> Started scap sync-world: Backport for [[gerrit:1063926|Fix DeletedContributions for user names containing spaces (T372444)]], [[gerrit:1064004|Allow ContributionsSpecialPage to accept usemodwiki IP addresses (T370413)]], [[gerrit:1064002|Allow ContributionsSpecialPage to accept usemodwiki IP addresses (T370413)]] [production]
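The three backport entries above record a single scap backport run, read bottom-up: the changes are first synced to the mwdebug testservers for verification, the deployer confirms, and only then does the world-wide sync run. A minimal sketch of the equivalent invocation from the deployment host, assuming scap's backport subcommand accepts bare Gerrit change numbers:

    # Stage the three changes on the testservers; after manual checks
    # on mwdebug, confirm at the prompt to continue the full sync.
    scap backport 1063926 1064004 1064002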
09:42 <cgoubert@deploy1003> helmfile [eqiad] DONE helmfile.d/admin 'apply'. [production]
09:42 <cgoubert@deploy1003> helmfile [eqiad] START helmfile.d/admin 'apply'. [production]
09:42 <cgoubert@deploy1003> helmfile [codfw] DONE helmfile.d/admin 'apply'. [production]
09:41 <cgoubert@deploy1003> helmfile [codfw] START helmfile.d/admin 'apply'. [production]
09:41 <cgoubert@deploy1003> helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. [production]
09:40 <cgoubert@deploy1003> helmfile [staging-codfw] START helmfile.d/admin 'apply'. [production]
09:40 <cgoubert@deploy1003> helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. [production]
09:39 <cgoubert@deploy1003> helmfile [staging-eqiad] START helmfile.d/admin 'apply'. [production]
09:39 <cgoubert@deploy1003> helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. [production]
09:37 <cgoubert@deploy1003> helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. [production]
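Read bottom-up, the admin 'apply' entries above walk one change through each cluster in turn: aux-k8s-eqiad, staging-eqiad, staging-codfw, codfw, then eqiad. A minimal sketch of a single step, assuming a checkout of the deployment-charts repository with the helmfile.d/admin layout the log refers to:

    # Apply the admin helmfile to one environment at a time,
    # smallest blast radius first; repeat per cluster.
    cd helmfile.d/admin
    helmfile -e staging-eqiad apply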
09:36 <claime> Deploying calico configuration for codfw row c/d lsw - 1062728 [production]
09:06 <brouberol@deploy1003> helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply [production]
09:06 <brouberol@deploy1003> helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply [production]
08:15 <aklapper@deploy1003> rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.19 refs T366964 [production]
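The wikiversions entry above is the weekly train step that moves the group0 wikis to the new branch. A minimal sketch, assuming wikiversions.json has already been edited to point group0 at 1.43.0-wmf.19; scap's sync-wikiversions subcommand rebuilds the wikiversions files and syncs them, producing the 'rebuilt and synchronized wikiversions files' message logged above:

    # Rebuild and distribute the wikiversions files with a SAL message.
    scap sync-wikiversions "group0 to 1.43.0-wmf.19 refs T366964"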
08:15 <klausman@deploy1003> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main'. [production]
08:04 <klausman@deploy1003> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main'. [production]
07:25 <ryankemper@cumin2002> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-main journal) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1022.eqiad.wmnet w/ force delete existing files, repooling neither afterwards [production]
07:18 <ayounsi@cumin1002> END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Update Netbox wheels - ayounsi@cumin1002 - T371890 [production]
07:14 <ayounsi@cumin1002> START - Cookbook sre.deploy.python-code netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Update Netbox wheels - ayounsi@cumin1002 - T371890 [production]
06:48 <ayounsi@cumin1002> END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Update Netbox-next wheels - ayounsi@cumin1002 - T371890 [production]
06:47 <ayounsi@cumin1002> START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Update Netbox-next wheels - ayounsi@cumin1002 - T371890 [production]
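The Netbox entries above come from the generic sre.deploy.python-code cookbook, which wraps a wheel deployment in the START/END pair seen here. A minimal sketch of the invocation; the positional arguments and the reason flag are assumptions inferred from the log messages:

    # Deploy updated Netbox wheels to both production hosts; the
    # reason string ends up in the SAL entry.
    sudo cookbook sre.deploy.python-code netbox \
        'netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet' \
        -r 'Update Netbox wheels'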
06:43 <ryankemper@cumin2002> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 18:00:00 on wdqs[2021-2023,2025].codfw.wmnet with reason: T364368 non-prod hosts [production]
06:43 <ryankemper@cumin2002> START - Cookbook sre.hosts.downtime for 18:00:00 on wdqs[2021-2023,2025].codfw.wmnet with reason: T364368 non-prod hosts [production]
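sre.hosts.downtime schedules monitoring downtime for the listed hosts; END (FAIL, exit_code=99) above means this particular run did not complete cleanly. A minimal sketch of the invocation, with the duration flag assumed from the '18:00:00' in the log:

    # Downtime the four non-production wdqs hosts for 18 hours.
    sudo cookbook sre.hosts.downtime --hours 18 \
        -r 'T364368 non-prod hosts' 'wdqs[2021-2023,2025].codfw.wmnet'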
06:43 <ryankemper@deploy1003> Finished deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host (duration: 00m 05s) [production]
06:42 <ryankemper@deploy1003> Started deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host [production]
06:40 <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1022.eqiad.wmnet w/ force delete existing files, repooling neither afterwards [production]
06:36 <ryankemper@deploy1003> Finished deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host (duration: 00m 13s) [production]
06:36 <ryankemper@deploy1003> Started deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host [production]
05:22 <marostegui> Deploy schema change on s1 eqiad old master db1184 dbmaint T367856 [production]
05:19 <marostegui@cumin1002> dbctl commit (dc=all): 'Depool db1184 T372524', diff saved to https://phabricator.wikimedia.org/P67395 and previous config saved to /var/cache/conftool/dbconfig/20240820-051948-marostegui.json [production]
05:18 <marostegui@cumin1002> dbctl commit (dc=all): 'Promote db1163 to s1 primary and set section read-write T372524', diff saved to https://phabricator.wikimedia.org/P67394 and previous config saved to /var/cache/conftool/dbconfig/20240820-051843-marostegui.json [production]
05:18 <marostegui@cumin1002> dbctl commit (dc=all): 'Set s1 eqiad as read-only for maintenance - T372524', diff saved to https://phabricator.wikimedia.org/P67393 and previous config saved to /var/cache/conftool/dbconfig/20240820-051821-root.json [production]
05:18 <marostegui> Starting s1 eqiad failover from db1184 to db1163 - T372524 [production]
05:17 <marostegui@cumin1002> dbctl commit (dc=all): 'Set db1163 with weight 0 T372524', diff saved to https://phabricator.wikimedia.org/P67392 and previous config saved to /var/cache/conftool/dbconfig/20240820-051726-marostegui.json [production]
05:16 <marostegui@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1184.eqiad.wmnet with reason: Long schema change [production]
05:16 <marostegui@cumin1002> START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1184.eqiad.wmnet with reason: Long schema change [production]
04:52 <marostegui@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s1 T372524 [production]
04:52 <marostegui@cumin1002> dbctl commit (dc=all): 'Set db1163 with weight 0 T372524', diff saved to https://phabricator.wikimedia.org/P67391 and previous config saved to /var/cache/conftool/dbconfig/20240820-045212-root.json [production]
04:52 <marostegui@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s1 T372524 [production]
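Read bottom-up, the entries from 04:52 to 05:19 are a standard s1 primary switchover: downtime the section's hosts, zero the candidate's weight, set the section read-only, promote db1163, return to read-write, then depool the old master db1184 for its schema change. A minimal sketch of the dbctl half of the sequence; the subcommand spellings are assumptions based on the log messages, and each staged change is followed by a commit, which is what produces the 'dbctl commit (dc=all)' entries above:

    dbctl --scope eqiad section s1 ro 'Maintenance T372524'  # set read-only
    dbctl --scope eqiad section s1 set-master db1163         # promote
    dbctl --scope eqiad section s1 rw                        # back to read-write
    dbctl instance db1184 depool                             # old master out
    dbctl config commit -m 'Promote db1163 to s1 primary T372524'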
04:00 <mwpresync@deploy1003> Pruned MediaWiki: 1.43.0-wmf.16 (duration: 00m 56s) [production]
03:48 <mwpresync@deploy1003> Finished scap sync-world: testwikis to 1.43.0-wmf.19 refs T366964 (duration: 46m 32s) [production]
03:02 <mwpresync@deploy1003> Started scap sync-world: testwikis to 1.43.0-wmf.19 refs T366964 [production]
00:21 <mutante> previous message about prometheus can be ignored - race condition that resolved itself on the next puppet run [production]
00:04 <mutante> prometheus3003/prometheus1006 - are trying to use puppetserver1002 but get connection refused from puppetserver1001.eqiad.wmnet port 8140 - causing other puppet errors [production]
2024-08-19
23:59 <mutante> prometheus - puppet on prometheus hosts very slow - reason appears to be that /srv/prometheus is recursively managed by puppet but has ~ 20x more files than the default soft limit of 1000 [production]
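The slowdown described above is a known Puppet failure mode: a recursively managed directory generates one file resource per entry, and runs degrade well before Puppet's warning threshold. A quick check to confirm the scale on an affected host; the path comes from the log entry, and 1000 is Puppet's default soft limit for recursive file resources:

    # Count the entries Puppet must model as individual resources;
    # ~20000 here against the 1000-file soft limit explains slow runs.
    find /srv/prometheus -mindepth 1 | wc -l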