production SAL

2024-06-26
18:25	<sukhe@cumin1002>	START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove ntp.anycast.wmnet - sukhe@cumin1002"	[production]
18:24	<marostegui@cumin1002>	dbctl commit (dc=all): 'Depooling db1211 (T364069)', diff saved to https://phabricator.wikimedia.org/P65490 and previous config saved to /var/cache/conftool/dbconfig/20240626-182355-marostegui.json	[production]
18:23	<marostegui@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance	[production]
18:23	<marostegui@cumin1002>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance	[production]
18:23	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db1209 (T364069)', diff saved to https://phabricator.wikimedia.org/P65489 and previous config saved to /var/cache/conftool/dbconfig/20240626-182333-marostegui.json	[production]
18:23	<sukhe@cumin1002>	START - Cookbook sre.dns.netbox	[production]
18:19	<brett@cumin2002>	START - Cookbook sre.hosts.reimage for host cp5019.eqsin.wmnet with OS bullseye	[production]
18:17	<jhuneidi@deploy1002>	rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.11 refs T366956	[production]
18:14	<sukhe>	# etcdctl --username root --endpoints https://conf1007.eqiad.wmnet:4001 rmdir /conftool/v1/pools/${site}/dnsbox/ntp: T366360	[production]
18:12	<brett@puppetmaster1001>	conftool action : set/pooled=no; selector: name=cp5019.eqsin.wmnet	[production]
18:08	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P65488 and previous config saved to /var/cache/conftool/dbconfig/20240626-180824-marostegui.json	[production]
18:07	<xcollazo@deploy1002>	Finished deploy [airflow-dags/analytics@5121748]: Deploying latest DAGs to analytics Airflow instance. (duration: 00m 39s)	[production]
18:06	<xcollazo@deploy1002>	Started deploy [airflow-dags/analytics@5121748]: Deploying latest DAGs to analytics Airflow instance.	[production]
17:59	<sukhe>	sudo cumin -b10 "A:cp-text" "run-puppet-agent"	[production]
17:58	<sukhe>	sudo cumin -b1 -s30 "A:cp-text" "run-puppet-agent"	[production]
17:53	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P65487 and previous config saved to /var/cache/conftool/dbconfig/20240626-175317-marostegui.json	[production]
17:51	<ottomata>	disabling varnishkafka-eventlogging and varnish /beacon/event handling on cache text nodes. Puppet is disabled on all cache text, will test a few at a time first. - T238230	[production]
17:46	<sukhe>	disable puppet in A:cp-text	[production]
17:43	<brett@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=cp5018.eqsin.wmnet	[production]
17:40	<brett@cumin2002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5018.eqsin.wmnet with OS bullseye	[production]
17:39	<sukhe>	sudo cumin "A:dnsbox" "run-puppet-agent"	[production]
17:38	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db1209 (T364069)', diff saved to https://phabricator.wikimedia.org/P65486 and previous config saved to /var/cache/conftool/dbconfig/20240626-173810-marostegui.json	[production]
17:37	<mnz@deploy1002>	Finished deploy [airflow-dags/research@5121748]: (no justification provided) (duration: 00m 11s)	[production]
17:37	<mnz@deploy1002>	Started deploy [airflow-dags/research@5121748]: (no justification provided)	[production]
17:29	<xcollazo@deploy1002>	Finished deploy [analytics/refinery@ca1acb3] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ca1acb34] (duration: 02m 54s)	[production]
17:26	<xcollazo@deploy1002>	Started deploy [analytics/refinery@ca1acb3] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ca1acb34]	[production]
17:26	<xcollazo@deploy1002>	Finished deploy [analytics/refinery@ca1acb3] (thin): Regular analytics weekly train THIN [analytics/refinery@ca1acb34] (duration: 04m 12s)	[production]
17:22	<xcollazo@deploy1002>	Started deploy [analytics/refinery@ca1acb3] (thin): Regular analytics weekly train THIN [analytics/refinery@ca1acb34]	[production]
17:17	<mnz@deploy1002>	Finished deploy [airflow-dags/research@1996a7a]: (no justification provided) (duration: 00m 03s)	[production]
17:17	<mnz@deploy1002>	Started deploy [airflow-dags/research@1996a7a]: (no justification provided)	[production]
17:16	<sukhe>	re-enable puppet on A:cp-text	[production]
17:14	<ladsgroup@deploy1002>	Finished scap: Backport for [[gerrit:1049982\|Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098)]], [[gerrit:1049989\|Skip failing ForeignResourceStructureTest (T362425)]], [[gerrit:1049988\|Skip failing ForeignResourceStructureTest (T362425)]], [[gerrit:1049984\|Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098)]] (duration: 08m 52s)	[production]
17:09	<ladsgroup@deploy1002>	ladsgroup: Continuing with sync	[production]
17:08	<ladsgroup@deploy1002>	ladsgroup: Backport for [[gerrit:1049982\|Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098)]], [[gerrit:1049989\|Skip failing ForeignResourceStructureTest (T362425)]], [[gerrit:1049988\|Skip failing ForeignResourceStructureTest (T362425)]], [[gerrit:1049984\|Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwd	[production]
17:06	<brett@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5018.eqsin.wmnet with reason: host reimage	[production]
17:05	<ladsgroup@deploy1002>	Started scap: Backport for [[gerrit:1049982\|Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098)]], [[gerrit:1049989\|Skip failing ForeignResourceStructureTest (T362425)]], [[gerrit:1049988\|Skip failing ForeignResourceStructureTest (T362425)]], [[gerrit:1049984\|Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098)]]	[production]
17:03	<brett@cumin2002>	START - Cookbook sre.hosts.downtime for 2:00:00 on cp5018.eqsin.wmnet with reason: host reimage	[production]
17:01	<xcollazo@deploy1002>	Finished deploy [analytics/refinery@ca1acb3]: Regular analytics weekly train [analytics/refinery@ca1acb34] (duration: 09m 16s)	[production]
16:52	<xcollazo@deploy1002>	Started deploy [analytics/refinery@ca1acb3]: Regular analytics weekly train [analytics/refinery@ca1acb34]	[production]
16:52	<sukhe>	disable puppet on A:cp-text	[production]
16:50	<eevans@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Server swap — T362033	[production]
16:50	<eevans@cumin1002>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Server swap — T362033	[production]
16:44	<mnz@deploy1002>	Finished deploy [airflow-dags/research@1996a7a]: (no justification provided) (duration: 00m 03s)	[production]
16:44	<mnz@deploy1002>	Started deploy [airflow-dags/research@1996a7a]: (no justification provided)	[production]
16:39	<hnowlan@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync	[production]
16:38	<hnowlan@deploy1002>	helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync	[production]
16:30	<brett@cumin2002>	START - Cookbook sre.hosts.reimage for host cp5018.eqsin.wmnet with OS bullseye	[production]
16:27	<xcollazo@deploy1002>	Finished deploy [analytics/refinery@ca1acb3]: Regular analytics weekly train [analytics/refinery@ca1acb34] (duration: 00m 29s)	[production]
16:27	<xcollazo@deploy1002>	Started deploy [analytics/refinery@ca1acb3]: Regular analytics weekly train [analytics/refinery@ca1acb34]	[production]
16:25	<brett@puppetmaster1001>	conftool action : set/pooled=no; selector: name=cp5018.eqsin.wmnet	[production]