production SAL

2022-03-17
07:12	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22743 and previous config saved to /var/cache/conftool/dbconfig/20220317-071200-root.json	[production]
07:11	<ryankemper>	[WDQS] Depooled `wdqs2003` (8 hours of lag to catch up on)	[production]
07:06	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P22742 and previous config saved to /var/cache/conftool/dbconfig/20220317-070650-root.json	[production]
07:04	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance	[production]
07:04	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance	[production]
07:04	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance	[production]
07:04	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance	[production]
06:57	<ryankemper>	[WDQS] Also of note is the spiking thread counts on the affected hosts: https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1647457172391&to=1647500081971&viewPanel=22	[production]
06:57	<ryankemper>	[WDQS] Note that per https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1647457172391&to=1647500081971&viewPanel=7 `wdqs2003` has been offline for ~6 hours, `wdqs2001` for 1.5 hours and `wdqs2004` just recently.	[production]
06:56	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22741 and previous config saved to /var/cache/conftool/dbconfig/20220317-065656-root.json	[production]
06:54	<ryankemper>	[WDQS] `ryankemper@wdqs2003:~$ sudo systemctl restart wdqs-blazegraph.service`	[production]
06:53	<ryankemper>	[WDQS] `ryankemper@wdqs2001:~$ sudo systemctl restart wdqs-blazegraph.service`	[production]
06:50	<elukey>	restart blazegraph on wdqs2004	[production]
06:46	<elukey>	kill remaining hanging processes for ppchelko and accraze on an-test-client1001 to allow users offboard (puppet broken)	[production]
06:41	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22740 and previous config saved to /var/cache/conftool/dbconfig/20220317-064152-root.json	[production]
06:26	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22739 and previous config saved to /var/cache/conftool/dbconfig/20220317-062648-root.json	[production]
06:15	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance	[production]
06:15	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance	[production]
06:11	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 10%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22738 and previous config saved to /var/cache/conftool/dbconfig/20220317-061144-root.json	[production]
04:06	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depooling db1146:3314 (T300775)', diff saved to https://phabricator.wikimedia.org/P22737 and previous config saved to /var/cache/conftool/dbconfig/20220317-040634-marostegui.json	[production]
04:06	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance	[production]
04:06	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance	[production]
02:57	<andrew@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye	[production]
02:07	<andrew@cumin1001>	START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye	[production]
02:07	<andrew@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye	[production]
01:11	<andrew@cumin1001>	START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye	[production]
2022-03-16
23:52	<tzatziki>	Removing two files for legal compliance	[production]
21:17	<cjming>	end running skin update preference maintenance script	[production]
20:52	<robh@cumin1001>	END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED	[production]
20:40	<urbanecm@deploy1002>	Synchronized wmf-config/InitialiseSettings.php: [no-op] 8efa537: GrowthExperiments: Set GEWelcomeSurveyShowMailingListQuestion (T303240) (duration: 00m 53s)	[production]
20:38	<robh@cumin1001>	START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED	[production]
20:35	<urbanecm@deploy1002>	Synchronized php-1.38.0-wmf.26/extensions/WikimediaMaintenance/: 9ba157b: Add insert option for update skin preferences script (T299104) (duration: 00m 50s)	[production]
20:34	<urbanecm@deploy1002>	Synchronized php-1.38.0-wmf.25/extensions/WikimediaMaintenance/: ebfc516: Add script to update vector skin preferences (T299104) (duration: 00m 51s)	[production]
20:32	<robh@cumin1001>	END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED	[production]
20:24	<pt1979@cumin1001>	START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye	[production]
20:13	<robh@cumin1001>	START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED	[production]
20:13	<urbanecm@deploy1002>	Synchronized docroot/noc/db.php: f649199: Migrate wmfDatacenter(s) to wmgDatacenter(s) (T45956; 3/3) (duration: 00m 49s)	[production]
20:12	<urbanecm@deploy1002>	Synchronized multiversion/: f649199: Migrate wmfDatacenter(s) to wmgDatacenter(s) (T45956; 2/3) (duration: 00m 50s)	[production]
20:11	<urbanecm@deploy1002>	Synchronized wmf-config/: f649199: Migrate wmfDatacenter(s) to wmgDatacenter(s) (T45956; 1/3) (duration: 00m 50s)	[production]
19:22	<otto@deploy1002>	Finished deploy [analytics/refinery@2d2056a] (hadoop-test): (no justification provided) (duration: 07m 50s)	[production]
19:14	<otto@deploy1002>	Started deploy [analytics/refinery@2d2056a] (hadoop-test): (no justification provided)	[production]
18:32	<sukhe>	running: homer "cr-drmrs" commit "Gerrit 771359: Set up BGP peering in drmrs for Wikidough."	[production]
18:09	<aqu@deploy1002>	Finished deploy [airflow-dags/analytics_test@257960f]: Migrate session_length/daily from Oozie to Airflow [airflow-dags/analytics_test@257960f] (duration: 00m 08s)	[production]
18:09	<aqu@deploy1002>	Started deploy [airflow-dags/analytics_test@257960f]: Migrate session_length/daily from Oozie to Airflow [airflow-dags/analytics_test@257960f]	[production]
18:02	<aqu@deploy1002>	Finished deploy [airflow-dags/analytics@257960f]: Migrate session_length/daily from Oozie to Airflow [airflow-dags/analytics@257960f] (duration: 00m 08s)	[production]
18:02	<aqu@deploy1002>	Started deploy [airflow-dags/analytics@257960f]: Migrate session_length/daily from Oozie to Airflow [airflow-dags/analytics@257960f]	[production]
18:00	<razzi@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on karapace1001.eqiad.wmnet with reason: Setting up karapace for the first time	[production]
18:00	<razzi@cumin1001>	START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on karapace1001.eqiad.wmnet with reason: Setting up karapace for the first time	[production]
17:36	<dancy@deploy1002>	Synchronized multiversion/MWMultiVersion.php: Config: [[gerrit:771001\|mwscript: Support --force-version flag (T303878)]] (duration: 00m 57s)	[production]
17:21	<sukhe@puppetmaster1001>	conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls	[production]