production SAL

2021-01-08
10:28	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13675 and previous config saved to /var/cache/conftool/dbconfig/20210108-102835-root.json	[production]
10:26	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db1138', diff saved to https://phabricator.wikimedia.org/P13674 and previous config saved to /var/cache/conftool/dbconfig/20210108-102606-marostegui.json	[production]
10:01	<elukey>	restart varnishkafka-webrequest on cp5001 - timeouts to kafka-jumbo1001, librdkafka does not seem to be recovering very well	[production]
10:00	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: After cloning db1155:3316', diff saved to https://phabricator.wikimedia.org/P13673 and previous config saved to /var/cache/conftool/dbconfig/20210108-100040-root.json	[production]
09:45	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: After cloning db1155:3316', diff saved to https://phabricator.wikimedia.org/P13672 and previous config saved to /var/cache/conftool/dbconfig/20210108-094535-root.json	[production]
09:30	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: After cloning db1155:3316', diff saved to https://phabricator.wikimedia.org/P13671 and previous config saved to /var/cache/conftool/dbconfig/20210108-093032-root.json	[production]
09:30	<marostegui>	Restart mysql on db1115 (tendril/dbtree)	[production]
09:15	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: After cloning db1155:3316', diff saved to https://phabricator.wikimedia.org/P13670 and previous config saved to /var/cache/conftool/dbconfig/20210108-091528-root.json	[production]
09:08	<moritzm>	installing libxstream-java security updates on Buster	[production]
09:01	<godog>	swift codfw-prod: more weight to ms-be20[58-61] - T269337	[production]
08:12	<marostegui>	Deploy schema change on s4 codfw master - T270187	[production]
07:57	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db1082', diff saved to https://phabricator.wikimedia.org/P13669 and previous config saved to /var/cache/conftool/dbconfig/20210108-075714-marostegui.json	[production]
07:23	<marostegui>	Deploy schema change on s5 codfw master - T270187	[production]
06:33	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db1085 to clone db1155:3316 T268742 ', diff saved to https://phabricator.wikimedia.org/P13666 and previous config saved to /var/cache/conftool/dbconfig/20210108-063301-marostegui.json	[production]
06:18	<marostegui>	Deploy schema change on s2 codfw master - T270187	[production]
04:59	<mutante>	mw1266 - restart-php7.2-fpm	[production]
03:04	<ryankemper>	[wdqs deploy] Deploy complete, service is healthy. This is done.	[production]
02:35	<ryankemper>	[wdqs deploy] Restarting `wdqs-categories` across load-balanced instances, one host at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`	[production]
02:35	<ryankemper>	[wdqs deploy] Restarted `wdqs-categories` across test instances: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`	[production]
02:34	<ryankemper>	[wdqs deploy] Restarted `wdqs-updater` across all instances: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`	[production]
02:27	<ryankemper@deploy1001>	Finished deploy [wdqs/wdqs@b15fc5c]: 0.3.58 (duration: 18m 04s)	[production]
02:15	<ryankemper>	[wdqs deploy] Nevermind - the UI failure I mentioned above is transient. Restarting my ssh tunnel seemed to make the problem go away. Proceeding with deploy	[production]
02:12	<ryankemper>	[wdqs deploy] While queries run fine, it looks like there might be a UI glitch in this version. Digging in to see if it's transient, but I'll likely be aborting this deploy	[production]
02:09	<ryankemper@deploy1001>	Started deploy [wdqs/wdqs@b15fc5c]: 0.3.58	[production]
02:09	<ryankemper>	[wdqs deploy] Tests passing on canary before beginning wdqs deploy, proceeding	[production]
01:29	<dzahn@cumin1001>	conftool action : set/pooled=yes; selector: name=mw1267.eqiad.wmnet	[production]
01:28	<mutante>	mw1276, mw1277 - first API appservers on buster, now serving traffic, free to depool if any issues	[production]
01:28	<dzahn@cumin1001>	conftool action : set/pooled=yes; selector: name=mw1277.eqiad.wmnet	[production]
01:28	<dzahn@cumin1001>	conftool action : set/pooled=yes; selector: name=mw1276.eqiad.wmnet	[production]
01:24	<mutante>	mw1266 - another buster appserver now serving traffic	[production]
01:24	<mutante>	mw1265 - raised weight to 25 like regular appservers (buster)	[production]
01:23	<dzahn@cumin1001>	conftool action : set/weight=25; selector: name=mw1265.eqiad.wmnet	[production]
01:18	<dzahn@cumin1001>	conftool action : set/pooled=yes; selector: name=mw1266.eqiad.wmnet	[production]
01:17	<dzahn@cumin1001>	conftool action : set/pooled=no; selector: name=mw1277.eqiad.wmnet	[production]
01:17	<dzahn@cumin1001>	conftool action : set/pooled=no; selector: name=mw1276.eqiad.wmnet	[production]
01:16	<dzahn@cumin1001>	conftool action : set/pooled=no; selector: name=mw1267.eqiad.wmnet	[production]
01:12	<dzahn@cumin1001>	conftool action : set/pooled=no; selector: name=mw1266.eqiad.wmnet	[production]
00:27	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1277.eqiad.wmnet with reason: REIMAGE	[production]
00:25	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1267.eqiad.wmnet with reason: REIMAGE	[production]
00:23	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw1277.eqiad.wmnet with reason: REIMAGE	[production]
00:23	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1276.eqiad.wmnet with reason: REIMAGE	[production]
00:22	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw1267.eqiad.wmnet with reason: REIMAGE	[production]
00:21	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw1276.eqiad.wmnet with reason: REIMAGE	[production]
00:17	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1266.eqiad.wmnet with reason: REIMAGE	[production]
00:15	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw1266.eqiad.wmnet with reason: REIMAGE	[production]
00:06	<jforrester@deploy1001>	Synchronized wmf-config/InitialiseSettings.php: Undeploy graphoid on enwiki T271495 (duration: 00m 57s)	[production]
2021-01-07
23:55	<mutante>	reimaging mw1267,mw1276,mw1277	[production]
23:28	<mutante>	reimaging mw1266	[production]
23:14	<andrew@deploy1001>	Finished deploy [horizon/deploy@25ffdee]: trying to debug a compression error that doesn't happen on the test host (duration: 02m 00s)	[production]
23:12	<andrew@deploy1001>	Started deploy [horizon/deploy@25ffdee]: trying to debug a compression error that doesn't happen on the test host	[production]
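
Each entry above follows the same tab-separated layout: time, `<user>` or `<user@host>`, free-text message, and a bracketed project tag. As a minimal sketch for anyone who wants to filter or group these entries, the snippet below splits one line into those fields. The names `SalEntry` and `parse_sal_line` are hypothetical and are not part of any SAL tooling, and the pattern assumes the tab-separated layout seen in this dump rather than any documented format.

```python
import re
from typing import NamedTuple, Optional


class SalEntry(NamedTuple):
    """One log entry; field names are illustrative, not an official schema."""
    time: str
    user: str
    host: Optional[str]  # None when the entry is "<user>" without "@host"
    message: str
    project: str


# Assumed layout: HH:MM <TAB> <user[@host]> <TAB> message <TAB> [project]
_LINE = re.compile(
    r"^(\d{2}:\d{2})\t<([^>@]+)(?:@([^>]+))?>\t(.*)\t\[([^\]]+)\]$"
)


def parse_sal_line(line: str) -> Optional[SalEntry]:
    """Return the parsed entry, or None for date headers and other lines."""
    match = _LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    return SalEntry(*match.groups())


if __name__ == "__main__":
    sample = (
        "01:23\t<dzahn@cumin1001>\t"
        "conftool action : set/weight=25; selector: name=mw1265.eqiad.wmnet\t"
        "[production]"
    )
    print(parse_sal_line(sample))
```

Lines that do not match, such as the date headers, return `None`, so a caller can track the current date separately while iterating over the log.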