2020-09-07
08:18 <jayme@deploy2001> helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' . [production]
08:10 <jayme@deploy2001> helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' . [production]
08:03 <marostegui> Compress InnoDB on s8 eqiad master (db1109) - T232446 [production]
05:11 <marostegui@cumin1001> dbctl commit (dc=all): 'Repool db1087 after MCR schema change', diff saved to https://phabricator.wikimedia.org/P12501 and previous config saved to /var/cache/conftool/dbconfig/20200907-051157-marostegui.json [production]
04:56 <marostegui> Compress InnoDB on s1 eqiad master - this will generate a few days of lag on s1 and labsdb for enwiki T254462 [production]
04:53 <marostegui> Deploy schema change on db1109 (eqiad wikidata master) - T256685 [production]
2020-09-06
19:45 <marostegui@cumin1001> dbctl commit (dc=all): 'Decrease db2127's weight a bit', diff saved to https://phabricator.wikimedia.org/P12496 and previous config saved to /var/cache/conftool/dbconfig/20200906-194512-marostegui.json [production]
08:20 <elukey> powercycle mw1360 (mgmt console available, network errors while running anything) [production]
08:04 <elukey@puppetmaster1001> conftool action : set/pooled=inactive; selector: name=mw1360.eqiad.wmnet [production]
08:01 <elukey> executed "sudo ipmitool -I lanplus -H mw1360.mgmt.eqiad.wmnet -U root mc reset cold" from cumin (mgmt not available for mw1360) [production]
2020-09-05
00:23 <foks> removing 2 files for legal compliance [production]
2020-09-04
22:15 <ryankemper> wdqs deploy complete, service is healthy [production]
21:54 <ryankemper> `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'` [production]
21:52 <ryankemper> `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'` [production]
21:49 <ryankemper@deploy1001> Finished deploy [wdqs/wdqs@c7e6b35]: 0.3.47 (duration: 12m 55s) [production]
21:37 <ryankemper> Tests on canary `wdqs1003` passing, beginning full wdqs deploy [production]
21:36 <ryankemper@deploy1001> Started deploy [wdqs/wdqs@c7e6b35]: 0.3.47 [production]
21:31 <ryankemper> `ryankemper@wdqs2002:~$ sudo systemctl restart wdqs-blazegraph` [production]
21:06 <mutante> apt1001 - removed all libnginx-mod* packages except libnginx-mod-http-echo ; sudo apt-get autoremove ; run puppet ; restarted nginx - apt.wikimedia.org switched to nginx-light (T261962) [production]
21:02 <mutante> apt1001 - remove all libnginx-mod* packages except libnginx-mod-http-echo [production]
20:59 <mutante> apt2001 - sudo apt-get autoremove [production]
20:51 <mutante> apt2001 - apt-get remove --purge libnginx* and run puppet to replace nginx-full with nginx-light (T261962) [production]
20:43 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
20:41 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
20:39 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
20:38 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime [production]
20:38 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime [production]
20:36 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime [production]
20:36 <cmjohnson@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
20:35 <cmjohnson@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
20:34 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
20:32 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
20:31 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime [production]
20:31 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime [production]
20:30 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime [production]
20:30 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime [production]
20:05 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
20:04 <cmjohnson@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
20:03 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
20:01 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime [production]
20:01 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
20:00 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime [production]
19:59 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime [production]
19:59 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
19:57 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime [production]
19:57 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime [production]
19:22 <mutante> Icinga - ACKing with sticky - alerts on test and dev hosts [production]
18:10 <milimetric@deploy1001> Finished deploy [analytics/aqs/deploy@95d6432]: AQS: new editors by country endpoint, low risk so trying on a Friday with SRE blessing (duration: 07m 35s) [production]
18:02 <milimetric@deploy1001> Started deploy [analytics/aqs/deploy@95d6432]: AQS: new editors by country endpoint, low risk so trying on a Friday with SRE blessing [production]
10:31 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) [production]