production SAL

2017-12-11
08:12	<elukey>	powercycle ganeti1008 - all vms stuck, console com2 showed a ton of printks without a clear indicator of the root cause	[production]
07:49	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Depool db1034 - T182556 (duration: 00m 45s)	[production]
07:44	<_joe_>	restarting hhvm on mw1189,mw1229,mw1235,mw1282,mw1285,mw1315,mw1316, all stuck with a kernel hang	[production]
06:59	<_joe_>	restarted hhvm, nginx on mw1280, hanging kernel operations	[production]
06:45	<marostegui>	Deploy schema change on s2 db1060 with replication enabled, this will generate some lag on s2 on labs - T174569	[production]
06:45	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Depool db1060 - T174569 (duration: 00m 44s)	[production]
06:22	<marostegui>	Compress s6 on db1096 - T178359	[production]
06:21	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Depool db1096:3316 to compress InnoDB there - T178359 (duration: 00m 45s)	[production]
02:43	<l10nupdate@tin>	scap sync-l10n completed (1.31.0-wmf.11) (duration: 09m 21s)	[production]
2017-12-10
20:33	<elukey>	execute restart-hhvm on mw1312 - hhvm stuck multiple times queueing requests	[production]
20:01	<elukey>	ran kafka preferred-replica-election for the kafka analytics cluster (1012->1022) to re-add kafka1012 to the kafka brokers acting as partition leaders (will spread the load in a better way)	[production]
2017-12-09
17:00	<apergos>	restarted hhvm on mw1276, the same old hang with the same old symptoms	[production]
16:10	<awight@tin>	Finished deploy [ores/deploy@1c0ede0]: Reducing ORES Celery log verbosity (take 4!) (duration: 03m 01s)	[production]
16:07	<awight@tin>	Started deploy [ores/deploy@1c0ede0]: Reducing ORES Celery log verbosity (take 4!)	[production]
16:02	<awight@tin>	Finished deploy [ores/deploy@1c0ede0]: Reducing ORES Celery log verbosity (duration: 05m 58s)	[production]
15:56	<awight@tin>	Started deploy [ores/deploy@1c0ede0]: Reducing ORES Celery log verbosity	[production]
15:55	<awight@tin>	Finished deploy [ores/deploy@1c0ede0]: Reducing ORES Celery log verbosity (duration: 00m 17s)	[production]
15:55	<awight@tin>	Started deploy [ores/deploy@1c0ede0]: Reducing ORES Celery log verbosity	[production]
15:53	<awight@tin>	Finished deploy [ores/deploy@1c0ede0]: Reducing ORES Celery log verbosity (duration: 00m 31s)	[production]
15:53	<awight@tin>	Started deploy [ores/deploy@1c0ede0]: Reducing ORES Celery log verbosity	[production]
15:53	<apergos>	did same on scb1002,3,4	[production]
15:48	<awight>	Making an emergency deployment to ORES logging config to reduce verbosity.	[production]
15:45	<apergos>	on scb1001 moved daemon.log out of the way, did "service rsyslog rotate", saved the last 5000 entries for use by ores team, removed the log	[production]
11:44	<apergos>	that server list: mw1278, 1277, 1226, 1234, 1230	[production]
11:42	<apergos>	restarted hhvm on api servers after lockup	[production]
11:19	<legoktm@tin>	Synchronized wmf-config/InitialiseSettings.php: Disable ORES in fawiki - T182354 (duration: 00m 45s)	[production]
00:11	<Jamesofur>	removed 2FA from EVinente after verification T182373	[production]
2017-12-08
23:23	<hashar>	force ran puppet on contint2001	[production]
22:15	<madhuvishy>	Kicked off rsync of /data/xmldatadumps/public to labstore1006 & 7	[production]
22:05	<smalyshev@tin>	Finished deploy [wdqs/wdqs@353b3cb]: temporary fix for T182464, better fix coming soon (duration: 05m 55s)	[production]
21:59	<smalyshev@tin>	Started deploy [wdqs/wdqs@353b3cb]: temporary fix for T182464, better fix coming soon	[production]
20:22	<aaron@tin>	Synchronized php-1.31.0-wmf.11/includes/Setup.php: a319c3e7ab61 - disable cpPosTime injection (duration: 00m 45s)	[production]
18:00	<reedy@tin>	Synchronized wmf-config/InitialiseSettings.php: Disable GlobalBlocking on fishbowl wikis (duration: 00m 45s)	[production]
16:23	<urandom>	starting cassandra, restbase1010 - T178177	[production]
16:22	<urandom>	disabling smart path, restbase1010, arrays 'b'...'e' - T178177	[production]
16:20	<urandom>	disabling smart path, restbase1010, array 'a' (canary) - T178177	[production]
16:15	<urandom>	shutting down cassandra, restbase1010 - T178177	[production]
15:35	<marostegui>	Fix dbstore1002 s5 replication	[production]
15:28	<gehel@tin>	Finished deploy [tilerator/deploy@29d633e]: testing new tilerator packaging on maps-test2003 (duration: 00m 03s)	[production]
15:28	<gehel@tin>	Started deploy [tilerator/deploy@29d633e]: testing new tilerator packaging on maps-test2003	[production]
15:08	<gehel@tin>	Finished deploy [tilerator/deploy@29d633e]: testing new tilerator packaging on maps-test2003 (duration: 02m 08s)	[production]
15:06	<gehel@tin>	Started deploy [tilerator/deploy@29d633e]: testing new tilerator packaging on maps-test2003	[production]
15:05	<gehel@tin>	Finished deploy [tilerator/deploy@29d633e]: testing new tilerator packaging on maps-test2003 (duration: 00m 42s)	[production]
15:05	<gehel@tin>	Started deploy [tilerator/deploy@29d633e]: testing new tilerator packaging on maps-test2003	[production]
14:39	<gehel@tin>	Finished deploy [tilerator/deploy@e52ea1d]: testing new tilerator packaging on maps-test2003 (duration: 02m 34s)	[production]
14:36	<gehel@tin>	Started deploy [tilerator/deploy@e52ea1d]: testing new tilerator packaging on maps-test2003	[production]
11:45	<elukey>	updated prometheus-druid-exporter on druid* to 0.6	[production]
11:39	<elukey>	upload prometheus-druid-exporter 0.6 to stretch/jessie wikimedia	[production]
06:52	<marostegui>	Fix labsdb1004 replication broken	[production]
06:43	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Fully pool db1099:3311 - T178359 (duration: 00m 55s)	[production]