production SAL

2017-12-11 §
11:11	<_joe_>	restarting hhvm on mw1200, stuck in a kernel task	[production]
11:08	<jdrewniak@tin>	Synchronized portals: Wikimedia Portals Update: [[gerrit:397472|Bumping portals to master (T128546)]] (duration: 00m 45s)	[production]
11:07	<jdrewniak@tin>	Synchronized portals/prod/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:397472|Bumping portals to master (T128546)]] (duration: 00m 44s)	[production]
10:49	<ema>	cp4021: restart varnish-be due to mbox lag	[production]
10:04	<godog>	upgrade grafana to 4.6.2 on labmon1001 - T182294	[production]
10:00	<jynus>	stopping dbstore2001:s5 and dbstore1002 (s5) mysql replication in sync	[production]
09:28	<akosiaris>	upload scap_3.7.4-1 to apt.wikimedia.org/jessie-wikimedia/main	[production]
09:16	<gehel>	cleaning old cassandra dumps on maps-test2001 servers	[production]
09:15	<gehel>	cleaning up old postgres logs on maps-test2001	[production]
09:05	<elukey>	set notebook1002 as role::spare as prep step to reimage it to kafka1023	[production]
09:03	<jynus>	dropping multiple leftover files from db1102	[production]
08:52	<marostegui>	Stop replication in sync on db1034 and db1039 - T163190	[production]
08:12	<elukey>	powercycle ganeti1008 - all vms stuck, console com2 showed a ton of printks without a clear indicator of the root cause	[production]
07:49	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Depool db1034 - T182556 (duration: 00m 45s)	[production]
07:44	<_joe_>	restarting hhvm on mw1189,mw1229,mw1235,mw1282,mw1285,mw1315,mw1316, all stuck with a kernel hang	[production]
06:59	<_joe_>	restarted hhvm, nginx on mw1280, hanging kernel operations	[production]
06:45	<marostegui>	Deploy schema change on s2 db1060 with replication enabled, this will generate some lag on s2 on labs - T174569	[production]
06:45	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Depool db1060 - T174569 (duration: 00m 44s)	[production]
06:22	<marostegui>	Compress s6 on db1096 - T178359	[production]
06:21	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Depool db1096:3316 to compress InnoDB there - T178359 (duration: 00m 45s)	[production]
02:43	<l10nupdate@tin>	scap sync-l10n completed (1.31.0-wmf.11) (duration: 09m 21s)	[production]
2017-12-10 §
20:33	<elukey>	execute restart-hhvm on mw1312 - hhvm stuck multiple times queueing requests	[production]
20:01	<elukey>	ran kafka preferred-replica-election for the kafka analytics cluster (1012->1022) to re-add kafka1012 to the kafka brokers acting as partition leaders (will spread the load in a better way)	[production]
2017-12-09 §
17:00	<apergos>	restarted hhvm on mw1276, the same old hang with the same old symptoms	[production]
16:10	<awight@tin>	Finished deploy [ores/deploy@1c0ede0]: Reducing ORES Celery log verbosity (take 4!) (duration: 03m 01s)	[production]
16:07	<awight@tin>	Started deploy [ores/deploy@1c0ede0]: Reducing ORES Celery log verbosity (take 4!)	[production]
16:02	<awight@tin>	Finished deploy [ores/deploy@1c0ede0]: Reducing ORES Celery log verbosity (duration: 05m 58s)	[production]
15:56	<awight@tin>	Started deploy [ores/deploy@1c0ede0]: Reducing ORES Celery log verbosity	[production]
15:55	<awight@tin>	Finished deploy [ores/deploy@1c0ede0]: Reducing ORES Celery log verbosity (duration: 00m 17s)	[production]
15:55	<awight@tin>	Started deploy [ores/deploy@1c0ede0]: Reducing ORES Celery log verbosity	[production]
15:53	<awight@tin>	Finished deploy [ores/deploy@1c0ede0]: Reducing ORES Celery log verbosity (duration: 00m 31s)	[production]
15:53	<awight@tin>	Started deploy [ores/deploy@1c0ede0]: Reducing ORES Celery log verbosity	[production]
15:53	<apergos>	did same on scb1002,3,4	[production]
15:48	<awight>	Making an emergency deployment to ORES logging config to reduce verbosity.	[production]
15:45	<apergos>	on scb1001 moved daemon.log out of the way, did "service rsyslog rotate", saved the last 5000 entries for use by ores team, removed the log	[production]
11:44	<apergos>	that server list: mw1278, 1277, 1226, 1234, 1230	[production]
11:42	<apergos>	restarted hhvm on api servers after lockup	[production]
11:19	<legoktm@tin>	Synchronized wmf-config/InitialiseSettings.php: Disable ORES in fawiki - T182354 (duration: 00m 45s)	[production]
00:11	<Jamesofur>	removed 2FA from EVinente after verification T182373	[production]
2017-12-08 §
23:23	<hashar>	force ran puppet on contint2001	[production]
22:15	<madhuvishy>	Kicked off rsync of /data/xmldatadumps/public to labstore1006 & 7	[production]
22:05	<smalyshev@tin>	Finished deploy [wdqs/wdqs@353b3cb]: temporary fix for T182464, better fix coming soon (duration: 05m 55s)	[production]
21:59	<smalyshev@tin>	Started deploy [wdqs/wdqs@353b3cb]: temporary fix for T182464, better fix coming soon	[production]
20:22	<aaron@tin>	Synchronized php-1.31.0-wmf.11/includes/Setup.php: a319c3e7ab61 - disable cpPosTime injection (duration: 00m 45s)	[production]
18:00	<reedy@tin>	Synchronized wmf-config/InitialiseSettings.php: Disable GlobalBlocking on fishbowl wikis (duration: 00m 45s)	[production]
16:23	<urandom>	starting cassandra, restbase1010 - T178177	[production]
16:22	<urandom>	disabling smart path, restbase1010, arrays 'b'...'e' - T178177	[production]
16:20	<urandom>	disabling smart path, restbase1010, array 'a' (canary) - T178177	[production]
16:15	<urandom>	shutting down cassandra, restbase1010 - T178177	[production]
15:35	<marostegui>	Fix dbstore1002 s5 replication	[production]