2015-12-21
19:22 <godog> reimage restbase1004 [production]
19:14 <paravoid> powercycling mw1011 [production]
19:11 <paravoid> rolling restart of hhvm on the eqiad jobrunners [production]
18:47 <jynus> common-sync: Copying to mw1016.eqiad.wmnet from tin.eqiad.wmnet [production]
18:35 <ori> correction: previous log message was for mw1015, not mw1017 [production]
18:27 <ori> mw1017: enabled jemalloc profiling, restarted hhvm, now running hhvm-collect-heaps [production]
17:48 <akosiaris> restarted hhvm on mw1012.eqiad.wmnet [production]
16:57 <thcipriani> timeout on sync-file to mw1016.eqiad.wmnet [production]
16:56 <thcipriani@tin> Synchronized php-1.27.0-wmf.9/extensions/Popups/Popups.hooks.php: SWAT: Use ExtensionRegistry to determine whether TextExtracts is installed [[gerrit:260346]] (duration: 02m 48s) [production]
16:34 <jynus> sync-common to mw1085 [production]
16:26 <jynus> powercycling mw1085.eqiad.wmnet [production]
16:22 <thcipriani> mw1085.eqiad.wmnet times out on SSH connection [production]
16:19 <godog> reboot restbase1007, load through the roof [production]
16:18 <thcipriani@tin> Synchronized php-1.27.0-wmf.9/extensions/CentralNotice/resources/subscribing/ext.centralNotice.geoIP.js: SWAT: Update CentralNotice [[gerrit:260316]] (duration: 03m 03s) [production]
16:08 <godog> depool restbase1007 [production]
16:01 <apergos> jessie packages for salt with local patches deployed on restbase1001; looks fine, but logging it just in case. [production]
15:44 <godog> adding new 1TB disk to restbase1007 [production]
14:22 <andrewbogott> disabling puppet on labnet1002 for dnsmasq tests [production]
14:07 <MaxSem> yurik and I are nuking the old maps data and reimporting the planet dump [production]
13:46 <jynus> extending online s2-master data disk by +100GB [production]
13:15 <akosiaris> disabled puppet on maps-test2001 and commented out osmupdater crontab entry until we fix the sync process [production]
11:02 <jynus> emergency restart of db1047's mysql [production]
09:54 <jynus> reenabling semisync replication on s3 [production]
09:07 <godog> stop cassandra on restbase1004, decommissioned [production]
02:29 <l10nupdate@tin> ResourceLoader cache refresh completed at Mon Dec 21 02:29:51 UTC 2015 (duration 6m 47s) [production]
02:23 <mwdeploy@tin> sync-l10n completed (1.27.0-wmf.9) (duration: 09m 45s) [production]
02:20 <andrewbogott> disabling puppet on labnet1002 to mess with dnsmasq [production]
01:44 <andrewbogott> disabled puppet on holmium and labservices1001 to control roll-out of https://gerrit.wikimedia.org/r/#/c/260037/ [production]
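Several entries above (14:22, 13:15, 02:20, 01:44) record disabling the puppet agent on a host before manual testing. The log doesn't show the exact invocations; a minimal sketch with the stock puppet CLI, assuming Puppet's default vardir layout and an illustrative reason string, might look like:

    # Disable the agent with a reason so other admins can see why (message is hypothetical)
    puppet agent --disable "dnsmasq tests on labnet1002 - see SAL"

    # Confirm it is disabled; this lockfile path assumes the default vardir layout
    cat "$(puppet config print vardir)/state/agent_disabled.lock"

    # Re-enable and force a catalog run once testing is done
    puppet agent --enable
    puppet agent --test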
2015-12-20
23:24 <Reedy> Katie and Jeff paged about bellatrix [production]
18:46 <andrewbogott> graceful restart of zuul as per https://www.mediawiki.org/wiki/Continuous_integration/Zuul#Restart [production]
18:31 <andrewbogott> restarting stuck Jenkins [production]
17:47 <reedy@tin> Purged l10n cache for 1.27.0-wmf.6 [production]
17:11 <godog> depool mw1228, reported ro fs [production]
15:53 <reedy@tin> Synchronized README: noop (duration: 00m 32s) [production]
15:50 <Reedy> reedy@tin Purged l10n cache for 1.27.0-wmf.6 (hanging due to mw1228 issue) [production]
15:42 <Reedy> mw1228 reporting readonly fs [production]
15:41 <reedy@tin> Purged l10n cache for 1.27.0-wmf.7 [production]
09:00 <godog> powercycle ms-be2019, xfs lockup [production]
02:28 <l10nupdate@tin> ResourceLoader cache refresh completed at Sun Dec 20 02:28:49 UTC 2015 (duration 6m 54s) [production]
02:21 <mwdeploy@tin> sync-l10n completed (1.27.0-wmf.9) (duration: 08m 59s) [production]
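The mw1228 entries above (17:11 and 15:42) concern a filesystem that went read-only, typically the result of an I/O error triggering a remount-ro. A hedged sketch, not taken from the log, of how one might confirm this on the host:

    # List mounts whose option string (field 4 of /proc/mounts) includes "ro"
    awk '$4 ~ /(^|,)ro(,|$)/ {print $2, $4}' /proc/mounts

    # Look for remount-read-only or I/O error messages in the kernel log
    dmesg | grep -iE 'remount.*read-only|i/o error' | tail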
2015-12-19
21:55 <_joe_> restarted zotero on sca1001, various OOM messages [production]
20:48 <gwicke> restbase1004: `systemctl mask cassandra` in preparation for the decommission finishing [production]
19:49 <akosiaris> killed gmond on db2036. It was clearly misbehaving and had been running since Jan 02, and db2036 was not listed on the ganglia web interface. Killing the orphaned process and restarting gmond seems to have fixed it [production]
18:54 <akosiaris> scheduled maintenance on the s3 slave lag checks for db2036, db2043, db2050, db2057 (all of db2018's family that pages) to silence pages while debugging. The check has been flapping since 15:00 UTC today [production]
15:14 <krenair@tin> Synchronized wmf-config/CommonSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/259611/ - noop for prod, other than making icinga stop complaining (duration: 00m 31s) [production]
10:07 <hashar> CI jobs for MediaWiki were broken because of a cssjanus dependency issue. Should be fixed once the mw/core change https://gerrit.wikimedia.org/r/#/c/260169/ lands [production]
02:28 <l10nupdate@tin> ResourceLoader cache refresh completed at Sat Dec 19 02:28:56 UTC 2015 (duration 6m 53s) [production]
02:22 <mwdeploy@tin> sync-l10n completed (1.27.0-wmf.9) (duration: 08m 53s) [production]
01:01 <gwicke> entire restbase cluster: removed 5% root reserve from data partition with tune2fs -m 0 /dev/mapper/restbase$NODE--vg-{srv,var} [production]
00:49 <gwicke> restbase1008: removed 5% root reserve from data partition with tune2fs -m 0 /dev/mapper/restbase1008--vg-srv [production]
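The two tune2fs entries above reclaim the 5% ext4 reserved-blocks space on the RESTBase data volumes; the cluster-wide run at 01:01 substitutes the node number into the LVM device names. A sketch of how such a loop might have been driven — the node list and SSH wrapper are assumptions, only the tune2fs invocation comes from the log:

    # Hypothetical node list; the actual set of RESTBase hosts isn't in this log excerpt
    for NODE in 1001 1002 1003 1005 1006 1007 1008 1009; do
        for LV in srv var; do
            # -m 0 drops the reserved-blocks percentage to zero on this volume
            ssh "restbase${NODE}.eqiad.wmnet" \
                "sudo tune2fs -m 0 /dev/mapper/restbase${NODE}--vg-${LV}"
        done
    done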