production SAL


2012-04-30 §
18:27	<Jeff_Green>	power cycling aluminium which faceplanted	[production]
18:22	<binasher>	rebooting mw45	[production]
18:21	<notpeter>	rebuilding db57 again, this time with more correct raid level!	[production]
18:19	<logmsgbot_>	asher synchronized wmf-config/db.php 'adding db59,60 to s1 with low weights'	[production]
18:16	<paravoid>	depooled & rebooting ssl1	[production]
18:09	<logmsgbot_>	aaron rebuilt wikiversions.cdb and synchronized wikiversions files: Sanity run after script changes.	[production]
18:00	<logmsgbot_>	aaron synchronized multiversion	[production]
17:58	<logmsgbot_>	reedy synchronized php-1.20wmf1/includes/MagicWord.php 'https://gerrit.wikimedia.org/r/6135'	[production]
17:44	<logmsgbot_>	aaron synchronized wikiversions.cdb	[production]
17:43	<AaronSchulz>	updating multiversion code	[production]
08:34	<mutante>	reinstalling srv266	[production]
08:08	<mutante>	upgraded mw1,mw2,mw35	[production]
07:59	<mutante>	reinstalling srv206	[production]
07:50	<mutante>	upgrading mw36	[production]
07:37	<apergos>	powercycling srv266, had this message on mgmt console: Severity: Non Recoverable, SEL:CPU Machine Chk: Processor sensor, transition to non-recoverable was asserted	[production]
07:22	<mutante>	installing upgrades on srv212	[production]
07:19	<apergos>	reinstalled srv284, seems to be up now	[production]
07:17	<mutante>	powercycled mw8	[production]
02:14	<logmsgbot_>	LocalisationUpdate completed (1.20wmf1) at Mon Apr 30 02:13:59 UTC 2012	[production]
2012-04-29 §
20:13	<apergos>	srv206 won't run puppet (see syslog); clearing out the yaml file didn't help. Since it's not urgent, I'm leaving it for tomorrow	[production]
19:51	<Ryan_Lane>	depooling ssl3004	[production]
19:51	<Ryan_Lane>	removed the ipv6 addresses from maerlant and added them to ssl3001, then restarted nginx	[production]
19:50	<Ryan_Lane>	repooling ssl3001	[production]
19:46	<apergos>	powercycled mw60, same reason as the rest	[production]
19:13	<apergos>	power cycled mw48 and mw52 (hung just like the others)	[production]
18:05	<apergos>	ssl3002 and 3003 were rebooted and are the entire ssl esams pool right now	[production]
16:34	<apergos>	powercycling the ssl300x.esams hosts. 212 days of uptime... (and 3001 had gone out to lunch)	[production]
12:34	<mutante>	and finally mw1, so just leaving mw1102 and mw60 for having other issues for a while (->Nagios)	[production]
12:22	<mutante>	check_all_memcached recovered, but still same treatment for mw10 and 11 (8 and 15h ago)	[production]
12:07	<mutante>	powercycling mw30	[production]
02:56	<paravoid>	rebooting ssl2 (has 214 days uptime)	[production]
02:47	<paravoid>	powercycled ssl3	[production]
02:14	<logmsgbot_>	LocalisationUpdate completed (1.20wmf1) at Sun Apr 29 02:13:58 UTC 2012	[production]
2012-04-28 §
22:53	<Reedy>	Job queue logs on gdash seem to have stopped on the 26th...	[production]
22:29	<logmsgbot_>	reedy synchronized php-1.20wmf1/includes/EditPage.php 'https://gerrit.wikimedia.org/r/6088'	[production]
21:52	<logmsgbot_>	reedy synchronized wmf-config/CommonSettings.php	[production]
21:51	<logmsgbot_>	reedy synchronized php-1.20wmf1/extensions/cldr/LanguageNames.body.php	[production]
21:12	<logmsgbot_>	reedy synchronized php-1.20wmf1/extensions/cldr/LanguageNames.body.php	[production]
21:10	<logmsgbot_>	reedy synchronized php-1.20wmf1/extensions/cldr/LanguageNames.body.php	[production]
21:09	<logmsgbot_>	reedy synchronized common/php-1.20wmf1/extensions/cldr/LanguageNames.body.php 'more debugging'	[production]
20:51	<logmsgbot_>	reedy synchronized php-1.20wmf1/extensions/cldr/LanguageNames.body.php 'Add debugging'	[production]
20:49	<logmsgbot_>	reedy synchronized wmf-config/CommonSettings.php 'Add debuglog group for language code not being a string'	[production]
19:04	<logmsgbot_>	reedy synchronized php-1.20wmf1/includes/ExternalEdit.php 'https://gerrit.wikimedia.org/r/6077'	[production]
19:03	<logmsgbot_>	reedy synchronized php-1.20wmf1/includes/api/ApiParse.php 'https://gerrit.wikimedia.org/r/6076'	[production]
02:24	<Ryan_Lane>	all mediawiki boxes that have uptimes affected by the bug are being rebooted at 8-minute intervals	[production]
02:14	<logmsgbot_>	LocalisationUpdate completed (1.20wmf1) at Sat Apr 28 02:14:14 UTC 2012	[production]
01:33	<paravoid>	powercycled mw29	[production]
01:21	<paravoid>	powercycled mw38	[production]
00:17	<notpeter>	db12 is sooooo sloooooow, starting innobackupex from db1017 to db60 for new s1 slave	[production]
2012-04-27 §
22:15	<paravoid>	upgraded ssl4 to nginx 0.7.65-5wmf1 and added it back to the pool	[production]