2012-04-30
§
|
18:29 |
<binasher> |
rebooting mw45 for kernel upgrade |
[production] |
18:27 |
<Jeff_Green> |
power cycling aluminium which faceplanted |
[production] |
18:22 |
<binasher> |
rebooting mw45 |
[production] |
18:21 |
<notpeter> |
rebuilding db57 again, this time with more correct raid level! |
[production] |
18:19 |
<logmsgbot_> |
asher synchronized wmf-config/db.php 'adding db59,60 to s1 with low weights' |
[production] |
18:16 |
<paravoid> |
depooled & rebooting ssl1 |
[production] |
18:09 |
<logmsgbot_> |
aaron rebuilt wikiversions.cdb and synchronized wikiversions files: Sanity run after script changes. |
[production] |
18:00 |
<logmsgbot_> |
aaron synchronized multiversion |
[production] |
17:58 |
<logmsgbot_> |
reedy synchronized php-1.20wmf1/includes/MagicWord.php 'https://gerrit.wikimedia.org/r/6135' |
[production] |
17:44 |
<logmsgbot_> |
aaron synchronized wikiversions.cdb |
[production] |
17:43 |
<AaronSchulz> |
updating multiversion code |
[production] |
08:34 |
<mutante> |
reinstalling srv266 |
[production] |
08:08 |
<mutante> |
upgraded mw1,mw2,mw35 |
[production] |
07:59 |
<mutante> |
reinstalling srv206 |
[production] |
07:50 |
<mutante> |
upgrading mw36 |
[production] |
07:37 |
<apergos> |
powercycling srv266, had this message on mgmt console: Severity: Non Recoverable, SEL:CPU Machine Chk: Processor sensor, transition to non-recoverable was asserted |
[production] |
07:22 |
<mutante> |
installing upgrades on srv212 |
[production] |
07:19 |
<apergos> |
reinstalled srv284, seems to be up now |
[production] |
07:17 |
<mutante> |
powercycled mw8 |
[production] |
02:14 |
<logmsgbot_> |
LocalisationUpdate completed (1.20wmf1) at Mon Apr 30 02:13:59 UTC 2012 |
[production] |
2012-04-29
§
|
20:13 |
<apergos> |
srv206 won't run puppet, see syslog, clearing out the yaml file didn't help, since it's not urgent I'm leaving it for tomorrow |
[production] |
19:51 |
<Ryan_Lane> |
depooling ssl3004 |
[production] |
19:51 |
<Ryan_Lane> |
removed the ipv6 addresses from maerlant and added them to ssl3001, then restarted nginx |
[production] |
19:50 |
<Ryan_Lane> |
repooling ssl3001 |
[production] |
19:46 |
<apergos> |
powercycled mw60, same reason as the rest |
[production] |
19:13 |
<apergos> |
power cycled mw48 and mw52 (hung just like the others) |
[production] |
18:05 |
<apergos> |
ssl3002 and 3003 were rebooted and are the entire ssl esams pool right now |
[production] |
16:34 |
<apergos> |
powercycling the ssl300x.esams hosts. 212 days of uptime... (and 3001 had gone out to lunch) |
[production] |
12:34 |
<mutante> |
and finally mw1, so just leaving mw1102 and mw60 for having other issues for a while (->Nagios) |
[production] |
12:22 |
<mutante> |
check_all_memcached recovered, but still same treatment for mw10 and 11 (8 and 15h ago) |
[production] |
12:07 |
<mutante> |
powercycling mw30 |
[production] |
02:56 |
<paravoid> |
rebooting ssl2 (has 214 days uptime) |
[production] |
02:47 |
<paravoid> |
powercycled ssl3 |
[production] |
02:14 |
<logmsgbot_> |
LocalisationUpdate completed (1.20wmf1) at Sun Apr 29 02:13:58 UTC 2012 |
[production] |
2012-04-28
§
|
22:53 |
<Reedy> |
Job queue logs on gdash seem to have stopped on the 26th... |
[production] |
22:29 |
<logmsgbot_> |
reedy synchronized php-1.20wmf1/includes/EditPage.php 'https://gerrit.wikimedia.org/r/6088' |
[production] |
21:52 |
<logmsgbot_> |
reedy synchronized wmf-config/CommonSettings.php |
[production] |
21:51 |
<logmsgbot_> |
reedy synchronized php-1.20wmf1/extensions/cldr/LanguageNames.body.php |
[production] |
21:12 |
<logmsgbot_> |
reedy synchronized php-1.20wmf1/extensions/cldr/LanguageNames.body.php |
[production] |
21:10 |
<logmsgbot_> |
reedy synchronized php-1.20wmf1/extensions/cldr/LanguageNames.body.php |
[production] |
21:09 |
<logmsgbot_> |
reedy synchronized common/php-1.20wmf1/extensions/cldr/LanguageNames.body.php 'more debugging' |
[production] |
20:51 |
<logmsgbot_> |
reedy synchronized php-1.20wmf1/extensions/cldr/LanguageNames.body.php 'Add debugging' |
[production] |
20:49 |
<logmsgbot_> |
reedy synchronized wmf-config/CommonSettings.php 'Add debuglog group for language code not being a string' |
[production] |
19:04 |
<logmsgbot_> |
reedy synchronized php-1.20wmf1/includes/ExternalEdit.php 'https://gerrit.wikimedia.org/r/6077' |
[production] |
19:03 |
<logmsgbot_> |
reedy synchronized php-1.20wmf1/includes/api/ApiParse.php 'https://gerrit.wikimedia.org/r/6076' |
[production] |
02:24 |
<Ryan_Lane> |
all mediawiki boxes that have uptimes affected by the bug are being rebooted at 8 minute intervals |
[production] |
02:14 |
<logmsgbot_> |
LocalisationUpdate completed (1.20wmf1) at Sat Apr 28 02:14:14 UTC 2012 |
[production] |
01:33 |
<paravoid> |
powercycled mw29 |
[production] |
01:21 |
<paravoid> |
powercycled mw38 |
[production] |
00:17 |
<notpeter> |
db12 is sooooo sloooooow, starting innobackupex from db1017 to db60 for new s1 slave |
[production] |