2012-04-30
19:59 <notpeter> restarting nagios to get rid of some old checks [production]
19:57 <Jeff_Green> payments cluster gets kernel updates and reboots [production]
19:55 <logmsgbot_> reedy synchronizing Wikimedia installation... : Rebuild l10n for 1.20wmf2 [production]
19:49 <logmsgbot_> reedy synchronized wmf-config/ExtensionMessages-1.20wmf2.php 'Syncing file' [production]
19:49 <logmsgbot_> reedy synchronized php-1.20wmf2/LocalSettings.php 'Pushing LocalSettings.php' [production]
19:48 <paravoid> upgraded & rebooted ssl3001, ssl3002, ssl3003 [production]
19:46 <logmsgbot_> reedy synchronizing Wikimedia installation... : Pushing out new symlinks etc, moving test2wiki to 1.20wmf2 [production]
19:30 <logmsgbot_> reedy synchronized php-1.20wmf2 'Syncing 1.20wmf2 live hack revisions' [production]
19:28 <logmsgbot_> reedy synchronized php-1.20wmf2 'Syncing 1.20wmf1 live hack revisions' [production]
19:26 <logmsgbot_> reedy synchronized php-1.20wmf2 'Syncing 1.20wmf2 for deployment' [production]
19:18 <Reedy> Syncing php-1.20wmf2 files from NFS to apaches. Likely to upset NFS (or the uplink for the switch nfs is on) for a little while... [production]
19:14 <paravoid> rebooting ssl1004 [production]
19:06 <paravoid> rebooting ssl1003 [production]
19:00 <paravoid> rebooting ssl1002 [production]
18:59 <notpeter> starting innobackupex from db1034 to db57 for new s2 slave [production]
18:50 <paravoid> rebooting ssl1001 [production]
18:42 <Jeff_Green> grosley gets new kernel + reboot [production]
18:35 <Jeff_Green> aluminium gets kernel update, yayyyyyyy! [production]
18:34 <paravoid> pooled back ssl1; depooling ssl3 and rebooting [production]
18:29 <binasher> rebooting mw45 for kernel upgrade [production]
18:27 <Jeff_Green> power cycling aluminium which faceplanted [production]
18:22 <binasher> rebooting mw45 [production]
18:21 <notpeter> rebuilding db57 again, this time with more correct raid level! [production]
18:19 <logmsgbot_> asher synchronized wmf-config/db.php 'adding db59,60 to s1 with low weights' [production]
18:16 <paravoid> depooled & rebooting ssl1 [production]
18:09 <logmsgbot_> aaron rebuilt wikiversions.cdb and synchronized wikiversions files: Sanity run after script changes. [production]
18:00 <logmsgbot_> aaron synchronized multiversion [production]
17:58 <logmsgbot_> reedy synchronized php-1.20wmf1/includes/MagicWord.php 'https://gerrit.wikimedia.org/r/6135' [production]
17:44 <logmsgbot_> aaron synchronized wikiversions.cdb [production]
17:43 <AaronSchulz> updating multiversion code [production]
08:34 <mutante> reinstalling srv266 [production]
08:08 <mutante> upgraded mw1,mw2,mw35 [production]
07:59 <mutante> reinstalling srv206 [production]
07:50 <mutante> upgrading mw36 [production]
07:37 <apergos> powercycling srv266, had this message on mgmt console: Severity: Non Recoverable, SEL:CPU Machine Chk: Processor sensor, transition to non-recoverable was asserted [production]
07:22 <mutante> installing upgrades on srv212 [production]
07:19 <apergos> reinstalled srv284, seems to be up now [production]
07:17 <mutante> powercycled mw8 [production]
02:14 <logmsgbot_> LocalisationUpdate completed (1.20wmf1) at Mon Apr 30 02:13:59 UTC 2012 [production]
2012-04-29
20:13 <apergos> srv206 won't run puppet, see syslog, clearing out the yaml file didn't help, since it's not urgent I'm leaving it for tomorrow [production]
19:51 <Ryan_Lane> depooling ssl3004 [production]
19:51 <Ryan_Lane> removed the ipv6 addresses from maerlant and added them to ssl3001, then restarted nginx [production]
19:50 <Ryan_Lane> repooling ssl3001 [production]
19:46 <apergos> powercycled mw60, same reason as the rest [production]
19:13 <apergos> power cycled mw48 and mw52 (hung just like the others) [production]
18:05 <apergos> ssl3002 and 3003 were rebooted and are the entire ssl esams pool right now [production]
16:34 <apergos> powercycling the ssl300x.esams hosts. 212 days of uptime... (and 3001 had gone out to lunch) [production]
12:34 <mutante> and finally mw1, so just leaving mw1102 and mw60 for having other issues for a while (->Nagios) [production]
12:22 <mutante> check_all_memcached recovered, but still same treatment for mw10 and 11 (8 and 15h ago) [production]
12:07 <mutante> powercycling mw30 [production]