2009-12-06
|
22:50 |
<domas> |
and yes, it was db30 |
[production] |
22:17 |
<domas> |
s/bank8/bank9/ |
[production] |
22:15 |
<domas> |
bank13 & bank8 MCE warnings ( http://p.defau.lt/?KWMB35Z13ysXpN6IHcca9A ) |
[production] |
22:13 |
<midom> |
synchronized php-1.5/wmf-config/db.php |
[production] |
20:22 |
<mark> |
Puppetized NTP configuration on the Solaris servers |
[production] |
19:37 |
<mark> |
Installed CSW pkgutil and puppet on ms4, updated it on ms5 |
[production] |
18:00 |
<apergos> |
resending incrementals for last three months from ms1 to ms7 with -I to get the intermediate snaps. using netcat, running in screen as root on both hosts |
[production] |
15:54 |
<apergos> |
cleaned up / on ms1, was out of space (tossed some old files from /root) |
[production] |
01:20 |
<mark> |
Disabled xinetd and extdist crontab on zwinger |
[production] |
00:40 |
<mark> |
synchronized php-1.5/wmf-config/CommonSettings.php 'Moved svn-invoker (ExtensionDistributor) from zwinger to fenari' |
[production] |
00:27 |
<mark> |
sq27 is flooding syslog; placed temporary firewall entry for syslog packets on nfs1 |
[production] |
2009-12-04
|
23:30 |
<atglenn> |
started netcat of the bulk of the data from ms5 to ms7. running in screen as root on both hosts. |
[production] |
23:21 |
<atglenn> |
started ncat of (small piece of) image data from ms5 to ms7, running in screen as root on both hosts |
[production] |
20:47 |
<Rob> |
which doesn't work, damn. |
[production] |
20:47 |
<Rob> |
got sick of racktables.wikimedia.org not redirecting correctly, put in a rewrite for non-SSL connections to SSL |
[production] |
20:24 |
<Fred> |
fixed nrpe on db20 and db7 |
[production] |
20:13 |
<root> |
ran sync-common-all |
[production] |
20:12 |
<Rob> |
running sync-common-all to update configuration for support of flaggedrevs on plwiktionary |
[production] |
19:20 |
<Rob> |
srv144 removed from node groups & pybal, nagios resynced. |
[production] |
19:19 |
<Rob> |
srv144 is out of warranty and rebooting randomly, decommissioning. |
[production] |
19:05 |
<Fred> |
finished setup of srv245. |
[production] |
19:02 |
<Rob> |
srv126 removed from node groups and lvs. nagios restarted to exclude it. |
[production] |
19:01 |
<Rob> |
srv126 refuses to even POST when benched, out of warranty, slating for immediate decommissioning |
[production] |
19:00 |
<Rob> |
srv144 reinstalling with a single hard disk, no more raid1 |
[production] |
18:50 |
<Rob> |
swapped primary srv144 drive with old decommissioned spare. reinstalling OS, will reinstall packages and get online later. |
[production] |
18:45 |
<Rob> |
sq22 back online, all drives nominal, rebuilding cache and ensuring it is in rotation |
[production] |
18:41 |
<Rob> |
rebooted sq22 |
[production] |
18:38 |
<Rob> |
rebooted srv144 and srv126 |
[production] |
18:36 |
<Rob> |
srv245 package install failed. I do not have time to tinker with it while in the DC, I have other things that require my physical access to the machines. Leaving it alone for now to work on remotely. |
[production] |
18:28 |
<Rob> |
srv245 OS installed, setting up wikimedia-task-appserver |
[production] |
18:06 |
<Rob> |
srv245 was sitting idle with no OS, depooled from apaches. reinstalling system. |
[production] |
17:57 |
<Rob> |
rebooted srv83 per fred |
[production] |
17:35 |
<Fred> |
removed srv83 from the nodelist since it was causing ddsh to never finish executing. |
[production] |
17:26 |
<Fred> |
fixed broken apache. Seems like there is a machine down that is preventing normal sync-file from finishing... Looking into it. |
[production] |
16:50 |
<rainman-sr> |
stopped logging of search queries on searchidx1 until someone sets up proper log archiving to a different machine |
[production] |
16:48 |
<rainman-sr> |
searchidx1 had full disk, freed some 100 GB of space by deleting logs and stuff lying around |
[production] |
16:14 |
<Rob> |
srv245 down and unresponsive, rebooting |
[production] |
16:12 |
<Rob> |
sq43's replacement disk is also bad (talk about bad luck), placing rma with dell. system will remain powered down for now. |
[production] |
15:55 |
<Rob> |
sq43 isn't seeing a replaced disk, rebooting and troubleshooting |
[production] |
15:33 |
<domas> |
'arcconf setcache 1 logicaldrive 0 roff' - disabling any read caching on db11-db30 RAIDs |
[production] |
15:13 |
<Rob> |
after tinkering with it with domas, it appears rebuild is indeed automatic. db21 rebuilding raid array |
[production] |
15:09 |
<Rob> |
db21 bad disk swapped out, rebuild should be automatic |
[production] |
14:57 |
<Rob> |
sq14 back up, rebuilding its cache |
[production] |
14:54 |
<Rob> |
sq13 primary disk dead, out of warranty |
[production] |
14:53 |
<Rob> |
swapping sdc in sq13 and sq14 to bring sq14 back online |
[production] |