2009-07-10
|
11:08 |
<tstarling> |
synchronized php-1.5/mc-pmtpa.php 'more' |
[production] |
10:58 |
<Tim> |
installed memcached on srv200-srv209 |
[production] |
10:51 |
<tstarling> |
synchronized php-1.5/mc-pmtpa.php 'deployed the 11 available spares, will make some more' |
[production] |
10:48 |
<Tim> |
mctest.php reports 17 servers down out of 78, most from the range that Rob decommissioned |
[production] |
10:37 |
<Tim> |
installed memcached on srv120, srv121, srv122, srv123 |
[production] |
10:32 |
<Tim> |
found rogue server srv101, missing puppet configuration and so skipping syncs. Uninstalled apache on it. |
[production] |
2009-07-09
|
23:56 |
<RoanKattouw> |
Rebooted prototype around 16:30, got stuck around 15:30 |
[production] |
21:43 |
<Rob> |
srv35 (test.wikipedia.org) is not posting, I think it's dead, Jim. |
[production] |
21:35 |
<Rob> |
decommissioned srv55 and put srv35 in its place in C4, test.wikipedia.org should be back online shortly |
[production] |
20:04 |
<Rob> |
removed decommissioned servers from node groups, getting error on syncing up nagios. |
[production] |
20:03 |
<Rob> |
updated dns for new apache servers |
[production] |
19:54 |
<Rob> |
decommissioned all old apaches in rack pmtpa b2 |
[production] |
16:22 |
<Tim> |
creating mhrwiki (bug 19515) |
[production] |
13:27 |
<domas> |
db13 controller battery failed, s2 needs master switch eventually |
[production] |
2009-07-07
|
19:06 |
<Fred> |
adjusted www.wikipedia.org apache conf file to remove a redirect-loop to www.wikibooks.org. (bug #19460) |
[production] |
17:34 |
<Fred> |
found the cause of Ganglia issues: Puppet. Seems like the configuration of the master hosts gets reverted to being deaf automagically... |
[production] |
17:05 |
<Fred> |
ganglia fixed. For some reason the master cluster nodes were set to deaf mode... (i.e. the aggregator couldn't gather data from them). |
[production] |
15:02 |
<robh> |
synchronized php-1.5/InitialiseSettings.php '19470 Rollback on pt.wikipedia' |
[production] |
03:37 |
<Fred> |
fixing ganglia. Expect disruption |
[production] |
00:27 |
<tomaszf> |
starting six worker threads for xml snapshots |
[production] |
00:12 |
<Fred> |
srv142 and srv55 will need manual power-cycle. |
[production] |
00:10 |
<Fred> |
Rolling reboot has finally completed. |
[production] |
2009-07-03
|
12:51 |
<andrew> |
synchronized php-1.5/extensions/AbuseFilter/Views/AbuseFilterViewEdit.php 'Re-activating abuse filter public logging in the logging table now that log_type and log_action have been expanded.' |
[production] |
11:45 |
<mark> |
Kicked iris so it would boot |
[production] |
10:11 |
<andrew> |
synchronized php-1.5/skins/common/htmlform.js 'IE7 fixes for new preference system' |
[production] |
10:00 |
<Andrew> |
testing |
[production] |
05:51 |
<Tim> |
restarted squid instances on sq28 |
[production] |
05:47 |
<Tim> |
restarted squid instances on sq2 |
[production] |
05:46 |
<Tim> |
started squid backend on sq10, sq23, sq24 and sq31; restarted frontend on most of those to reduce memory usage |
[production] |
05:35 |
<Tim> |
restarted squid backend on sq16, was reporting "gateway timeout" apparently for all requests. Seemed to fix it. Will try that for a few more that nagios is complaining about. |
[production] |
2009-07-02
|
21:38 |
<Rob> |
sq24 won't accept ssh, depooling. |
[production] |
21:34 |
<Rob> |
rebooting sq21 |
[production] |
21:26 |
<Rob> |
ran changes to push dns back to normal scenario |
[production] |
19:52 |
<mark> |
Power outage at esams, moving traffic |
[production] |
19:44 |
<Andrew> |
Knams down, Rob is looking into it |
[production] |
19:41 |
<Andrew> |
Reports of problems from Europe |
[production] |
19:25 |
<Andrew> |
running sync-common-all to deploy mobileRedirect.php to fix hcatlin's mobile redirect/cookie bug |
[production] |
19:22 |
<andrew> |
synchronized live-1.5/mobileRedirect.php |
[production] |
17:15 |
<mark> |
Rebooted srv159 |
[production] |
16:13 |
<Fred> |
shutting srv217 back down, as it is not supposed to be up: its faulty timer is causing issues. |
[production] |
16:12 |
<Fred> |
rebooted srv217. Was unpingable. |
[production] |
14:09 |
<Andrew> |
Started sending updates of spam.log to Project Honeypot folks every 5 minutes, in my crontab on hume. |
[production] |