2009-07-10

15:08 <Rob> upgraded blog and techblog to WordPress 2.8.1 [production]
13:58 <midom> synchronized php-1.5/includes/api/ApiQueryCategoryMembers.php 'hello, fix\\!' [production]
12:40 <Tim> prototype.wikimedia.org is in OOM death; nagios has reported it down for 3 hours, but it is still responsive on shell, so I will try a light touch [production]
11:08 <tstarling> synchronized php-1.5/mc-pmtpa.php 'more' [production]
10:58 <Tim> installed memcached on srv200-srv209 [production]
10:51 <tstarling> synchronized php-1.5/mc-pmtpa.php 'deployed the 11 available spares, will make some more' [production]
10:48 <Tim> mctest.php reports 17 servers down out of 78, most from the range that Rob decommissioned [production]
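A liveness sweep like the one mctest.php performs can be sketched as follows. This is a hypothetical approximation, not the actual script: it simply probes each memcached server's TCP port and issues the protocol's `version` command. Host names and the helper name are illustrative.

```python
import socket

def check_memcached(hosts, port=11211, timeout=1.0):
    """Probe each host's memcached port; return {host: True if alive}."""
    status = {}
    for host in hosts:
        try:
            with socket.create_connection((host, port), timeout=timeout) as s:
                s.sendall(b"version\r\n")          # memcached text-protocol command
                reply = s.recv(64)
                status[host] = reply.startswith(b"VERSION")
        except OSError:
            status[host] = False                   # refused, timed out, or unreachable
    return status

# Example: down_count = sum(1 for up in check_memcached(["srv120", "srv121"]).values() if not up)
```

A sweep like this makes "17 servers down out of 78" a one-liner to recompute after each config change.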
10:37 <Tim> installed memcached on srv120, srv121, srv122, srv123 [production]
10:32 <Tim> found rogue server srv101, which was missing its puppet configuration and was therefore skipping syncs. Uninstalled apache on it. [production]
2009-07-09

23:56 <RoanKattouw> Rebooted prototype around 16:30; it had gotten stuck around 15:30 [production]
21:43 <Rob> srv35 (test.wikipedia.org) is not POSTing; I think it's dead, Jim. [production]
21:35 <Rob> decommissioned srv55 and put srv35 in its place in C4; test.wikipedia.org should be back online shortly [production]
20:04 <Rob> removed decommissioned servers from node groups; getting an error when syncing nagios. [production]
20:03 <Rob> updated DNS for the new apache servers [production]
19:54 <Rob> decommissioned all old apaches in rack pmtpa b2 [production]
16:22 <Tim> creating mhrwiki (bug 19515) [production]
13:27 <domas> db13 controller battery failed; s2 needs a master switch eventually [production]
2009-07-07

19:06 <Fred> adjusted the www.wikipedia.org apache conf file to remove a redirect loop to www.wikibooks.org (bug #19460) [production]
17:34 <Fred> found the cause of the Ganglia issues: Puppet. It seems the configuration of the master hosts gets reverted to deaf mode automagically... [production]
17:05 <Fred> ganglia fixed. For some reason the master cluster nodes were set to deaf mode (i.e. the aggregator couldn't gather data from them). [production]
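For context, "deaf" here refers to gmond's `deaf` global: a deaf node does not listen for metric traffic from the cluster, so an aggregator node that gets flipped to deaf stops collecting data. A minimal gmond.conf fragment showing the relevant settings (illustrative values, not the production configuration):

```
/* gmond.conf (fragment) -- aggregator/master nodes must hear the cluster */
globals {
  daemonize = yes
  mute = no    /* this node still sends its own metrics */
  deaf = no    /* must be "no" on aggregator nodes, or they collect nothing */
}
```

If Puppet manages this file, the managed template has to carry `deaf = no` for the master hosts, or each Puppet run will revert a hand-applied fix, as observed above.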
15:02 <robh> synchronized php-1.5/InitialiseSettings.php '19470 Rollback on pt.wikipedia' [production]
03:37 <Fred> fixing ganglia. Expect disruption [production]
00:27 <tomaszf> starting six worker threads for xml snapshots [production]
00:12 <Fred> srv142 and srv55 will need manual power-cycle. [production]
00:10 <Fred> Rolling reboot has finally completed. [production]
2009-07-03

12:51 <andrew> synchronized php-1.5/extensions/AbuseFilter/Views/AbuseFilterViewEdit.php 'Re-activating abuse filter public logging in the logging table now that log_type and log_action have been expanded.' [production]
11:45 <mark> Kicked iris so it would boot [production]
10:11 <andrew> synchronized php-1.5/skins/common/htmlform.js 'IE7 fixes for new preference system' [production]
10:00 <Andrew> testing [production]
05:51 <Tim> restarted squid instances on sq28 [production]
05:47 <Tim> restarted squid instances on sq2 [production]
05:46 <Tim> started the squid backend on sq10, sq23, sq24, and sq31; restarted the frontend on most of those to reduce memory usage [production]
05:35 <Tim> restarted the squid backend on sq16, which was reporting "gateway timeout" for apparently all requests. That seemed to fix it; will try the same on a few more that nagios is complaining about. [production]