2009-07-10

19:58 <azafred> synchronized php-1.5/CommonSettings.php 'removed border=0 from wgCopyrightIcon' [production]
18:58 <Fred> synched nagios config to reflect the cleanup. [production]
18:52 <Fred> cleaned up the node_files for dsh and removed all decommissioned hosts. [production]
18:36 <mark> Added DNS entries for srv251-500 [production]
18:18 <fvassard> synchronized php-1.5/mc-pmtpa.php 'Added a couple spare memcache hosts.' [production]
18:16 <RobH_DC> moved test to srv66 instead. [production]
18:08 <RobH_DC> turning srv210 into test.wikipedia.org [production]
17:57 <Andrew> Reactivating UsabilityInitiative globally, too. [production]
17:55 <Andrew> Scapping, back-out diff is in /home/andrew/usability-diff [production]
17:43 <Andrew> Apply r52926, r52930, and update Resources and EditToolbar/images [production]
16:44 <Fred> reinstalled and configured gmond on storage1. [production]
15:08 <Rob> upgraded blog and techblog to WordPress 2.8.1 [production]
13:58 <midom> synchronized php-1.5/includes/api/ApiQueryCategoryMembers.php 'hello, fix!' [production]
12:40 <Tim> prototype.wikimedia.org is in OOM death; nagios has reported it down for 3 hours, but it is still responsive on the shell, so I will try a light touch [production]
11:08 <tstarling> synchronized php-1.5/mc-pmtpa.php 'more' [production]
10:58 <Tim> installed memcached on srv200-srv209 [production]
10:51 <tstarling> synchronized php-1.5/mc-pmtpa.php 'deployed the 11 available spares, will make some more' [production]
10:48 <Tim> mctest.php reports 17 servers down out of 78, most of them from the range that Rob decommissioned [production]
10:37 <Tim> installed memcached on srv120, srv121, srv122, srv123 [production]
10:32 <Tim> found rogue server srv101, missing puppet configuration and so skipping syncs. Uninstalled apache on it. [production]

2009-07-09

23:56 <RoanKattouw> Rebooted prototype around 16:30; it got stuck around 15:30 [production]
21:43 <Rob> srv35 (test.wikipedia.org) is not POSTing; I think it's dead, Jim. [production]
21:35 <Rob> decommissioned srv55 and put srv35 in its place in C4, test.wikipedia.org should be back online shortly [production]
20:04 <Rob> removed decommissioned servers from node groups, getting an error when syncing nagios. [production]
20:03 <Rob> updated dns for new apache servers [production]
19:54 <Rob> decommissioned all old apaches in rack pmtpa b2 [production]
16:22 <Tim> creating mhrwiki (bug 19515) [production]
13:27 <domas> db13 controller battery failed, s2 needs master switch eventually [production]

2009-07-07

19:06 <Fred> adjusted www.wikipedia.org apache conf file to remove a redirect loop to www.wikibooks.org. (bug #19460) [production]
17:34 <Fred> found the cause of the Ganglia issues: Puppet. It seems the configuration of the master hosts gets reverted to deaf mode automagically... [production]
17:05 <Fred> Ganglia fixed. For some reason the master cluster nodes were set to deaf mode... (i.e., the aggregator couldn't gather data from them). [production]
15:02 <robh> synchronized php-1.5/InitialiseSettings.php '19470 Rollback on pt.wikipedia' [production]
03:37 <Fred> fixing Ganglia. Expect disruption [production]
00:27 <tomaszf> starting six worker threads for xml snapshots [production]
00:12 <Fred> srv142 and srv55 will need a manual power-cycle. [production]
00:10 <Fred> Rolling reboot has finally completed. [production]