2009-07-10 §
17:55 <Andrew> Scapping, back-out diff is in /home/andrew/usability-diff [production]
17:43 <Andrew> Applied r52926 and r52930, and updated Resources and EditToolbar/images [production]
16:44 <Fred> reinstalled and configured gmond on storage1. [production]
15:08 <Rob> upgraded blog and techblog to wordpress 2.8.1 [production]
13:58 <midom> synchronized php-1.5/includes/api/ApiQueryCategoryMembers.php 'hello, fix!' [production]
12:40 <Tim> prototype.wikimedia.org is in OOM death; nagios has been reporting it down for 3 hours, but it is still responsive on shell, so I will try a light touch [production]
11:08 <tstarling> synchronized php-1.5/mc-pmtpa.php 'more' [production]
10:58 <Tim> installed memcached on srv200-srv209 [production]
10:51 <tstarling> synchronized php-1.5/mc-pmtpa.php 'deployed the 11 available spares, will make some more' [production]
10:48 <Tim> mctest.php reports 17 servers down out of 78, most from the range that Rob decommissioned [production]
10:37 <Tim> installed memcached on srv120, srv121, srv122, srv123 [production]
10:32 <Tim> found rogue server srv101, which was missing its puppet configuration and so was being skipped by syncs. Uninstalled apache on it. [production]
2009-07-09 §
23:56 <RoanKattouw> Rebooted prototype around 16:30; it had gotten stuck around 15:30 [production]
21:43 <Rob> srv35 (test.wikipedia.org) is not POSTing; I think it's dead, Jim. [production]
21:35 <Rob> decommissioned srv55 and put srv35 in its place in C4, test.wikipedia.org should be back online shortly [production]
20:04 <Rob> removed decommissioned servers from node groups; getting an error when syncing nagios. [production]
20:03 <Rob> updated dns for new apache servers [production]
19:54 <Rob> decommissioned all old apaches in rack pmtpa b2 [production]
16:22 <Tim> creating mhrwiki (bug 19515) [production]
13:27 <domas> db13 controller battery failed, s2 needs master switch eventually [production]
2009-07-08 §
13:31 <midom> synchronized php-1.5/InitialiseSettings.php 'disabling usability initiative on all wikis, except test and usability. someone who enabled this and left at this state should be shot' [production]
2009-07-07 §
19:06 <Fred> adjusted www.wikipedia.org apache conf file to remove a redirect-loop to www.wikibooks.org. (bug #19460) [production]
17:34 <Fred> found the cause of Ganglia issues: Puppet. Seems like the configuration of the master hosts gets reverted to being deaf automagically... [production]
17:05 <Fred> ganglia fixed. For some reason the master cluster nodes were set to deaf mode (i.e. the aggregator couldn't gather data from them). [production]
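(For reference: "deaf" above is a gmond configuration directive; a deaf node ignores incoming metric traffic, so aggregator hosts must have it disabled. A minimal illustrative sketch of the relevant gmond.conf globals block, with values assumed for the example rather than taken from the actual Wikimedia configuration:)
    globals {
      daemonize = yes
      mute = no
      deaf = no   # assumed example: aggregator nodes must not be deaf, or they drop metrics sent by other hosts
    }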
15:02 <robh> synchronized php-1.5/InitialiseSettings.php '19470 Rollback on pt.wikipedia' [production]
03:37 <Fred> fixing ganglia. Expect disruption [production]
00:27 <tomaszf> starting six worker threads for xml snapshots [production]
00:12 <Fred> srv142 and srv55 will need a manual power-cycle. [production]
00:10 <Fred> Rolling reboot has finally completed. [production]
2009-07-06 §
23:57 <Fred> restarted ganglia since it is acting up... [production]
23:54 <tomaszf> restarting all xml snapshots due to kernel upgrades [production]
18:49 <Rob> upgraded spam detection plugins on blog and techblog [production]
18:47 <Fred> starting a rolling reboot of servers in the Apaches cluster. [production]
17:53 <tomaszf> cleaning out space on storage2. lowering retention for xml snapshots to 10 [production]
17:53 <Fred> upgrading the kernel on the cluster. This will take a while! [production]
17:46 <Fred> rebooting srv220 to test kernel update. [production]
2009-07-03 §
12:51 <andrew> synchronized php-1.5/extensions/AbuseFilter/Views/AbuseFilterViewEdit.php 'Re-activating abuse filter public logging in the logging table now that log_type and log_action have been expanded.' [production]
11:45 <mark> Kicked iris so it would boot [production]
10:11 <andrew> synchronized php-1.5/skins/common/htmlform.js 'IE7 fixes for new preference system' [production]
10:00 <Andrew> testing [production]
05:51 <Tim> restarted squid instances on sq28 [production]
05:47 <Tim> restarted squid instances on sq2 [production]
05:46 <Tim> started squid backend on sq10, sq23, sq24, and sq31; restarted the frontend on most of those to reduce memory usage [production]
05:35 <Tim> restarted squid backend on sq16, was reporting "gateway timeout" apparently for all requests. Seemed to fix it. Will try that for a few more that nagios is complaining about. [production]
2009-07-02 §
21:38 <Rob> sq24 won't accept ssh, depooling. [production]
21:34 <Rob> rebooting sq21 [production]
21:26 <Rob> ran changes to push DNS back to the normal scenario [production]
19:52 <mark> Power outage at esams, moving traffic [production]
19:44 <Andrew> Knams down, Rob is looking into it [production]
19:41 <Andrew> Reports of problems from Europe [production]