2009-07-10 §
10:48 <Tim> mctest.php reports 17 servers down out of 78, most from the range that Rob decommissioned [production]
10:37 <Tim> installed memcached on srv120, srv121, srv122, srv123 [production]
10:32 <Tim> found rogue server srv101, missing puppet configuration and so skipping syncs. Uninstalled apache on it. [production]
2009-07-09 §
23:56 <RoanKattouw> Rebooted prototype around 16:30, got stuck around 15:30 [production]
21:43 <Rob> srv35 (test.wikipedia.org) is not posting, I think it's dead, Jim. [production]
21:35 <Rob> decommissioned srv55 and put srv35 in its place in C4, test.wikipedia.org should be back online shortly [production]
20:04 <Rob> removed decommissioned servers from node groups, getting error on syncing up nagios. [production]
20:03 <Rob> updated dns for new apache servers [production]
19:54 <Rob> decommissioned all old apaches in rack pmtpa b2 [production]
16:22 <Tim> creating mhrwiki (bug 19515) [production]
13:27 <domas> db13 controller battery failed, s2 needs master switch eventually [production]
2009-07-08 §
13:31 <midom> synchronized php-1.5/InitialiseSettings.php 'disabling usability initiative on all wikis, except test and usability. someone who enabled this and left at this state should be shot' [production]
2009-07-07 §
19:06 <Fred> adjusted www.wikipedia.org apache conf file to remove a redirect-loop to www.wikibooks.org. (bug #19460) [production]
17:34 <Fred> found the cause of Ganglia issues: Puppet. Seems like the configuration of the master hosts gets reverted to being deaf automagically... [production]
17:05 <Fred> ganglia fixed. For some reason the master cluster nodes were set to Deaf mode... (i.e. the aggregator couldn't gather data from them). [production]
15:02 <robh> synchronized php-1.5/InitialiseSettings.php '19470 Rollback on pt.wikipedia' [production]
03:37 <Fred> fixing ganglia. Expect disruption [production]
00:27 <tomaszf> starting six worker threads for xml snapshots [production]
00:12 <Fred> srv142 and srv55 will need manual power-cycle. [production]
00:10 <Fred> Rolling reboot has finally completed. [production]
2009-07-06 §
23:57 <Fred> restarted ganglia since it is acting up... [production]
23:54 <tomaszf> restarting all xml snapshots due to kernel upgrades [production]
18:49 <Rob> upgraded spam detection plugins on blog and techblog [production]
18:47 <Fred> starting rolling reboot of servers in Apaches cluster. [production]
17:53 <tomaszf> cleaning out space on storage2. lowering retention for xml snapshots to 10 [production]
17:53 <Fred> upgrading kernel on cluster. This will take a while! [production]
17:46 <Fred> rebooting srv220 to test kernel update. [production]
2009-07-03 §
12:51 <andrew> synchronized php-1.5/extensions/AbuseFilter/Views/AbuseFilterViewEdit.php 'Re-activating abuse filter public logging in the logging table now that log_type and log_action have been expanded.' [production]
11:45 <mark> Kicked iris so it would boot [production]
10:11 <andrew> synchronized php-1.5/skins/common/htmlform.js 'IE7 fixes for new preference system' [production]
10:00 <Andrew> testing [production]
05:51 <Tim> restarted squid instances on sq28 [production]
05:47 <Tim> restarted squid instances on sq2 [production]
05:46 <Tim> started squid backend on sq10 and sq23, sq24, sq31, restarted frontend on most of those to reduce memory usage [production]
05:35 <Tim> restarted squid backend on sq16, was reporting "gateway timeout" apparently for all requests. Seemed to fix it. Will try that for a few more that nagios is complaining about. [production]
2009-07-02 §
21:38 <Rob> sq24 won't accept ssh, depooling. [production]
21:34 <Rob> rebooting sq21 [production]
21:26 <Rob> ran changes to push dns back to normal scenario [production]
19:52 <mark> Power outage at esams, moving traffic [production]
19:44 <Andrew> Knams down, Rob is looking into it [production]
19:41 <Andrew> Reports of problems from Europe [production]
19:25 <Andrew> running sync-common-all to deploy mobileRedirect.php to fix hcatlin's mobile redirect/cookie bug [production]
19:22 <andrew> synchronized live-1.5/mobileRedirect.php [production]
17:15 <mark> Rebooted srv159 [production]
16:13 <Fred> shutting 217 back down as it is not supposed to be up due to faulty timer causing issues. [production]
16:12 <Fred> rebooted srv217. Was unpingable. [production]
14:09 <Andrew> Started sending updates of spam.log to Project Honeypot folks every 5 minutes, in my crontab on hume. [production]
11:20 <andrew> synchronized php-1.5/skins/common/shared.css 'Live-merging r52669, r52684 at rainman's request, search fixes.' [production]
11:18 <andrew> synchronized php-1.5/includes/specials/SpecialSearch.php 'Live-merging r52669, r52684 at rainman's request, search fixes.' [production]
00:03 <brion> synchronized php-1.5/CommonSettings.php [production]