2009-04-20
15:29 <Rob> replaced fan and drive in srv63, reinstalling [production]
14:36 <Rob> memory replaced in srv203, back online. [production]
14:11 <Rob> shutting down srv203 to swap out bad memory [production]
05:12 <Tim> fixed memcached on srv75, stopped old ES slave on srv102, srv106, srv107, srv159, srv171 [production]
2009-04-18
14:05 <Tim> unblocked 80legs, they promised to be nice [production]
13:56 <tstarling> synchronized robots.txt [production]
05:26 <azafred> rebooted db20 after / ran out of space and started causing all kinds of issues. [production]
2009-04-17
22:49 <brion> regenerated centralnotice output again... this time ok [production]
22:48 <brion> srv93 and srv107 memcached nodes are running but broken. restarting them... [production]
22:43 <brion> restarted srv82 memcache node. attempting to rebuild centralnotices... [production]
22:41 <brion> bad memcached node srv82 [production]
22:05 <mark> Set up 3 new pywikipedia mailing lists, redirected svn commit output to one of them [production]
19:38 <robh> synchronized php-1.5/InitialiseSettings.php 'Bug 18494 Logo for ln.wiki' [production]
17:22 <Rob> removed wikimedia.se from our nameservers as they are using their own. [production]
16:48 <azafred> updated spamassassin rules on lily to include the SARE rules and mirror the settings on McHenry. [production]
10:25 <tstarling> synchronized robots.txt [production]
08:19 <tstarling> synchronized php-1.5/InitialiseSettings.php [production]
07:13 <Tim> temporarily killed apache on overloaded ES masters [production]
07:11 <tstarling> synchronized php-1.5/db.php 'zeroing read load on ES masters' [production]
06:04 <Tim> brief site-wide outage while db20 rebooted, reason unknown. All good now. Resuming logrotate. [production]
05:55 <Tim> db20 h/w reboot [production]
05:48 <Tim> shutting down daemons on db20 for pre-emptive reboot. Serial console shows "BUG: soft lockup - CPU#4 stuck for 11s! [rsync:27854]" etc. [production]
05:10 <Tim> on db20: killed logrotate -f half done due to alarming kswapd CPU (linked to deadlocked rsync processes). May need a reboot. [production]
05:00 <Tim> fixed logrotate on db20, broken since March 10 due to broken status file, most likely due to non-ASCII filenames generated by demux.py. Patched demux.py. Removed everything.log. [production]
02:14 <river> set up ms6.esams, copying /export/upload from ms1 [production]
00:24 <Tim> blocked lots of uci.edu IPs that were collectively doing 20 req/s of expensive API queries, overloading ES [production]
00:15 <brion> techblog post on Phorm opt-out is linked from slashdot; load on singer seems fairly stable. [production]
2009-04-16
23:06 <tfinc> synchronized php-1.5/extensions/ContributionReporting/ContributionHistory_body.php [production]
22:48 <azafred> bounced apache on srv217. All threads were DED - dead [production]
22:16 <tfinc> synchronized php-1.5/extensions/ContributionReporting/ContributionHistory_body.php [production]
22:08 <tfinc> synchronized php-1.5/extensions/ContributionReporting/ContributionHistory_body.php [production]
17:41 <domas> fantastic. I start _looking_ at stuff and it fixes itself. [production]
17:35 <midom> synchronized php-1.5/includes/Revision.php 'live profiling hook' [production]
17:28 <domas> db20 has kswapd deadlock, needs reboot soonish [production]
17:18 <midom> synchronized php-1.5/InitialiseSettings.php 'disabled stats' [production]
17:15 <midom> synchronized php-1.5/InitialiseSettings.php 'enabling udp stats' [production]
16:18 <azafred> bounced apache on srv217 (no pid file so previous restart did not include this one) [production]
15:57 <brion> network borkage between Florida and Amsterdam. Visitors through AMS proxies can't reach sites. [production]
15:55 <azafred> bounced apache on srv[73,86,88,93,108,114,139,141,154,181,194,204,213,99] [production]
15:52 <Tim-away> started mysqld on srv98,srv122,srv124,srv142,srv106,srv107: done with them for now. srv102 still going. [production]
15:30 <mark> Set up ms6 with SP management at ms6.ipmi.esams.wikimedia.org [production]
14:13 <mark> Restoring traffic to Amsterdam cluster [production]
14:06 <mark> Reloading csw1-esams [production]
13:55 <mark> Reloading csw1-esams [production]
13:53 <JeLuF> ms1 NFS issues again. Might be load related [production]
13:49 <Tim> copying fedora ES data from ms3 to ms2 [production]
13:44 <JeLuF> ms1 is reachable, no errors logged, NFS daemons running fine. After some minutes, NFS clients were able to access the server again. Root cause unknown. [production]
13:38 <JeLuF> ms1 issues. On NFS slaves: "ls: cannot access /mnt/upload5/: Input/output error" [production]
13:24 <mark> DNS scenario knams-down for upcoming core switch reboot [production]
08:23 <river> pdns on bayle crashed, bindbackend parser seems rather fragile [production]