2009-07-13
00:59 <brion> stopping apache on image scaler boxes, see what that does [production]
00:49 <brion> attempting to replicate domas's earlier temp success dropping oldest snapshot (last was 4/13): zfs destroy export/upload@weekly-2009-04-20_03:30:00 [production]
00:45 <brion> restarting nfs server [production]
00:44 <brion> stopping nfs server, restarting web server [production]
00:40 <brion> restarting nfs server on ms1 [production]
00:36 <brion> doesn't seem so far to have changed the NFS access delays on image scalers. [production]
00:31 <brion> shutting down webserver7 on ms1 [production]
00:23 <brion> investigating site problem reports. image server stack seems overloaded, so intermittent timeouts on nfs to apaches or http/squid to outside [production]
2009-07-12
20:30 <domas> dropped few snapshots on ms1, observed sharp %sys decrease and much better nfs properties immediately [production]
20:05 <domas> we seem to be hitting issue similar to http://www.opensolaris.org/jive/thread.jspa?messageID=64379 on ms1 [production]
18:55 <domas> zil_disable=1 on ms1 [production]
18:34 <mark> Upgraded pybal on lvs3 [production]
18:16 <mark> Hacked in configurable timeout support for the ProxyFetch monitor of PyBal, set the renderers timeout at 60s [production]
17:58 <domas> scaler stampedes caused scalers to be depooled by pybal, thus directing stampede to other server in round-robin fashion, all blocking and consuming ms1 SJSWS slots. of course, high I/O load contributed to this. [production]
17:55 <domas> investigating LVS-based rolling scaler overload issue, Mark and Tim heading the effort now ;-) [production]
17:54 <domas> bumped up ms1 SJSWS thread count [production]
2009-07-11
15:45 <mark> Rebooting sq1 [production]
15:31 <Tim> rebooting ms1 [production]
14:54 <Tim> disabled CentralNotice temporarily [production]
14:54 <tstarling> synchronized php-1.5/InitialiseSettings.php 'disabling CentralNotice' [production]
14:53 <tstarling> synchronized php-1.5/InitialiseSettings.php 'disabling CentralAuth' [production]
14:36 <Tim> restarted webserver7 on ms1 [production]
14:22 <Tim> some kind of overload, seems to be image related [production]
10:09 <midom> synchronized php-1.5/db.php 'db8 doing commons read load, full write though' [production]
09:22 <domas> restarted job queue with externallinks purging code, <3 [production]
09:22 <domas> installed nrpe on db2 :) [production]
09:22 <midom> synchronized php-1.5/db.php 'giving db24 just negligible load for now' [production]
08:38 <midom> synchronized php-1.5/includes/parser/ParserOutput.php 'livemerging r53103:53105' [production]
08:37 <midom> synchronized php-1.5/includes/DefaultSettings.php [production]
2009-07-10
21:21 <Fred> added ganglia to db20 [production]
19:58 <azafred> synchronized php-1.5/CommonSettings.php 'removed border=0 from wgCopyrightIcon' [production]
18:58 <Fred> synched nagios config to reflect cleanup. [production]
18:52 <Fred> cleaned up the node_files for dsh and removed all decommissioned hosts. [production]
18:36 <mark> Added DNS entries for srv251-500 [production]
18:18 <fvassard> synchronized php-1.5/mc-pmtpa.php 'Added a couple spare memcache hosts.' [production]
18:16 <RobH_DC> moved test to srv66 instead. [production]
18:08 <RobH_DC> turning srv210 into test.wikipedia.org [production]
17:57 <Andrew> Reactivating UsabilityInitiative globally, too. [production]
17:55 <Andrew> Scapping, back-out diff is in /home/andrew/usability-diff [production]
17:43 <Andrew> Apply r52926, r52930, and update Resources and EditToolbar/images [production]
16:44 <Fred> reinstalled and configured gmond on storage1. [production]
15:08 <Rob> upgraded blog and techblog to wordpress 2.8.1 [production]
13:58 <midom> synchronized php-1.5/includes/api/ApiQueryCategoryMembers.php 'hello, fix\\!' [production]
12:40 <Tim> prototype.wikimedia.org is in OOM death, nagios reports down 3 hours, still responsive on shell so I will try a light touch [production]
11:08 <tstarling> synchronized php-1.5/mc-pmtpa.php 'more' [production]
10:58 <Tim> installed memcached on srv200-srv209 [production]
10:51 <tstarling> synchronized php-1.5/mc-pmtpa.php 'deployed the 11 available spares, will make some more' [production]
10:48 <Tim> mctest.php reports 17 servers down out of 78, most from the range that Rob decommissioned [production]
10:37 <Tim> installed memcached on srv120, srv121, srv122, srv123 [production]
10:32 <Tim> found rogue server srv101, missing puppet configuration and so skipping syncs. Uninstalled apache on it. [production]