2012-07-02
15:04 <mutante> authdns-update to switch jobs.wm redirect to wikimedia-lb to fix SSL cert mismatch (RT-3071) [production]
14:55 <mark> Reboot of cr1-sdtpa did not fix the RE packet loss issue... therefore unlikely to be leap second related [production]
14:41 <mark> Rebooting cr1-sdtpa [production]
14:37 <mark> Shutdown PyBal BGP sessions on cr1-sdtpa [production]
14:34 <mark> Shutdown BGP session to 2828 on cr1-sdtpa [production]
13:36 <hashar> db12 suffering some 1400sec (and growing) replag. mysqldump in progress on that host. [production]
12:35 <mutante> installing upgrades on fenari (linux-firmware linux-libc-dev..) [production]
12:27 <mutante> rebooting gallium one more time to install kernel [production]
12:26 <mutante> upgrading kernel on gallium [production]
12:23 <hashar> synchronized live-1.5/CREDITS [production]
11:31 <mark> Now we have packet loss within pmtpa/sdtpa... reverting change [production]
10:57 <mark> Problems on one of two pmtpa-eqiad waves; raised OSPF metric to 60 to failover traffic to the other link [production]
10:50 <Tim> fixing leap second issue on bastion1 by rebooting it [production]
10:47 <Tim> fixed leap second issue on bastion-restricted [production]
09:57 <Tim> fixing leap second issue on virt1,virt2,virt3,virt4,virt5 [production]
09:53 <Tim> fixing leap second issue on aluminium,gallium,manganese [production]
09:47 <Tim> fixing leap second issue on formey,grosley,hooper,sanger,sockpuppet [production]
09:43 <Tim> on fenari: fixed leap second issue with the mozilla method [production]
09:39 <apergos> rebooting gallium, it's pretty unhappy (maybe related to leap second issue) [production]
08:14 <hashar> srv190 srv266 srv281 timeouts on sync-file [production]
08:14 <hashar> synchronized wmf-config/InitialiseSettings.php 'Bug 37457 - fix import sources for viwikibooks' [production]
08:11 <hashar> Stopped Jenkins on gallium. It is not doing anything anyway. Asked to reboot box {{rt|3208}} [production]
02:53 <LocalisationUpdate> completed (1.20wmf5) at Mon Jul 2 02:53:51 UTC 2012 [production]
02:28 <LocalisationUpdate> completed (1.20wmf6) at Mon Jul 2 02:28:48 UTC 2012 [production]
01:48 <Tim> kill -CONT on populateRevisionSha1.php processes [production]
00:47 <Tim> on nfs1: trying leap second fix suggested at https://bugzilla.mozilla.org/show_bug.cgi?id=769972#c5 [production]
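(For context: the workaround referenced in the Mozilla bug above amounts to re-setting the system clock so the kernel drops the stuck leap-second state that was causing futex/hrtimer CPU spinning. A minimal sketch of that commonly circulated fix follows; the exact commands Tim ran on nfs1 are not recorded in this log, and the ntp service name varies by distribution.)

```
# Hedged sketch of the 2012 leap-second workaround; exact steps on nfs1 not logged.
# Stop ntpd first so it does not immediately re-arm the leap-second flag.
service ntp stop            # on some hosts the init script is "ntpd"

# Re-setting the clock to (roughly) the current time clears the kernel's
# leap-second state responsible for the runaway CPU usage.
date -s "$(LC_ALL=C date)"

# Resume time synchronisation once load has settled.
service ntp start
```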
00:26 <tstarling> synchronized wmf-config/db.php 'reduce db32 read load to zero due to persistent lag' [production]
00:12 <Tim> switched enwiki back to r/w [production]
00:12 <tstarling> synchronized wmf-config/db.php [production]
00:06 <Tim> on hume: stopped all populateRevisionSha1.php processes with kill -STOP [production]
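(The kill -STOP here and the matching kill -CONT logged at 01:48 pause the maintenance script without killing it, so it can resume once enwiki is writable again. A rough equivalent is sketched below, assuming the processes are matched by script name; the actual invocation on hume is not logged.)

```
# Hedged sketch: pause/resume the populateRevisionSha1.php workers by name.
pgrep -f populateRevisionSha1.php | xargs -r kill -STOP   # pause (00:06)
# ... enwiki read-only window, replication lag recovery ...
pgrep -f populateRevisionSha1.php | xargs -r kill -CONT   # resume (01:48)
```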
00:03 <reedy> synchronized wmf-config/db.php 's1/enwiki into readonly' [production]
2012-07-01
19:12 <reedy> synchronized php-1.20wmf6/extensions/WikimediaMaintenance/ 'Update to master for hashar' [production]
17:55 <aaron> synchronized php-1.20wmf5/includes/WikiPage.php 'more logging' [production]
17:45 <aaron> synchronized php-1.20wmf5/includes/WikiPage.php 'more logging' [production]
17:43 <aaron> synchronized php-1.20wmf5/includes/WikiPage.php 'more logging' [production]
17:32 <aaron> synchronized php-1.20wmf5/includes/WikiPage.php [production]
17:30 <aaron> synchronized php-1.20wmf5/includes/WikiPage.php [production]
16:53 <aaron> synchronized php-1.20wmf5/includes/WikiPage.php [production]
16:48 <aaron> synchronized php-1.20wmf5/includes/WikiPage.php 'logging' [production]
12:54 <notpeter> also going to reboot all pmtpa search nodes. not in prod, but are still freaking out from leap second bug. [production]
05:33 <aaron> synchronized php-1.20wmf5/includes/WikiPage.php 'logging' [production]
04:25 <LocalisationUpdate> completed (1.20wmf5) at Sun Jul 1 04:25:25 UTC 2012 [production]
04:06 <Ryan_Lane> virt1000 is back up, rebooting virt0 [production]
04:02 <Ryan_Lane> rebooting virt1000 [production]
03:16 <LocalisationUpdate> completed (1.20wmf6) at Sun Jul 1 03:16:39 UTC 2012 [production]
01:43 <notpeter> that worked. restarting all remaining search nodes. [production]
01:39 <notpeter> problem with lucene persisting through service restart, but not node restart. restarting en pool nodes. [production]
01:20 <paravoid> restarting opendj (nfs1/nfs2), load spike, possibly related to leap second [production]
00:51 <notpeter> search1004 dead. powercycling. [production]
00:50 <notpeter> based on ganglia evidence, lucene seems to have been affected by the leap second bug. restarting each instance, one minute wait in between [production]
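(A staggered restart like the one described above could look roughly like the loop below; the host list and the search service name are illustrative assumptions, not taken from the log.)

```
# Illustrative only: hosts and service name ("lsearchd") are assumptions.
for host in search1001 search1002 search1003; do
    ssh "$host" 'service lsearchd restart'
    sleep 60   # one-minute wait between instances, per the log entry above
done
```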