| 
      
        2014-07-15
      
      §
     | 
  
    
  | 16:10 | 
  <_joe_> | 
  mw1100 and onwards updated | 
  [production] | 
            
  | 16:00 | 
  <_joe_> | 
  mw1060-mw1099 updated | 
  [production] | 
            
  | 15:58 | 
  <manybubbles> | 
  restarting Elasticsearch on elastic1017 - it's thrashing the disk again.  I'm still not 100% sure why | 
  [production] | 
            
  | 15:57 | 
  <_joe_> | 
  mw1020-mw1059 updated | 
  [production] | 
            
  | 15:53 | 
  <_joe_> | 
  mw101[0-9] updated | 
  [production] | 
            
  | 15:47 | 
  <_joe_> | 
  starting rolling update of all appservers to apache2 2.2.22-1ubuntu1.6; half of them are on 2.2.22-1ubuntu1.5 now | 
  [production] | 
            
  | 15:42 | 
  <manybubbles> | 
  setting the filter cache on one node in the cluster set it on all.  yay, I guess.  Anyway, I'm going to let it soak for a while. | 
  [production] | 
            
  | 15:32 | 
  <manybubbles> | 
  setting filter cache size to 20% on elastic1001 to see if it takes/helps us | 
  [production] | 
            
  | 15:19 | 
  <anomie> | 
  Synchronized wmf-config/: SWAT: Remove dead ULS variable [[gerrit:145861]] (duration: 00m 10s) | 
  [production] | 
            
  | 15:18 | 
  <anomie> | 
  anomie actually committed a live hack someone left on tin (removing db1035) | 
  [production] | 
            
  | 15:16 | 
  <anomie> | 
  updated /a/common to {{Gerrit|I7ca6a16d5}}: Switch jawiki back to lsearchd | 
  [production] | 
            
  | 13:42 | 
  <manybubbles> | 
  Synchronized wmf-config/InitialiseSettings.php: jawiki back to lsearchd (duration: 00m 05s) | 
  [production] | 
            
  | 13:38 | 
  <manybubbles> | 
  elastic1017 had a load average of 60 - was thrashing in io.  bounced Elasticsearch.  let's see if it recovers on its own | 
  [production] | 
            
  | 09:09 | 
  <_joe_> | 
  restarting mailman on sodium, again, for testing | 
  [production] | 
            
  | 08:50 | 
  <godog> | 
  restart mailman on sodium after inodes freed | 
  [production] | 
            
  | 07:27 | 
  <_joe_> | 
  restarted mailman on sodium | 
  [production] | 
            
  | 07:22 | 
  <_joe_> | 
  stopping mailman on sodium for repairing | 
  [production] | 
            
  | 06:54 | 
  <_joe_> | 
  killed jenkins stale process on gallium, stuck in a futex while shutting down | 
  [production] | 
            
  | 04:48 | 
  <springle> | 
  db1035 crash cycle. down for memtest and stuff | 
  [production] | 
            
  | 03:34 | 
  <LocalisationUpdate> | 
  ResourceLoader cache refresh completed at Tue Jul 15 03:33:38 UTC 2014 (duration 33m 37s) | 
  [production] | 
            
  | 03:01 | 
  <LocalisationUpdate> | 
  completed (1.24wmf13) at 2014-07-15 03:00:03+00:00 | 
  [production] | 
            
  | 02:34 | 
  <springle> | 
  Synchronized wmf-config/db-eqiad.php: depool db1035, crashed (duration: 00m 13s) | 
  [production] | 
            
  | 02:30 | 
  <LocalisationUpdate> | 
  completed (1.24wmf12) at 2014-07-15 02:29:02+00:00 | 
  [production] | 
            
  | 02:27 | 
  <springle> | 
  powercycle db1035 unresponsive | 
  [production] | 
            
  
    | 
      
        2014-07-14
      
      §
     | 
  
    
  | 23:32 | 
  <mwalker> | 
  Started scap: Updating for SWAT {{gerrit|146304}}, {{gerrit|146306}}, {{gerrit|146149}}, {{gerrit|146165}}, {{gerrit|146166}}, {{gerrit|146282}}, and {{gerrit|146281}}. Also finishing awight's deploy of FundraisingTranslateWorkflow. | 
  [production] | 
            
  | 20:22 | 
  <cscott> | 
  updated Parsoid to version d51e64097bb1b18e356584d4f3ddcfd90a6071ba | 
  [production] | 
            
  | 19:57 | 
  <ori> | 
  postponing jobrunner deployment to tomorrow; ran over time | 
  [production] | 
            
  | 19:45 | 
  <_joe_> | 
  doing the same on mw1064, segfaulted for the same reason | 
  [production] | 
            
  | 19:44 | 
  <_joe_> | 
  killed a lone apache2 child on mw1152, stuck in a futex, after a segfault of another apache process. Restarted apache, now working correctly | 
  [production] | 
            
  | 19:04 | 
  <godog> | 
  re-enabling mailman on sodium, missing list config restored | 
  [production] | 
            
  | 18:49 | 
  <awight> | 
  Synchronized wmf-config: Deploying FundraisingTranslateWorkflow on metawiki (t | 
  [production] | 
            
  | 18:45 | 
  <awight> | 
  Synchronized php-1.24wmf13/extensions/FundraisingTranslateWorkflow: Update FundraisingTranslateWorkflow extension (wmf13) (duration: 00m 05s) | 
  [production] | 
            
  | 18:44 | 
  <awight> | 
  Synchronized php-1.24wmf12/extensions/FundraisingTranslateWorkflow: Update FundraisingTranslateWorkflow extension (duration: 00m 05s) | 
  [production] | 
            
  | 18:15 | 
  <awight> | 
  Synchronized wmf-config: Revert: Deploying FundraisingTranslateWorkflow on metawiki (duration: 00m 04s) | 
  [production] | 
            
  | 18:03 | 
  <awight> | 
  Synchronized wmf-config: Deploying FundraisingTranslateWorkflow on metawiki (duration: 00m 05s) | 
  [production] | 
            
  | 18:03 | 
  <awight> | 
  updated /a/common to {{Gerrit|Ie7599fb6e}}: jawiki gets Cirrus as primary search | 
  [production] | 
            
  | 17:43 | 
  <Krinkle> | 
  npm-cache for integration slaves got corrupted again. Depooling/Repooling integration-slave100{1,2,3} one by one to clear cache and let it warm up again. | 
  [production] | 
            
  | 17:35 | 
  <Krinkle> | 
  Jenkins slaves in labs are unable to reach zuul.eqiad.wmnet | 
  [production] | 
            
  | 17:10 | 
  <andrewbogott> | 
  purging old local-* service group entries from labs ldap (via purgeOldServiceGroups.php) | 
  [production] | 
            
  | 17:05 | 
  <godog> | 
  started mailman on sodium post-reboot | 
  [production] | 
            
  | 17:04 | 
  <demon> | 
  Synchronized wmf-config/InitialiseSettings.php: nlwiki getting cirrus as primary (duration: 00m 04s) | 
  [production] | 
            
  | 15:11 | 
  <manybubbles> | 
  Synchronized wmf-config: SWAT update cirrus settings for commons (duration: 00m 04s) | 
  [production] | 
            
  | 15:04 | 
  <manybubbles> | 
  Synchronized wmf-config: SWAT update cirrus settings for commons (duration: 00m 04s) | 
  [production] | 
            
  | 15:02 | 
  <manybubbles> | 
  Synchronized wmf-config: SWAT update cirrus settings for commons (duration: 00m 05s) | 
  [production] | 
            
  | 14:39 | 
  <_joe_> | 
  rebooted nescio, stuck and with console showing just a truncated log (timestamp only) | 
  [production] | 
            
  | 14:33 | 
  <mutante> | 
  powercycling sodium | 
  [production] | 
            
  | 14:02 | 
  <mutante> | 
  stat1002 - "Could not find declared class ::oozie" | 
  [production] | 
            
  | 09:36 | 
  <legoktm> | 
  ran initSiteStats.php on all wikivoyages for bug 64370 | 
  [production] | 
            
  | 09:02 | 
  <godog> | 
  repool ms-fe1001 after upgrade, basic testing successful | 
  [production] | 
            
  | 08:34 | 
  <godog> | 
  depool ms-fe1001 for swift icehouse upgrade | 
  [production] |