| 
      
        2013-11-29
      
     | 
  
    
  | 20:18 | 
  <paravoid> | 
  rebooting cp1065 with new kernel | 
  [production] | 
            
  | 19:26 | 
  <paravoid> | 
  "swapoff -a" on all cache_text to deal with strange kernel issue with kswapd dropping the whole page cache on memory pressure | 
  [production] | 
            
  | 14:49 | 
  <paravoid> | 
  restarted gmond on ms-fe1001/2, both were stuck 6h ago and we lost all swift eqiad's metrics for that period | 
  [production] | 
            
  | 11:12 | 
  <Reedy> | 
  Created EducationProgram tables on arwiki | 
  [production] | 
            
  | 05:51 | 
  <Tim> | 
  on cp1052 and cp1053: tweaked /proc/sys/net/core/rmem_default to see if that fixes the observed massive gmond packet loss | 
  [production] | 
            
  | 02:08 | 
  <LocalisationUpdate> | 
  ResourceLoader cache refresh completed at Fri Nov 29 02:07:55 UTC 2013 | 
  [production] | 
            
  | 02:02 | 
  <LocalisationUpdate> | 
  completed (1.23wmf5) at Fri Nov 29 02:02:25 UTC 2013 | 
  [production] | 
            
  | 02:01 | 
  <LocalisationUpdate> | 
  completed (1.23wmf4) at Fri Nov 29 02:01:44 UTC 2013 | 
  [production] | 
            
  | 01:22 | 
  <springle> | 
  synchronized wmf-config/db-eqiad.php  'repool pc1001 after upgrade, max_connections lowered during warm up' | 
  [production] | 
            
  | 00:25 | 
  <springle> | 
  synchronized wmf-config/db-eqiad.php  'depool pc1001 for package upgrade' | 
  [production] | 
            
  
    | 
      
        2013-11-28
      
     | 
  
    
  | 10:49 | 
  <apergos> | 
  turned off logging for parsoid ( https://gerrit.wikimedia.org/r/#/c/98082/ ), old logs remain in place for folks to examine | 
  [production] | 
            
  | 10:06 | 
  <apergos> | 
  stack traces filling up parsoid nohup.out logs (several gigs in only a few minutes once parsoid gets into that state), sample on wtp1010 in /var/lib/parsoid/nohup.out.errors | 
  [production] | 
            
  | 08:34 | 
  <apergos> | 
  and wtp1023 | 
  [production] | 
            
  | 08:29 | 
  <apergos> | 
  /var/lib/parsoid/nohup.out on wtp1005, wtp1011 and wtp1012 was 6 GB or more, causing / on these boxes to fill; moved it, restarted parsoid, removed it | 
  [production] | 
            
  | 07:16 | 
  <apergos> | 
  powercycled sq80 | 
  [production] | 
            
  | 05:41 | 
  <ori> | 
  synchronized wmf-config/CommonSettings.php  'Icdaa4c1b5: Configure parser cache databases in db-$realm file (3/3)' | 
  [production] | 
            
  | 05:41 | 
  <ori> | 
  synchronized wmf-config/db-pmtpa.php  'Icdaa4c1b5: Configure parser cache databases in db-$realm file (2/3)' | 
  [production] | 
            
  | 05:40 | 
  <ori> | 
  synchronized wmf-config/db-eqiad.php  'Icdaa4c1b5: Configure parser cache databases in db-$realm file (1/3)' | 
  [production] | 
            
  | 05:37 | 
  <ori> | 
  updated /a/common to {{Gerrit|Icdaa4c1b5}}: Configure parser cache databases in db-$realm file | 
  [production] | 
            
  | 03:37 | 
  <springle> | 
  synchronized wmf-config/db-eqiad.php  'repool slaves after package upgrade, (lvm snapshot boxes only, LB=0)' | 
  [production] | 
            
  | 03:16 | 
  <springle> | 
  synchronized wmf-config/db-eqiad.php  'depool slaves for package upgrade' | 
  [production] | 
            
  | 02:43 | 
  <LocalisationUpdate> | 
  ResourceLoader cache refresh completed at Thu Nov 28 02:42:58 UTC 2013 | 
  [production] | 
            
  | 02:29 | 
  <springle> | 
  synchronized wmf-config/db-eqiad.php  'slaves to full steam after package upgrade' | 
  [production] | 
            
  | 02:15 | 
  <LocalisationUpdate> | 
  completed (1.23wmf5) at Thu Nov 28 02:15:36 UTC 2013 | 
  [production] | 
            
  | 02:09 | 
  <LocalisationUpdate> | 
  completed (1.23wmf4) at Thu Nov 28 02:09:38 UTC 2013 | 
  [production] | 
            
  | 01:17 | 
  <springle> | 
  synchronized wmf-config/db-eqiad.php  'warm up slaves after package upgrade' | 
  [production] | 
            
  | 01:02 | 
  <ori-l> | 
  started rsync of graphite data (~400gb) from professor.pmtpa to tungsten.eqiad | 
  [production] | 
            
  | 00:40 | 
  <springle> | 
  synchronized wmf-config/db-eqiad.php  'depool slaves for package upgrade' | 
  [production] | 
            
  
    | 
      
        2013-11-27
      
     | 
  
    
  | 19:50 | 
  <demon> | 
  synchronized wmf-config/InitialiseSettings.php  'Fixes for Flow config, no-op in prod' | 
  [production] | 
            
  | 19:49 | 
  <demon> | 
  synchronized wmf-config/CommonSettings.php  'Fixes for Flow config, no-op in prod' | 
  [production] | 
            
  | 18:12 | 
  <paravoid> | 
  kill -9 gdb on cp3012, attached to varnish frontend | 
  [production] | 
            
  | 11:28 | 
  <ori-l> | 
  faidon switched gdash.wm.o from professor.pmtpa -> tungsten.eqiad behind misc-varnish & rebooted ssl1 in tampa | 
  [production] | 
            
  | 11:11 | 
  <apergos> | 
  ssl1 rebooted itself about 15 mins ago, no idea why | 
  [production] | 
            
  | 10:20 | 
  <ariel> | 
  synchronized wmf-config/db-eqiad.php  'db1019 (s3) back to full weight in the pool' | 
  [production] | 
            
  | 10:19 | 
  <ariel> | 
  updated /a/common to {{Gerrit|If5ebd6194}}: db1019 (s3) back to full weight in pool | 
  [production] | 
            
  | 10:08 | 
  <apergos> | 
  shot some old puppet processes hogging memory on db9 (from march and earlier) | 
  [production] | 
            
  | 09:49 | 
  <apergos> | 
  there was no mount /srv/pagecounts on labstore4, so rsync to /exp/pagecounts wrote to and filled /; did the mkdir and now things seem ok | 
  [production] | 
            
  | 08:00 | 
  <ariel> | 
  synchronized wmf-config/db-eqiad.php  'warm up db1019 (s3) after lvm resize' | 
  [production] | 
            
  | 07:59 | 
  <ariel> | 
  updated /a/common to {{Gerrit|I50354e622}}: warm up db1019 (s3) after lvm resize | 
  [production] | 
            
  | 07:38 | 
  <apergos> | 
  rebooting db1019 after kernel upgrade, fix for broken xfs_growfs | 
  [production] | 
            
  | 07:02 | 
  <ariel> | 
  synchronized wmf-config/db-eqiad.php  'depool db1019 (s3) temporarily for lvm resize' | 
  [production] | 
            
  | 07:01 | 
  <ariel> | 
  updated /a/common to {{Gerrit|I4372bb602}}: depool db1019 (s3) temporarily for lvm resize | 
  [production] | 
            
  | 02:40 | 
  <LocalisationUpdate> | 
  ResourceLoader cache refresh completed at Wed Nov 27 02:40:52 UTC 2013 | 
  [production] | 
            
  | 02:15 | 
  <LocalisationUpdate> | 
  completed (1.23wmf5) at Wed Nov 27 02:15:19 UTC 2013 | 
  [production] | 
            
  | 02:08 | 
  <LocalisationUpdate> | 
  completed (1.23wmf4) at Wed Nov 27 02:08:28 UTC 2013 | 
  [production] | 
            
  | 00:32 | 
  <springle> | 
  stopping replication on sanitarium db1054:3308 and labsdb1002:3308 while restoring dewiki to labs | 
  [production] |