| 
      
        2016-02-18
      
      §
     | 
  
    
  | 18:04 | 
  <mobrovac> | 
  restbase deploy end of a42976cc82 | 
  [production] | 
            
  | 18:03 | 
  <twentyafterfour> | 
   applied a hotfix from https://secure.phabricator.com/D15306 on iridium to test a fix for https://phabricator.wikimedia.org/T127290 | 
  [production] | 
            
  | 18:00 | 
  <godog> | 
  reenable puppet on restbase1008 | 
  [production] | 
            
  | 17:49 | 
  <mobrovac> | 
  restbase deploy start of a42976cc82 | 
  [production] | 
            
  | 17:47 | 
  <elukey> | 
  manual failover of hadoop master node (analytics1001) to secondary (analytics1002) for maintenance (plus service restarts) | 
  [production] | 
            
  | 17:41 | 
  <urandom> | 
  upgrading Cassandra to 2.1.13 on cerium.eqiad.wmnet (restbase staging) T126629 | 
  [production] | 
            
  | 17:28 | 
  <mobrovac> | 
  restbase deploying a42976cc82 to restbase1002 | 
  [production] | 
            
  | 17:27 | 
  <urandom> | 
  Cassandra on xenon.eqiad.wmnet killed by kernel after Cassandra package upgrade (coincidence?): [1482254.046078] Out of memory: Kill process 21854 (java) score 595 or sacrifice child : T126629 | 
  [production] | 
            
  | 17:26 | 
  <urandom> | 
  Cassandra on xenon.eqiad.wmnet killed by kernel after Cassandra package upgrade (coincidence): [1482254.046078] Out of memory: Kill process 21854 (java) score 595 or sacrifice child | 
  [production] | 
            
  | 17:21 | 
  <urandom> | 
  upgrading Cassandra to 2.1.13 on xenon.eqiad.wmnet (restbase staging) T126629 | 
  [production] | 
            
  | 17:20 | 
  <elukey> | 
  disabled puppet on analytics1027 to prevent any Camus job from running | 
  [production] | 
            
  | 17:04 | 
  <dcausse> | 
  updating completion suggester indices in eqiad | 
  [production] | 
            
  | 16:54 | 
  <elukey> | 
  restarting hadoop services on analytics105* nodes for security updates | 
  [production] | 
            
  | 16:49 | 
  <gehel> | 
  removing cirrus maintenance crons from mw1152 (T127322) | 
  [production] | 
            
  | 15:52 | 
  <dcausse> | 
  creating adywiki indices in codfw | 
  [production] | 
            
  | 15:44 | 
  <elukey> | 
  restarting hadoop services on analytics104* nodes for security updates | 
  [production] | 
            
  | 15:37 | 
  <elukey> | 
  restarting hadoop services on analytics102* nodes for security update | 
  [production] | 
            
  | 15:33 | 
  <moritzm> | 
  restarting apache on silver/wikitech | 
  [production] | 
            
  | 15:10 | 
  <elukey> | 
  restarting hadoop services on analytics103* hosts for security upgrades | 
  [production] | 
            
  | 14:06 | 
  <bblack> | 
  restarting apache on gallium (integration) | 
  [production] | 
            
  | 13:13 | 
  <mark> | 
  decreased raid md2 sync_speed_max to 6000 on restbase1008 | 
  [production] | 
            
  | 12:55 | 
  <elukey> | 
  rebooted kafka1022.eqiad.wmnet for kernel upgrade | 
  [production] | 
            
  | 12:51 | 
  <godog> | 
  decrease raid min_speed to 8000 on restbase1008 | 
  [production] | 
            
  | 12:50 | 
  <hoo@tin> | 
  Synchronized wmf-config/Wikibase.php: Bump $wgCacheEpoch for Wikidata (duration: 01m 54s) | 
  [production] | 
            
  | 12:41 | 
  <elukey> | 
  rebooted kafka1020 for kernel upgrade. | 
  [production] | 
            
  | 12:40 | 
  <godog> | 
  decrease raid min_speed to 10000 on restbase1008 | 
  [production] | 
            
  | 12:24 | 
  <godog> | 
  increase stripe_cache_size to 32470 on restbase1008 | 
  [production] | 
            
  | 12:21 | 
  <godog> | 
  expand raid0 on restbase1008 to sdd and sde | 
  [production] | 
            
  | 11:36 | 
  <paravoid> | 
  upgrading mr1-ulsfo to its pre-recovery version and rebooting (T127295) | 
  [production] | 
            
  | 11:34 | 
  <hashar> | 
  Hard restarting Jenkins T127294 | 
  [production] | 
            
  | 11:32 | 
  <jynus> | 
  logical import of db1021 starting for data consistency check and defragmenting purposes | 
  [production] | 
            
  | 11:29 | 
  <paravoid> | 
  mr1-ulsfo: "request system snapshot media internal slice alternate" + reboot (T127295) | 
  [production] | 
            
  | 11:27 | 
  <hashar> | 
  Jenkins web UI busy with 'jenkins.model.RunIdMigrator doMigrate' while it migrates build records. I did a bunch of cleanup yesterday. Jenkins runs jobs in the background just fine though. T127294 | 
  [production] | 
            
  | 11:12 | 
  <hashar> | 
  Jenkins: reloading configuration from disk. Some metadata are corrupted T127294 | 
  [production] | 
            
  | 10:48 | 
  <elukey> | 
  rebooted kafka1018 for maintenance | 
  [production] | 
            
  | 10:17 | 
  <elukey> | 
  rebooted kafka1014 for maintenance | 
  [production] | 
            
  | 10:10 | 
  <moritzm> | 
  restarting hhvm on mw1* to put glibc update into effect | 
  [production] | 
            
  | 09:49 | 
  <godog> | 
  remove old restbase metrics under restbase.* from graphite1001 and graphite2001 | 
  [production] | 
            
  | 03:13 | 
  <twentyafterfour> | 
  running puppet one last time on iridium. Phabricator upgrade successful with just a few minor issues now resolved. | 
  [production] | 
            
  | 03:01 | 
  <l10nupdate@tin> | 
  ResourceLoader cache refresh completed at Thu Feb 18 03:01:01 UTC 2016 (duration 9m 24s) | 
  [production] | 
            
  | 02:51 | 
  <mwdeploy@tin> | 
  sync-l10n completed (1.27.0-wmf.14) (duration: 11m 20s) | 
  [production] | 
            
  | 02:29 | 
  <mwdeploy@tin> | 
  sync-l10n completed (1.27.0-wmf.13) (duration: 13m 55s) | 
  [production] | 
            
  | 02:18 | 
  <twentyafterfour> | 
  phabricator is back online, sprint extension is broken, I'm investigating | 
  [production] | 
            
  | 01:57 | 
  <mutante> | 
  powercycled frozen mw1147 | 
  [production] | 
            
  | 01:51 | 
  <twentyafterfour> | 
  phab pre-upgrade: http://pastebin.com/RTmXfDhp | 
  [production] | 
            
  | 01:49 | 
  <twentyafterfour> | 
  about to bring down phabricator to do the upgrade | 
  [production] | 
            
  | 01:49 | 
  <twentyafterfour> | 
  ran puppet on iridium for testing | 
  [production] | 
            
  | 01:08 | 
  <twentyafterfour> | 
  stopped phd and started dumping phabricator's database to /srv/dumps/20160218.phabricator.sql.gz (just in case I need to roll back the update) | 
  [production] | 
            
  | 00:34 | 
  <catrope@tin> | 
  Synchronized php-1.27.0-wmf.13/extensions/Flow: Trying again (duration: 01m 50s) | 
  [production] | 
            
  | 00:28 | 
  <RoanKattouw> | 
  00:28:25 64 apaches had sync errors, /usr/bin/sync-common missing | 
  [production] |