| 
      
        2016-04-30
      
     | 
  
    
  | 13:41 | 
  <elukey> | 
  disabled puppet on analytics1047 and scheduled downtime for the host because of IO errors in dmesg for /dev/sdd. Also stopped the Hadoop daemons to remove it from the cluster temporarily (not sure how to do it properly, will write docs). | 
  [production] | 
            
  | 10:45 | 
  <volans> | 
  Reset slave on sanitarium:3311 due to corrupted relay log after skipping query for duplicate key T132416 | 
  [production] | 
            
  | 10:19 | 
  <volans> | 
  restarted slave on dbstore1001 skipping missing database T132837 | 
  [production] | 
            
  | 08:28 | 
  <gehel> | 
  restarting elasticsearch server elastic1031.eqiad.wmnet (T110236) | 
  [production] | 
            
  | 07:15 | 
  <gehel> | 
  restarting elasticsearch server elastic1030.eqiad.wmnet (T110236) | 
  [production] | 
            
  | 06:32 | 
  <gehel> | 
  restarting elasticsearch server elastic1029.eqiad.wmnet (T110236) | 
  [production] | 
            
  | 06:16 | 
  <gehel> | 
  restarting elasticsearch server elastic1028.eqiad.wmnet (T110236) | 
  [production] | 
            
  | 01:15 | 
  <aude> | 
  applied Ibd302e1 to terbium for debugging broken wikidata rdf dumps | 
  [production] | 
            
  
    | 
      
        2016-04-29
      
     | 
  
    
  | 22:57 | 
  <mutante> | 
  DNS - forced authdns-gen-zones etc from https://phabricator.wikimedia.org/T97051#1994679 on ns0/ns1/ns2 to get new language added | 
  [production] | 
            
  | 20:59 | 
  <gehel> | 
  restarting elasticsearch server elastic1027.eqiad.wmnet (T110236) | 
  [production] | 
            
  | 19:56 | 
  <urandom> | 
  (Re)starting cleanup on restbase1009-{a,b}.eqiad.wmnet | 
  [production] | 
            
  | 19:56 | 
  <catrope@tin> | 
  Synchronized php-1.27.0-wmf.22/extensions/CentralNotice/: T133971 (duration: 00m 41s) | 
  [production] | 
            
  | 19:29 | 
  <gehel> | 
  restarting elasticsearch server elastic1026.eqiad.wmnet (T110236) | 
  [production] | 
            
  | 19:07 | 
  <gehel> | 
  restarting elasticsearch server elastic1025.eqiad.wmnet (T110236) | 
  [production] | 
            
  | 18:21 | 
  <jzerebecki@tin> | 
  Synchronized php-1.27.0-wmf.22/extensions/Wikidata/extensions/Wikibase/repo/includes/Hooks/OutputPageBeforeHTMLHookHandler.php: wmf.22 fc20c54f7915b94ec0d15ef17e207c116910623d 2 of 2 T132645 (duration: 00m 28s) | 
  [production] | 
            
  | 18:20 | 
  <jzerebecki@tin> | 
  Synchronized php-1.27.0-wmf.22/extensions/Wikidata/extensions/Wikibase/repo/includes/Dumpers/DumpGenerator.php: wmf.22 fc20c54f7915b94ec0d15ef17e207c116910623d 1 of 2 T133924 (duration: 00m 29s) | 
  [production] | 
            
  | 18:14 | 
  <jzerebecki@tin> | 
  Synchronized php-1.27.0-wmf.22/extensions/Wikidata/extensions/Wikibase/repo/includes/Hooks/OutputPageBeforeHTMLHookHandler.php: wmf.22 fc20c54f7915b94ec0d15ef17e207c116910623d 2 of 2 T132645 (duration: 00m 34s) | 
  [production] | 
            
  | 18:14 | 
  <robh> | 
  started all slaves via dbstore2001 this time. | 
  [production] | 
            
  | 18:12 | 
  <jzerebecki@tin> | 
  Synchronized php-1.27.0-wmf.22/extensions/Wikidata/extensions/Wikibase/repo/includes/Dumpers/DumpGenerator.php: wmf.22 fc20c54f7915b94ec0d15ef17e207c116910623d 1 of 2 T133924 (duration: 00m 44s) | 
  [production] | 
            
  | 18:07 | 
  <robh> | 
  started all slaves via dbstore2002 per jaime's request | 
  [production] | 
            
  | 17:45 | 
  <gehel> | 
  restarting elasticsearch server elastic1024.eqiad.wmnet (T110236) | 
  [production] | 
            
  | 16:56 | 
  <gehel> | 
  restarting elasticsearch server elastic1023.eqiad.wmnet (T110236) | 
  [production] | 
            
  | 16:22 | 
  <gehel> | 
  restarting elasticsearch server elastic1022.eqiad.wmnet (T110236) | 
  [production] | 
            
  | 15:29 | 
  <jynus@tin> | 
  Synchronized wmf-config/db-codfw.php: Repool db2047 and db2068. Depool db2008, db2009. Pool db2033 as the new x1 node. (duration: 00m 27s) | 
  [production] | 
            
  | 15:17 | 
  <gehel> | 
  restarting elasticsearch server elastic1021.eqiad.wmnet (T110236) | 
  [production] | 
            
  | 14:56 | 
  <oblivian@palladium> | 
  conftool action : set/pooled=yes; selector: name=mw1153.eqiad.wmnet | 
  [production] | 
            
  | 14:54 | 
  <jynus> | 
  moving topology of db2033 to be the new x1 master on codfw | 
  [production] | 
            
  | 14:40 | 
  <oblivian@palladium> | 
  conftool action : set/pooled=no; selector: name=mw1153.eqiad.wmnet | 
  [production] | 
            
  | 14:32 | 
  <gehel> | 
  restarting elasticsearch server elastic1020.eqiad.wmnet (T110236) | 
  [production] | 
            
  | 14:26 | 
  <hashar> | 
  Rebased tin:/srv/mediawiki-staging 31886c7..8e2670a. Bring in 3 changes that are solely for beta cluster. | 
  [production] | 
            
  | 13:54 | 
  <jynus> | 
  stopping mysql db2008 (cloning to db2033) | 
  [production] | 
            
  | 13:39 | 
  <jynus> | 
  reimaging db2033 | 
  [production] | 
            
  | 13:09 | 
  <gehel> | 
  restarting elasticsearch server elastic1019.eqiad.wmnet (T110236) | 
  [production] | 
            
  | 12:30 | 
  <gehel> | 
  restarting elasticsearch server elastic1018.eqiad.wmnet (T110236) | 
  [production] | 
            
  | 11:39 | 
  <elukey> | 
  soft reboot for mw1119 (not responsive to ssh, root login timed out on the console) | 
  [production] | 
            
  | 09:43 | 
  <gehel> | 
  restarting elasticsearch server elastic1017.eqiad.wmnet (T110236) | 
  [production] | 
            
  | 09:42 | 
  <gehel> | 
  restarting elasticsearch server elastic1016.eqiad.wmnet (T110236) | 
  [production] | 
            
  | 09:01 | 
  <jynus> | 
  changing live configuration of db1049 thread_pool_stall_limit to 10 to test impact on connection timeout | 
  [production] | 
            
  | 08:20 | 
  <gehel> | 
  restarting elasticsearch server elastic1016.eqiad.wmnet (T110236) | 
  [production] | 
            
  | 07:57 | 
  <elukey> | 
  puppet disabled on new kafka codfw instances due to errors while starting Event Bus (hosts not in service) | 
  [production] | 
            
  | 07:54 | 
  <moritzm> | 
  enabled base::firewall on stat1002 | 
  [production] | 
            
  | 07:52 | 
  <gehel> | 
  restarting elasticsearch server elastic1015.eqiad.wmnet (T110236) | 
  [production] | 
            
  | 07:36 | 
  <godog> | 
  stop cleanups on restbase1014-b | 
  [production] | 
            
  | 06:46 | 
  <jynus@tin> | 
  Synchronized wmf-config/db-eqiad.php: Reduce normal traffic on s2 API servers (duration: 00m 27s) | 
  [production] | 
            
  | 06:33 | 
  <jynus@tin> | 
  Synchronized wmf-config/db-eqiad.php: Repool db1038, increase weight of new hardware slaves db107[4-8] (duration: 00m 33s) | 
  [production] | 
            
  | 05:42 | 
  <gehel> | 
  restarting elasticsearch server elastic1014.eqiad.wmnet (T110236) | 
  [production] | 
            
  | 05:41 | 
  <mutante> | 
  re: "02:29 Krenair: last deployment was slow because of snapshot1007 being offline"  it's back, i don't know why, it was powered down and i just tried switching it on. that helped. the command is literally "power on" on HP | 
  [production] | 
            
  | 05:39 | 
  <mutante> | 
  snapshot1007 - was powered down, powering it on. (..connect to mgmt.. "damn it's a HP") | 
  [production] | 
            
  | 05:34 | 
  <mutante> | 
  snapshot1007 - not reachable, duration 10h | 
  [production] | 
            
  | 04:58 | 
  <gehel> | 
  restarting elasticsearch server elastic1013.eqiad.wmnet (T110236) | 
  [production] |