| 
      
        2011-07-06
      
      §
     | 
  
    
  | 21:54 | 
  <Ryan_Lane> | 
  depooling srv169 | 
  [production] | 
            
  | 21:51 | 
  <Ryan_Lane> | 
  restarted apache on singer | 
  [production] | 
            
  | 21:46 | 
  <laner> | 
  synchronized php-1.17/wmf-config/db.php  'depooling srv154, repooling srv178' | 
  [production] | 
            
  | 21:30 | 
  <pdhanda> | 
  ran sync-common-all  'Synced to r91606 for ArticleFeedback' | 
  [production] | 
            
  | 21:27 | 
  <Ryan_Lane> | 
  setting proxy setting for secure back to original setting. removed ~ files from sites-enabled | 
  [production] | 
            
  | 21:17 | 
  <Ryan_Lane> | 
  putting retry=3 back, and adding a timeout of 15 seconds to secure | 
  [production] | 
            
  | 21:16 | 
  <Ryan_Lane> | 
  removed retry=3 from ProxyPass directive for secure. 3 seconds really isn't enough for this service... | 
  [production] | 
            
  | 21:06 | 
  <RobH> | 
  running puppet on spence, this is going to take forever. | 
  [production] | 
            
  | 21:05 | 
  <Ryan_Lane> | 
  restarting apache on singer | 
  [production] | 
            
  | 19:37 | 
  <mark> | 
  Added DNS entries for cr1-sdtpa and cr2-pmtpa | 
  [production] | 
            
  | 19:25 | 
  <hashar:> | 
  hexmode raised a user issue with blocking. It is a lock wait timeout happening from time to time on enwiki. 30 occurrences in dberror.log for Block::purgeExpired. Could not reproduce it, so I am assuming it was a temporary issue. | 
  [production] | 
            
  | 19:15 | 
  <hashar:> | 
  srv154 seems unreachable. dberror.log is spammed with "Error connecting to <srv154 IP>" | 
  [production] | 
            
  | 19:13 | 
  <RobH> | 
  added webmaster@ to other top level domain mail routing to forward to the wikimedia.org webmaster for google securebrowsing stuff per RT#1122 | 
  [production] | 
            
  | 18:08 | 
  <pdhanda> | 
  running maintenance/cleanupTitles.php on commonswiki | 
  [production] | 
            
  | 17:51 | 
  <pdhanda> | 
  Running maintenance/namespaceDupesWT.php on commonswiki | 
  [production] | 
            
  | 17:12 | 
  <RobH> | 
  srv169 successfully back in service, tests fine and has all updated files, lvs3 updated to include it in pool | 
  [production] | 
            
  | 17:11 | 
  <RobH> | 
  returning srv169 into service | 
  [production] | 
            
  | 15:37 | 
  <mark> | 
  Removed ms5:/etc/cron.d/mdadm | 
  [production] | 
            
  | 15:37 | 
  <mark> | 
  Stopped MD raid resync on ms5 | 
  [production] | 
            
  | 15:28 | 
  <RobH> | 
  search18 booted back up successfully | 
  [production] | 
            
  | 15:25 | 
  <RobH> | 
  api lag issues known due to search server failure, being worked presently | 
  [production] | 
            
  | 15:24 | 
  <RobH> | 
  search18 sas configuration bios confirms both disks are still in a non-degraded (according to it) mirror | 
  [production] | 
            
  | 15:23 | 
  <RobH> | 
  search18 randomly rebooted after checking disks, before reaching the login prompt | 
  [production] | 
            
  | 15:19 | 
  <RobH> | 
  rebooting search18 | 
  [production] | 
            
  | 15:14 | 
  <RobH> | 
  search18 appears to be completely offline, investigating lom logs before rebooting. | 
  [production] | 
            
  | 15:12 | 
  <RobH> | 
  search18 offline, logging into mgmt to check it out | 
  [production] | 
            
  | 15:01 | 
  <RobH> | 
  eqiad humidity levels ticket dispatched for fulfillment | 
  [production] | 
            
  | 14:37 | 
  <mark> | 
  Paused rsyncs on ms5 | 
  [production] | 
            
  | 14:04 | 
  <mark> | 
  Powercycled sq36 | 
  [production] | 
            
  | 13:18 | 
  <^demon|away> | 
  fixed permissions on svn c/o on ci.tesla, ran svn cleanup. cruise control still not pleased and yelling about locks | 
  [production] | 
            
  | 13:16 | 
  <mark> | 
  Upgrading firmware of scs-a1-sdtpa | 
  [production] | 
            
  | 12:51 | 
  <mark> | 
  csw5-pmtpa crashed and reloaded | 
  [production] | 
            
  | 11:53 | 
  <mark> | 
  Upgrading firmware of scs-c1-pmtpa | 
  [production] | 
            
  
    | 
      
        2011-07-05
      
      §
     | 
  
    
  | 23:53 | 
  <^demon> | 
  scratch that....ssh just seems to have been rather slow in getting its act together. ci.tesla is just fine now | 
  [production] | 
            
  | 23:50 | 
  <^demon> | 
  well now I've locked myself out of ci.tesla. Seems it doesn't start ssh on reboot...what a silly thing to do | 
  [production] | 
            
  | 23:47 | 
  <^demon> | 
  rebooting ci.tesla since it was horribly hung up on the latest build--was it really stuck for the past 24hrs? | 
  [production] | 
            
  | 23:35 | 
  <reedy> | 
  synchronized php-1.17/resources/mediawiki/mediawiki.js  'r91505' | 
  [production] | 
            
  | 21:28 | 
  <pdhanda> | 
  ran sync-common-all  'Synced to r91494 for WikiLove' | 
  [production] | 
            
  | 21:10 | 
  <pdhanda> | 
  synchronized php-1.17/resources/jquery.ui/themes/vector/jquery.ui.button.css  'Synced to r 91493.' | 
  [production] | 
            
  | 21:00 | 
  <pdhanda> | 
  synchronized php-1.17/resources/jquery.ui/themes/vector/jquery.ui.button.css  'Synced to r 91490.' | 
  [production] | 
            
  | 19:53 | 
  <reedy> | 
  synchronized php-1.17/includes/api/ApiQuery.php  'r91479' | 
  [production] | 
            
  | 19:07 | 
  <catrope> | 
  synchronized php-1.17/extensions/Vector/modules/ext.vector.collapsibleTabs.js  'r91476' | 
  [production] | 
            
  | 19:07 | 
  <catrope> | 
  synchronized php-1.17/extensions/Vector/modules/ext.vector.simpleSearch.js  'r91476' | 
  [production] | 
            
  | 16:49 | 
  <RoanKattouw> | 
  Short CPU spike on the Apaches, approx 16:45-16:50 UTC. Things seem to be recovering now | 
  [production] | 
            
  | 16:04 | 
  <mark> | 
  Put OSPF/OSPFv3 on csw1-sdtpa:e14/1 in active mode again; appears stable so far | 
  [production] | 
            
  | 15:45 | 
  <mark> | 
  Reenabled csw1-sdtpa:e14/1 with OSPF/OSPFv3 in passive mode and high metric | 
  [production] | 
            
  | 15:31 | 
  <mark> | 
  Shutdown csw1-sdtpa:e14/1 (l3 wave) | 
  [production] | 
            
  | 15:24 | 
  <mark> | 
  Disabled IP load-sharing on csw1-esams (out of next-hops), disabled e14/1 (10G wave L3), reenabled it | 
  [production] | 
            
  | 13:38 | 
  <mark> | 
  Rebooting streber; dpkg hung | 
  [production] | 
            
  | 08:34 | 
  <mark> | 
  Rerouted traffic AS14907 -> AS43821 via 2828 | 
  [production] |