| 
      
        2021-06-28
      
     | 
  
    
  | 11:18 | 
  <hnowlan@cumin1001> | 
  START - Cookbook sre.hosts.downtime for 6:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from buster master maps1009 | 
  [production] | 
            
  | 11:18 | 
  <Lucas_WMDE> | 
  lucaswerkmeister-wmde@mw1384:~$ scap pull # did not print any errors | 
  [production] | 
            
  | 11:13 | 
  <urbanecm@deploy1002> | 
  Synchronized wmf-config/InitialiseSettings.php: ade641b39bae8f2abb5d318299b033bfd8a7cb7a: Deploy ContentTranslation out of Beta feature in 9 WPs (T284641) (duration: 00m 56s) | 
  [production] | 
            
  | 10:44 | 
  <jdrewniak@deploy1002> | 
  Synchronized portals: Wikimedia Portals Update: [[gerrit:701884| Bumping portals to master (T128546)]] (duration: 00m 56s) | 
  [production] | 
            
  | 10:43 | 
  <jdrewniak@deploy1002> | 
  Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:701884| Bumping portals to master (T128546)]] (duration: 00m 57s) | 
  [production] | 
            
  | 10:25 | 
  <hnowlan@cumin2002> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2007.codfw.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 10:23 | 
  <mutante> | 
  sodium - restarted nginx | 
  [production] | 
            
  | 10:23 | 
  <hnowlan@cumin2002> | 
  START - Cookbook sre.hosts.downtime for 2:00:00 on maps2007.codfw.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 10:22 | 
  <mutante> | 
  sodium (mirrors.wikimedia.org) - switching to nginx light variant T164456 | 
  [production] | 
            
  | 10:11 | 
  <vgutierrez> | 
  rolling upgrade of ATS on eqiad - T285535 | 
  [production] | 
            
  | 10:11 | 
  <moritzm> | 
  installing remaining libxml2 security updates | 
  [production] | 
            
  | 09:52 | 
  <vgutierrez> | 
  rolling upgrade of ATS on esams - T285535 | 
  [production] | 
            
  | 09:42 | 
  <lucaswerkmeister-wmde@deploy1002> | 
  Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:701501|Remove $wmgWikibaseClientChangesDatabase (T257260)]] (2/2, beta) (duration: 00m 56s) | 
  [production] | 
            
  | 09:41 | 
  <lucaswerkmeister-wmde@deploy1002> | 
  Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:701501|Remove $wmgWikibaseClientChangesDatabase (T257260)]] (1/2, prod) (duration: 00m 57s) | 
  [production] | 
            
  | 09:40 | 
  <dzahn@cumin1001> | 
  END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh5002.wikimedia.org | 
  [production] | 
            
  | 09:40 | 
  <dzahn@cumin1001> | 
  START - Cookbook sre.ganeti.makevm for new host doh5002.wikimedia.org | 
  [production] | 
            
  | 09:39 | 
  <Lucas_WMDE> | 
  ^ wrong gerrit change used for message, sorry | 
  [production] | 
            
  | 09:39 | 
  <lucaswerkmeister-wmde@deploy1002> | 
  sync-file aborted: Config: [[gerrit:701502|Stop setting Wikibase repo foreignRepositories (T257260)]] (1/2, prod) (duration: 00m 10s) | 
  [production] | 
            
  | 09:27 | 
  <vgutierrez> | 
  rolling upgrade of ATS on eqsin - T285535 | 
  [production] | 
            
  | 09:14 | 
  <lucaswerkmeister-wmde@deploy1002> | 
  Synchronized wmf-config/Wikibase.php: Config: [[gerrit:701500|Stop setting Wikibase client changesDatabase (T257260)]] (duration: 00m 55s) | 
  [production] | 
            
  | 08:56 | 
  <vgutierrez> | 
  rolling upgrade of ATS on codfw - T285535 | 
  [production] | 
            
  | 08:53 | 
  <ladsgroup@deploy1002> | 
  Synchronized wmf-config/Wikibase.php: Config: [[gerrit:701875|Set idGeneratorInErrorPingLimiter to 9 for Wikidata (T284538)]], Part II (duration: 00m 57s) | 
  [production] | 
            
  | 08:51 | 
  <ladsgroup@deploy1002> | 
  Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:701875|Set idGeneratorInErrorPingLimiter to 9 for Wikidata (T284538)]], Part I (duration: 00m 56s) | 
  [production] | 
            
  | 08:48 | 
  <mutante> | 
  phab1001 - removing 2fa for my own account | 
  [production] | 
            
  | 08:40 | 
  <vgutierrez> | 
  rolling upgrade of ATS on ulsfo - T285535 | 
  [production] | 
            
  | 08:40 | 
  <jayme> | 
  drain kubestage2002 for docker restart(s) | 
  [production] | 
            
  | 08:33 | 
  <ladsgroup@deploy1002> | 
  Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:698751|Remove idGeneratorRateLimiting from production config (T274157)]], Part II (duration: 00m 55s) | 
  [production] | 
            
  | 08:31 | 
  <ladsgroup@deploy1002> | 
  Synchronized wmf-config/Wikibase.php: Config: [[gerrit:698751|Remove idGeneratorRateLimiting from production config (T274157)]], Part I (duration: 00m 58s) | 
  [production] | 
            
  | 08:27 | 
  <ladsgroup@deploy1002> | 
  Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:699540|Remove special configurations for Dagbani in Wikibase code (T283168)]] (duration: 00m 56s) | 
  [production] | 
            
  | 08:25 | 
  <jynus@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1171.eqiad.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 08:23 | 
  <jynus@cumin1001> | 
  START - Cookbook sre.hosts.downtime for 2:00:00 on db1171.eqiad.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 08:21 | 
  <ladsgroup@deploy1002> | 
  Synchronized wmf-config/Wikibase.php: Config: [[gerrit:698518|Set Wikidata's main sandbox item (T219215)]], Part II (duration: 00m 56s) | 
  [production] | 
            
  | 08:19 | 
  <ladsgroup@deploy1002> | 
  Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:698518|Set Wikidata's main sandbox item (T219215)]], Part I (duration: 00m 57s) | 
  [production] | 
            
  | 08:19 | 
  <jynus> | 
  stop and remove db1145:s5 db2099:s5 T283235 | 
  [production] | 
            
  | 07:58 | 
  <dcausse> | 
  depool and restart blazegraph on wdqs1012 | 
  [production] | 
            
  | 07:57 | 
  <jelto> | 
  jelto@cumin1001:~$ sudo cumin install* 'run-puppet-agent' # update DHCP entry for gitlab2001 on install[1003,2003,3001,4001,5001].wikimedia.org | 
  [production] | 
            
  | 07:57 | 
  <dcausse> | 
  repool wdqs1005 | 
  [production] | 
            
  | 07:46 | 
  <hashar@deploy1002> | 
  Finished deploy [integration/docroot@cf677eb]: integration: Change agents dashboard link from Nagf to Grafana (duration: 00m 08s) | 
  [production] | 
            
  | 07:46 | 
  <hashar@deploy1002> | 
  Started deploy [integration/docroot@cf677eb]: integration: Change agents dashboard link from Nagf to Grafana | 
  [production] | 
            
  | 06:16 | 
  <XioNoX> | 
  remove BGP to AS13768 in AMS-IX | 
  [production] | 
            
  
    | 
      
        2021-06-27
      
     | 
  
    
  | 09:10 | 
  <elukey> | 
  cumin 'A:mw-eqiad and not P{mw13[67,54,55,72,33,50,51,73,52,49,53,65,71,84,68,70,66,91,89,97,95,99,85,93,87]*} and not P{mw14[09,03,11,07,05,01]*} and not P{mw12[61-69]*} and not P{mwdebug*}' '/usr/local/sbin/restart-php7.2-fpm' -b 1 -s 30 | 
  [production] | 
            
  | 09:10 | 
  <elukey> | 
  roll restart the remaining mw appservers to clear out apcu fragmentation (cumin command to follow) | 
  [production] | 
            
  | 08:37 | 
  <elukey> | 
  restart php-fpm on mw1268 mw1269 - low busy workers | 
  [production] | 
            
  | 08:23 | 
  <elukey> | 
  restart php-fpm on mw1401 | 
  [production] |