| 
      
        2020-12-17
      
      §
     | 
  
    
  | 12:55 | 
  <marostegui@cumin1001> | 
  dbctl commit (dc=all): 'db1082 (re)pooling @ 50%: Repooling after cloning db1154:3315 as sanitarium T268742', diff saved to https://phabricator.wikimedia.org/P13569 and previous config saved to /var/cache/conftool/dbconfig/20201217-125556-root.json | 
  [production] | 
            
  | 12:55 | 
  <marostegui@cumin1001> | 
  dbctl commit (dc=all): 'Change db1089 weights', diff saved to https://phabricator.wikimedia.org/P13568 and previous config saved to /var/cache/conftool/dbconfig/20201217-125535-marostegui.json | 
  [production] | 
            
  | 12:54 | 
  <marostegui@cumin1001> | 
  dbctl commit (dc=all): 'Repool db1106 after cloning db1154:3311 as sanitarium T268742', diff saved to https://phabricator.wikimedia.org/P13567 and previous config saved to /var/cache/conftool/dbconfig/20201217-125446-marostegui.json | 
  [production] | 
            
  | 12:40 | 
  <marostegui@cumin1001> | 
  dbctl commit (dc=all): 'db1082 (re)pooling @ 25%: Repooling after cloning db1154:3315 as sanitarium T268742', diff saved to https://phabricator.wikimedia.org/P13566 and previous config saved to /var/cache/conftool/dbconfig/20201217-124052-root.json | 
  [production] | 
            
  | 12:36 | 
  <jbond42> | 
  disable puppet fleet wide for config master vhost change | 
  [production] | 
            
  | 12:23 | 
  <matthiasmullie> | 
  EU backport+config window done | 
  [production] | 
            
  | 12:23 | 
  <elukey@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 12:22 | 
  <mlitn@deploy1001> | 
  Synchronized wmf-config/InitialiseSettings.php: f3a50cb06: Enable ContentTranslation as default tool for ceb, km, mg, tg and yi WPs (duration: 01m 02s) | 
  [production] | 
            
  | 12:21 | 
  <elukey@cumin1001> | 
  START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 12:17 | 
  <mlitn@deploy1001> | 
  Synchronized wmf-config/InitialiseSettings.php: a29fec312: Add Wikidocumentaries campaign for ContentTranslation (duration: 01m 02s) | 
  [production] | 
            
  | 12:07 | 
  <mlitn@deploy1001> | 
  Synchronized wmf-config/SearchSettingsForSDC.php: 68ac6fa61: Media Search: Remove license map from config (duration: 01m 04s) | 
  [production] | 
            
  | 11:38 | 
  <kart_> | 
  Updated cxserver to 2020-12-17-111820-production (T262192) | 
  [production] | 
            
  | 11:36 | 
  <kartik@deploy1001> | 
  helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' . | 
  [production] | 
            
  | 11:34 | 
  <kartik@deploy1001> | 
  helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' . | 
  [production] | 
            
  | 11:32 | 
  <kartik@deploy1001> | 
  helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' . | 
  [production] | 
            
  | 11:27 | 
  <godog> | 
  bounce apache2 on grafana1002 | 
  [production] | 
            
  | 11:26 | 
  <elukey@cumin1001> | 
  END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-test-worker1003.eqiad.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 11:24 | 
  <elukey@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 11:22 | 
  <elukey@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1002.eqiad.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 11:21 | 
  <elukey@cumin1001> | 
  START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 11:21 | 
  <elukey@cumin1001> | 
  START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1003.eqiad.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 11:20 | 
  <elukey@cumin1001> | 
  START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1002.eqiad.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 11:20 | 
  <elukey@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 11:18 | 
  <elukey@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 11:16 | 
  <elukey@cumin1001> | 
  START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 11:16 | 
  <elukey@cumin1001> | 
  START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 11:10 | 
  <jbond@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) | 
  [production] | 
            
  | 11:08 | 
  <jbond@cumin1001> | 
  START - Cookbook sre.hosts.reboot-single | 
  [production] | 
            
  | 10:50 | 
  <elukey@cumin1001> | 
  END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001 | 
  [production] | 
            
  | 10:45 | 
  <elukey@cumin1001> | 
  START - Cookbook sre.hadoop.stop-cluster for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001 | 
  [production] | 
            
  | 10:21 | 
  <jbond42> | 
  updating RemoteIP on phabricator https://gerrit.wikimedia.org/r/c/operations/puppet/+/649872 | 
  [production] | 
            
  | 09:57 | 
  <vgutierrez> | 
  repool ats-tls on cp5011 | 
  [production] | 
            
  | 09:00 | 
  <marostegui> | 
  Sanitize s1 and s5 on db1154 T268742 | 
  [production] | 
            
  | 08:30 | 
  <godog> | 
  swift codfw-prod: more weight to ms-be20[58-61] - T269337 | 
  [production] | 
            
  | 07:49 | 
  <ryankemper> | 
  [wdqs deploy] (wdqs deploy complete) | 
  [production] | 
            
  | 07:19 | 
  <marostegui> | 
  Stop mysql on db1082 to clone db1154 | 
  [production] | 
            
  | 07:19 | 
  <marostegui@cumin1001> | 
  dbctl commit (dc=all): 'Depool db1082 for cloning db1154:3315 T268742 ', diff saved to https://phabricator.wikimedia.org/P13563 and previous config saved to /var/cache/conftool/dbconfig/20201217-071903-marostegui.json | 
  [production] | 
            
  | 07:18 | 
  <elukey> | 
  reboot an-airflow1001 for kernel upgrades | 
  [production] | 
            
  | 07:08 | 
  <elukey> | 
  update analytics-in4 filter on cr1/cr2-eqiad for https://gerrit.wikimedia.org/r/c/operations/homer/public/+/649706 | 
  [production] | 
            
  | 07:08 | 
  <ryankemper> | 
  [wdqs] depooled `wdqs1013` while it catches up on lag | 
  [production] | 
            
  | 07:06 | 
  <ryankemper> | 
  [wdqs deploy] Restarting `wdqs-categories` across all wdqs instances, one host at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'` | 
  [production] | 
            
  | 07:05 | 
  <ryankemper> | 
  [wdqs deploy] Restarting `wdqs-categories` across all test instances: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'` | 
  [production] | 
            
  | 07:05 | 
  <ryankemper> | 
  [wdqs-deploy] Restarting `wdqs-updater` across all instances, 4 instances at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'` | 
  [production] | 
            
  | 07:04 | 
  <ryankemper@deploy1001> | 
  Finished deploy [wdqs/wdqs@90f9bdd]: 0.3.56 (duration: 10m 39s) | 
  [production] | 
            
  | 06:54 | 
  <ryankemper> | 
  [wdqs deploy] Tests passing on canary instance `wdqs1003` following canary deploy, proceeding to rest of fleet | 
  [production] | 
            
  | 06:53 | 
  <ryankemper@deploy1001> | 
  Started deploy [wdqs/wdqs@90f9bdd]: 0.3.56 | 
  [production] | 
            
  | 06:53 | 
  <ryankemper> | 
  [wdqs deploy] All tests passing on canary instance `wdqs1003` prior to deploy | 
  [production] | 
            
  | 06:52 | 
  <kart_> | 
  Updated cxserver to 2020-12-16-164911-production (T234220, T269437) | 
  [production] | 
            
  | 06:52 | 
  <kart_> | 
  Updated cxserver to 2020-12-16-164911-production (T234220, T234220) | 
  [production] | 
            
  | 06:22 | 
  <marostegui@cumin1001> | 
  dbctl commit (dc=all): 'Depool es1013 for decommissioning T268436', diff saved to https://phabricator.wikimedia.org/P13562 and previous config saved to /var/cache/conftool/dbconfig/20201217-062249-marostegui.json | 
  [production] |