| 2022-06-03 |
  
    
  | 19:50 | 
  <ladsgroup@cumin1001> | 
  dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P29381 and previous config saved to /var/cache/conftool/dbconfig/20220603-195052-ladsgroup.json | 
  [production] | 
            
  | 19:35 | 
  <ladsgroup@cumin1001> | 
  dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P29380 and previous config saved to /var/cache/conftool/dbconfig/20220603-193547-ladsgroup.json | 
  [production] | 
            
  | 19:29 | 
  <mutante> | 
  gitlab2002 - stop rsync service, apt-get remove --purge rsync, delete /etc/rsync.d/ and /etc/rsyncd.conf - after gerrit:802847 T274463 | 
  [production] | 
            
  | 19:20 | 
  <ladsgroup@cumin1001> | 
  dbctl commit (dc=all): 'Repooling after maintenance db1118 (T298560)', diff saved to https://phabricator.wikimedia.org/P29379 and previous config saved to /var/cache/conftool/dbconfig/20220603-192042-ladsgroup.json | 
  [production] | 
            
  | 18:51 | 
  <jhathaway@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on mx1001.wikimedia.org with reason: BDAT | 
  [production] | 
            
  | 18:51 | 
  <jhathaway@cumin1001> | 
  START - Cookbook sre.hosts.downtime for 5:00:00 on mx1001.wikimedia.org with reason: BDAT | 
  [production] | 
            
  | 18:47 | 
  <mutante> | 
  testreduce - re-enabling Icinga notifications that were disabled for unknown reasons | 
  [production] | 
            
  | 18:45 | 
  <mutante> | 
  testreduce1001 - systemctl reset-failed after gerrit:800245 removed failed auto_restart services for non-existing apache and php services | 
  [production] | 
            
  | 18:34 | 
  <mutante> | 
  deleting expired digicert TLS certs https://gerrit.wikimedia.org/r/c/operations/puppet/+/791678 | 
  [production] | 
            
  | 18:09 | 
  <aokoth@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2019.codfw.wmnet | 
  [production] | 
            
  | 18:01 | 
  <aokoth@cumin1001> | 
  START - Cookbook sre.hosts.reboot-single for host mc2019.codfw.wmnet | 
  [production] | 
            
  | 17:18 | 
  <mwdebug-deploy@deploy1002> | 
  helmfile [codfw] DONE helmfile.d/services/mwdebug: apply | 
  [production] | 
            
  | 17:14 | 
  <mwdebug-deploy@deploy1002> | 
  helmfile [codfw] START helmfile.d/services/mwdebug: apply | 
  [production] | 
            
  | 17:14 | 
  <mwdebug-deploy@deploy1002> | 
  helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply | 
  [production] | 
            
  | 17:14 | 
  <mwdebug-deploy@deploy1002> | 
  helmfile [eqiad] START helmfile.d/services/mwdebug: apply | 
  [production] | 
            
  | 16:20 | 
  <dancy@deploy1002> | 
  sync-wikiversions aborted: testing mediawiki container image build and deploy (duration: 07m 07s) | 
  [production] | 
            
  | 16:20 | 
  <dancy@deploy1002> | 
  helmfile [codfw] DONE helmfile.d/services/mwdebug: apply | 
  [production] | 
            
  | 16:20 | 
  <dancy@deploy1002> | 
  helmfile [codfw] START helmfile.d/services/mwdebug: apply | 
  [production] | 
            
  | 16:20 | 
  <dancy@deploy1002> | 
  helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply | 
  [production] | 
            
  | 16:19 | 
  <dancy@deploy1002> | 
  helmfile [eqiad] START helmfile.d/services/mwdebug: apply | 
  [production] | 
            
  | 16:17 | 
  <dancy@deploy1002> | 
  helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply | 
  [production] | 
            
  | 16:15 | 
  <jhathaway@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mx1001.wikimedia.org with reason: BDAT | 
  [production] | 
            
  | 16:15 | 
  <jhathaway@cumin1001> | 
  START - Cookbook sre.hosts.downtime for 3:00:00 on mx1001.wikimedia.org with reason: BDAT | 
  [production] | 
            
  | 16:13 | 
  <dancy@deploy1002> | 
  helmfile [eqiad] START helmfile.d/services/mwdebug: apply | 
  [production] | 
            
  | 16:12 | 
  <dancy@deploy1002> | 
  sync-wikiversions aborted: testing mediawiki container image build and deploy (duration: 00m 11s) | 
  [production] | 
            
  | 16:11 | 
  <bking@cumin1001> | 
  START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: restart to enable S3 plugin - bking@cumin1001 - T309720 | 
  [production] | 
            
  | 16:06 | 
  <herron@cumin1001> | 
  END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons. | 
  [production] | 
            
  | 14:58 | 
  <jhathaway@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: BDAT | 
  [production] | 
            
  | 14:58 | 
  <jhathaway@cumin1001> | 
  START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: BDAT | 
  [production] | 
            
  | 14:25 | 
  <herron@cumin1001> | 
  START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons. | 
  [production] | 
            
  | 14:14 | 
  <inflatador> | 
  patching and restarting a few eqiad elastic hosts T309868 | 
  [production] | 
            
  | 12:07 | 
  <ladsgroup@cumin1001> | 
  dbctl commit (dc=all): 'Depooling db1141 (T298560)', diff saved to https://phabricator.wikimedia.org/P29370 and previous config saved to /var/cache/conftool/dbconfig/20220603-120758-ladsgroup.json | 
  [production] | 
            
  | 12:07 | 
  <ladsgroup@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance | 
  [production] | 
            
  | 12:07 | 
  <ladsgroup@cumin1001> | 
  START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance | 
  [production] | 
            
  | 12:07 | 
  <ladsgroup@cumin1001> | 
  dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298560)', diff saved to https://phabricator.wikimedia.org/P29369 and previous config saved to /var/cache/conftool/dbconfig/20220603-120750-ladsgroup.json | 
  [production] | 
            
  | 11:52 | 
  <ladsgroup@cumin1001> | 
  dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P29368 and previous config saved to /var/cache/conftool/dbconfig/20220603-115244-ladsgroup.json | 
  [production] | 
            
  | 11:37 | 
  <ladsgroup@cumin1001> | 
  dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P29367 and previous config saved to /var/cache/conftool/dbconfig/20220603-113739-ladsgroup.json | 
  [production] | 
            
  | 11:22 | 
  <ladsgroup@cumin1001> | 
  dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298560)', diff saved to https://phabricator.wikimedia.org/P29366 and previous config saved to /var/cache/conftool/dbconfig/20220603-112234-ladsgroup.json | 
  [production] | 
            
  | 09:28 | 
  <jmm@cumin2002> | 
  END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp-test1001.wikimedia.org | 
  [production] | 
            
  | 09:28 | 
  <jmm@cumin2002> | 
  END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | 
  [production] | 
            
  | 09:24 | 
  <jmm@cumin2002> | 
  START - Cookbook sre.dns.netbox | 
  [production] | 
            
  | 09:21 | 
  <jmm@cumin2002> | 
  START - Cookbook sre.hosts.decommission for hosts idp-test1001.wikimedia.org | 
  [production] | 
            
  | 09:20 | 
  <jmm@cumin2002> | 
  END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp-test2001.wikimedia.org | 
  [production] | 
            
  | 09:20 | 
  <jmm@cumin2002> | 
  END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | 
  [production] | 
            
  | 09:15 | 
  <jmm@cumin2002> | 
  START - Cookbook sre.dns.netbox | 
  [production] | 
            
  | 09:11 | 
  <jmm@cumin2002> | 
  START - Cookbook sre.hosts.decommission for hosts idp-test2001.wikimedia.org | 
  [production] | 
            
  | 09:00 | 
  <cmooney@cumin1001> | 
  END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | 
  [production] | 
            
  | 08:58 | 
  <jnuche@deploy1002> | 
  install-world aborted:  (duration: 00m 03s) | 
  [production] | 
            
  | 08:56 | 
  <cmooney@cumin1001> | 
  START - Cookbook sre.dns.netbox | 
  [production] | 
            
  | 08:56 | 
  <cmooney@cumin1001> | 
  END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) | 
  [production] |