| 
      
        2024-09-03
      
     | 
  
    
  | 14:25 | 
  <hnowlan@cumin1002> | 
  END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) depool for host kubernetes2055.codfw.wmnet | 
  [production] | 
            
  | 14:25 | 
  <hnowlan@cumin1002> | 
  START - Cookbook sre.k8s.pool-depool-node depool for host mw2423.codfw.wmnet | 
  [production] | 
            
  | 14:25 | 
  <hnowlan@cumin1002> | 
  START - Cookbook sre.k8s.pool-depool-node depool for host mw2422.codfw.wmnet | 
  [production] | 
            
  | 14:24 | 
  <hnowlan@cumin1002> | 
  START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2055.codfw.wmnet | 
  [production] | 
            
  | 14:24 | 
  <hnowlan@cumin1002> | 
  END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) depool for host kubernetes2028.codfw.wmnet | 
  [production] | 
            
  | 14:10 | 
  <denisse> | 
  Resolve DNS queries to alert2002 - T372418 | 
  [production] | 
            
  | 14:06 | 
  <denisse> | 
  Failing over to alert2002 - T372418 | 
  [production] | 
            
  | 14:03 | 
  <denisse> | 
  Stopping services in the alert1001 host - T372418 | 
  [production] | 
            
  | 14:02 | 
  <ladsgroup@cumin1002> | 
  dbctl commit (dc=all): 'Repooling after maintenance db2207 (T371742)', diff saved to https://phabricator.wikimedia.org/P68609 and previous config saved to /var/cache/conftool/dbconfig/20240903-140226-ladsgroup.json | 
  [production] | 
            
  | 14:00 | 
  <denisse> | 
  Disabling meta-monitoring for the alert hosts - T372418 | 
  [production] | 
            
  | 14:00 | 
  <denisse> | 
  Disabling meta-monitoring for the alert hosts | 
  [production] | 
            
  | 14:00 | 
  <jgleeson> | 
  smashpig updated from e7c7d116 to e625eef2 | 
  [production] | 
            
  | 13:59 | 
  <ejegg> | 
  payments-wiki upgraded from 54988ad9 to e47e61cb | 
  [production] | 
            
  | 13:58 | 
  <elukey@cumin2002> | 
  END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART | 
  [production] | 
            
  | 13:55 | 
  <elukey@cumin2002> | 
  START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART | 
  [production] | 
            
  | 13:54 | 
  <elukey@cumin2002> | 
  END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART | 
  [production] | 
            
  | 13:51 | 
  <elukey@cumin2002> | 
  START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART | 
  [production] | 
            
  | 13:49 | 
  <jayme@deploy1003> | 
  helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply | 
  [production] | 
            
  | 13:49 | 
  <jayme@deploy1003> | 
  helmfile [eqiad] START helmfile.d/services/eventgate-main: apply | 
  [production] | 
            
  | 13:48 | 
  <elukey@cumin2002> | 
  END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART | 
  [production] | 
            
  | 13:47 | 
  <ladsgroup@cumin1002> | 
  dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P68606 and previous config saved to /var/cache/conftool/dbconfig/20240903-134719-ladsgroup.json | 
  [production] | 
            
  | 13:45 | 
  <elukey@cumin2002> | 
  START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART | 
  [production] | 
            
  | 13:41 | 
  <jayme@deploy1003> | 
  helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply | 
  [production] | 
            
  | 13:41 | 
  <jayme@deploy1003> | 
  helmfile [codfw] START helmfile.d/services/eventgate-main: apply | 
  [production] | 
            
  | 13:37 | 
  <elukey@cumin2002> | 
  END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART | 
  [production] | 
            
  | 13:34 | 
  <elukey@cumin2002> | 
  START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART | 
  [production] | 
            
  | 13:32 | 
  <ladsgroup@cumin1002> | 
  dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P68605 and previous config saved to /var/cache/conftool/dbconfig/20240903-133211-ladsgroup.json | 
  [production] | 
            
  | 13:30 | 
  <jayme@deploy1003> | 
  helmfile [staging] DONE helmfile.d/services/eventgate-main: apply | 
  [production] | 
            
  | 13:29 | 
  <jayme@deploy1003> | 
  helmfile [staging] START helmfile.d/services/eventgate-main: apply | 
  [production] | 
            
  | 13:29 | 
  <elukey@cumin2002> | 
  END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART | 
  [production] | 
            
  | 13:25 | 
  <elukey@cumin2002> | 
  START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART | 
  [production] | 
            
  | 13:20 | 
  <elukey@cumin2002> | 
  END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART | 
  [production] | 
            
  | 13:17 | 
  <ladsgroup@cumin1002> | 
  dbctl commit (dc=all): 'Repooling after maintenance db2207 (T371742)', diff saved to https://phabricator.wikimedia.org/P68604 and previous config saved to /var/cache/conftool/dbconfig/20240903-131704-ladsgroup.json | 
  [production] | 
            
  | 13:16 | 
  <elukey@cumin2002> | 
  START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART | 
  [production] | 
            
  | 13:10 | 
  <elukey@cumin2002> | 
  END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART | 
  [production] | 
            
  | 13:07 | 
  <elukey@cumin2002> | 
  START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART | 
  [production] | 
            
  | 12:52 | 
  <stevemunene@deploy1003> | 
  helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply | 
  [production] | 
            
  | 12:51 | 
  <stevemunene@deploy1003> | 
  helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply | 
  [production] | 
            
  | 12:43 | 
  <stevemunene@deploy1003> | 
  helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply | 
  [production] | 
            
  | 12:43 | 
  <stevemunene@deploy1003> | 
  helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply | 
  [production] | 
            
  | 12:26 | 
  <ladsgroup@cumin1002> | 
  dbctl commit (dc=all): 'Depooling db2207 (T371742)', diff saved to https://phabricator.wikimedia.org/P68602 and previous config saved to /var/cache/conftool/dbconfig/20240903-122647-ladsgroup.json | 
  [production] | 
            
  | 12:26 | 
  <ladsgroup@cumin1002> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance | 
  [production] | 
            
  | 12:26 | 
  <ladsgroup@cumin1002> | 
  START - Cookbook sre.hosts.downtime for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance | 
  [production] | 
            
  | 12:24 | 
  <stevemunene@deploy1003> | 
  helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply | 
  [production] | 
            
  | 12:24 | 
  <stevemunene@deploy1003> | 
  helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply | 
  [production] | 
            
  | 12:20 | 
  <stevemunene@deploy1003> | 
  helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply | 
  [production] | 
            
  | 11:42 | 
  <ladsgroup@cumin1002> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance | 
  [production] | 
            
  | 11:42 | 
  <ladsgroup@cumin1002> | 
  START - Cookbook sre.hosts.downtime for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance | 
  [production] | 
            
  | 11:42 | 
  <ladsgroup@cumin1002> | 
  dbctl commit (dc=all): 'Repooling after maintenance db2189 (T371742)', diff saved to https://phabricator.wikimedia.org/P68601 and previous config saved to /var/cache/conftool/dbconfig/20240903-114232-ladsgroup.json | 
  [production] | 
            
  | 11:31 | 
  <vgutierrez@cumin1002> | 
  END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-magru for 9.2.5-1wm2 | 
  [production] |