  | 2022-07-13 |
  | 06:47 | <marostegui@cumin1001> | dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31030 and previous config saved to /var/cache/conftool/dbconfig/20220713-064725-root.json | [production] | 
            
  | 06:45 | <aqu> | analytics/refinery deploy aborted, no more space to deploy in /srv on an-launcher1002 eqiad | [production] | 
            
  | 06:44 | <aqu@deploy1002> | Finished deploy [analytics/refinery@bd39e67]: Regular analytics weekly train [analytics/refinery@bd39e67] (duration: 27m 02s) | [production] | 
            
  | 06:32 | <marostegui@cumin1001> | dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31029 and previous config saved to /var/cache/conftool/dbconfig/20220713-063221-root.json | [production] | 
            
  | 06:17 | <marostegui@cumin1001> | dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31028 and previous config saved to /var/cache/conftool/dbconfig/20220713-061717-root.json | [production] | 
            
  | 06:16 | <aqu@deploy1002> | Started deploy [analytics/refinery@bd39e67]: Regular analytics weekly train [analytics/refinery@bd39e67] | [production] | 
            
  | 06:16 | <aqu> | analytics/refinery deployment | [production] | 
            
  | 06:02 | <marostegui@cumin1001> | dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31027 and previous config saved to /var/cache/conftool/dbconfig/20220713-060213-root.json | [production] | 
            
  | 05:47 | <marostegui@cumin1001> | dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31026 and previous config saved to /var/cache/conftool/dbconfig/20220713-054709-root.json | [production] | 
            
  | 05:32 | <marostegui@cumin1001> | dbctl commit (dc=all): 'db1137 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31025 and previous config saved to /var/cache/conftool/dbconfig/20220713-053205-root.json | [production] | 
            
  | 05:17 | <marostegui@cumin1001> | dbctl commit (dc=all): 'db1137 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31024 and previous config saved to /var/cache/conftool/dbconfig/20220713-051701-root.json | [production] | 
            
  | 05:12 | <marostegui@cumin1001> | dbctl commit (dc=all): 'Pool db2162 in s8 T311493', diff saved to https://phabricator.wikimedia.org/P31023 and previous config saved to /var/cache/conftool/dbconfig/20220713-051239-marostegui.json | [production] | 
            
  
  | 2022-07-12 |
  | 22:32 | <bking@cumin1001> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2039.codfw.wmnet with OS bullseye | [production] | 
            
  | 22:19 | <ebernhardson@deploy1002> | Finished deploy [wikimedia/discovery/analytics@45ae36d]: subgraph_and_query_metrics: Drop wiki from sparql event partition spec (duration: 02m 04s) | [production] | 
            
  | 22:17 | <ebernhardson@deploy1002> | Started deploy [wikimedia/discovery/analytics@45ae36d]: subgraph_and_query_metrics: Drop wiki from sparql event partition spec | [production] | 
            
  | 22:15 | <bking@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2039.codfw.wmnet with reason: host reimage | [production] | 
            
  | 22:11 | <bking@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2039.codfw.wmnet with reason: host reimage | [production] | 
            
  | 21:50 | <bking@cumin1001> | START - Cookbook sre.hosts.reimage for host elastic2039.codfw.wmnet with OS bullseye | [production] | 
            
  | 20:28 | <bking@cumin1001> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2038.codfw.wmnet with OS bullseye | [production] | 
            
  | 20:11 | <bking@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2038.codfw.wmnet with reason: host reimage | [production] | 
            
  | 20:07 | <bking@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2038.codfw.wmnet with reason: host reimage | [production] | 
            
  | 19:49 | <bking@cumin1001> | START - Cookbook sre.hosts.reimage for host elastic2038.codfw.wmnet with OS bullseye | [production] | 
            
  | 19:38 | <bking@cumin1001> | END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2038.codfw.wmnet with OS bullseye | [production] | 
            
  | 19:35 | <bking@cumin1001> | START - Cookbook sre.hosts.reimage for host elastic2038.codfw.wmnet with OS bullseye | [production] | 
            
  | 19:34 | <bking@cumin1001> | END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2038.codfw.wmnet with OS bullseye | [production] | 
            
  | 19:31 | <bking@cumin1001> | START - Cookbook sre.hosts.reimage for host elastic2038.codfw.wmnet with OS bullseye | [production] | 
            
  | 19:31 | <bking@cumin1001> | END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2038.codfw.wmnet with OS bullseye | [production] | 
            
  | 19:31 | <bking@cumin1001> | START - Cookbook sre.hosts.reimage for host elastic2038.codfw.wmnet with OS bullseye | [production] | 
            
  | 19:30 | <bking@cumin1001> | END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2038.codfw.wmnet with OS bullseye | [production] | 
            
  | 19:27 | <bking@cumin1001> | START - Cookbook sre.hosts.reimage for host elastic2038.codfw.wmnet with OS bullseye | [production] | 
            
  | 19:26 | <krinkle@deploy1002> | Synchronized wmf-config/InitialiseSettings.php: I3071c009c (2) (duration: 02m 45s) | [production] | 
            
  | 19:21 | <mwdebug-deploy@deploy1002> | helmfile [codfw] DONE helmfile.d/services/mwdebug: apply | [production] | 
            
  | 19:20 | <krinkle@deploy1002> | Synchronized wmf-config/InitialiseSettings.php: I3071c009c (duration: 03m 09s) | [production] | 
            
  | 19:20 | <mwdebug-deploy@deploy1002> | helmfile [codfw] START helmfile.d/services/mwdebug: apply | [production] | 
            
  | 19:20 | <mwdebug-deploy@deploy1002> | helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply | [production] | 
            
  | 19:20 | <bking@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on elastic2038.codfw.wmnet with reason: firmware update T312298 | [production] | 
            
  | 19:19 | <bking@cumin1001> | START - Cookbook sre.hosts.downtime for 4:00:00 on elastic2038.codfw.wmnet with reason: firmware update T312298 | [production] | 
            
  | 19:19 | <mwdebug-deploy@deploy1002> | helmfile [eqiad] START helmfile.d/services/mwdebug: apply | [production] | 
            
  | 19:13 | <bking@cumin1001> | END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for elastic1065.eqiad.wmnet | [production] | 
            
  | 19:13 | <bking@cumin1001> | START - Cookbook sre.hosts.remove-downtime for elastic1065.eqiad.wmnet | [production] | 
            
  | 18:54 | <mwdebug-deploy@deploy1002> | helmfile [codfw] DONE helmfile.d/services/mwdebug: apply | [production] | 
            
  | 18:53 | <mwdebug-deploy@deploy1002> | helmfile [codfw] START helmfile.d/services/mwdebug: apply | [production] | 
            
  | 18:53 | <mwdebug-deploy@deploy1002> | helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply | [production] | 
            
  | 18:52 | <mwdebug-deploy@deploy1002> | helmfile [eqiad] START helmfile.d/services/mwdebug: apply | [production] | 
            
  | 17:26 | <mwdebug-deploy@deploy1002> | helmfile [codfw] DONE helmfile.d/services/mwdebug: apply | [production] | 
            
  | 17:24 | <mwdebug-deploy@deploy1002> | helmfile [codfw] START helmfile.d/services/mwdebug: apply | [production] | 
            
  | 17:24 | <mwdebug-deploy@deploy1002> | helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply | [production] | 
            
  | 17:22 | <mwdebug-deploy@deploy1002> | helmfile [eqiad] START helmfile.d/services/mwdebug: apply | [production] | 
            
  | 17:18 | <bking@cumin1001> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2037.codfw.wmnet with OS bullseye | [production] | 
            
  | 16:59 | <bking@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2037.codfw.wmnet with reason: host reimage | [production] |