| 2024-07-03 |
  | 14:48 | <marostegui@cumin1002> | dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P65730 and previous config saved to /var/cache/conftool/dbconfig/20240703-144841-marostegui.json | [production] | 
            
  | 14:46 | <jiji@deploy1002> | helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply | [production] | 
            
  | 14:45 | <jiji@deploy1002> | helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply | [production] | 
            
  | 14:41 | <arnaudb@cumin1002> | dbctl commit (dc=all): 'db1191 (re)pooling @ 10%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65729 and previous config saved to /var/cache/conftool/dbconfig/20240703-144119-arnaudb.json | [production] | 
            
  | 14:40 | <arnaudb@cumin1002> | dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65728 and previous config saved to /var/cache/conftool/dbconfig/20240703-144059-arnaudb.json | [production] | 
            
  | 14:40 | <arnaudb@cumin1002> | dbctl commit (dc=all): 'db1197 (re)pooling @ 10%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65727 and previous config saved to /var/cache/conftool/dbconfig/20240703-144046-arnaudb.json | [production] | 
            
  | 14:40 | <jclark@cumin1002> | END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-conf1006.eqiad.wmnet with OS bookworm | [production] | 
            
  | 14:40 | <jclark@cumin1002> | END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-conf1005.eqiad.wmnet with OS bookworm | [production] | 
            
  | 14:40 | <jclark@cumin1002> | END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-conf1004.eqiad.wmnet with OS bookworm | [production] | 
            
  | 14:39 | <jiji@deploy1002> | helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply | [production] | 
            
  | 14:39 | <jiji@deploy1002> | helmfile [eqiad] START helmfile.d/services/mw-debug: apply | [production] | 
            
  | 14:38 | <jiji@deploy1002> | helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply | [production] | 
            
  | 14:38 | <jiji@deploy1002> | helmfile [eqiad] START helmfile.d/services/mw-debug: apply | [production] | 
            
  | 14:35 | <sukhe> | [correction of previous A:dnsbox run] sudo cumin -b1 -s60 "A:dnsbox" "run-puppet-agent" | [production] | 
            
  | 14:33 | <marostegui@cumin1002> | dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P65726 and previous config saved to /var/cache/conftool/dbconfig/20240703-143334-marostegui.json | [production] | 
            
  | 14:33 | <sukhe> | sudo cumin "A:dnsbox" "run-puppet-agent" | [production] | 
            
  | 14:32 | <sukhe> | sudo cumin "A:wikidough" "run-puppet-agent" | [production] | 
            
  | 14:32 | <jayme@cumin1002> | END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet | [production] | 
            
  | 14:32 | <jayme@cumin1002> | START - Cookbook sre.hosts.remove-downtime for kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet | [production] | 
            
  | 14:30 | <jayme@cumin1002> | conftool action : set/pooled=yes; selector: name=(wikikube-worker1007.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet|kubernetes1060.eqiad.wmnet) | [production] | 
            
  | 14:27 | <jiji@deploy1002> | helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply | [production] | 
            
  | 14:27 | <jiji@deploy1002> | helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply | [production] | 
            
  | 14:26 | <arnaudb@cumin1002> | dbctl commit (dc=all): 'db1191 (re)pooling @ 5%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65725 and previous config saved to /var/cache/conftool/dbconfig/20240703-142614-arnaudb.json | [production] | 
            
  | 14:25 | <arnaudb@cumin1002> | dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65724 and previous config saved to /var/cache/conftool/dbconfig/20240703-142553-arnaudb.json | [production] | 
            
  | 14:25 | <arnaudb@cumin1002> | dbctl commit (dc=all): 'db1197 (re)pooling @ 5%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65723 and previous config saved to /var/cache/conftool/dbconfig/20240703-142541-arnaudb.json | [production] | 
            
  | 14:25 | <klausman@deploy1002> | helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' . | [production] | 
            
  | 14:21 | <jayme@cumin1002> | conftool action : set/pooled=inactive; selector: name=(wikikube-worker1007.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet|kubernetes1060.eqiad.wmnet) | [production] | 
            
  | 14:18 | <bking@deploy1002> | helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply | [production] | 
            
  | 14:18 | <marostegui@cumin1002> | dbctl commit (dc=all): 'Repooling after maintenance db1229 (T367856)', diff saved to https://phabricator.wikimedia.org/P65722 and previous config saved to /var/cache/conftool/dbconfig/20240703-141826-marostegui.json | [production] | 
            
  | 14:17 | <arnaudb@cumin1002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet with reason: T365994 | [production] | 
            
  | 14:17 | <arnaudb@cumin1002> | START - Cookbook sre.hosts.downtime for 0:45:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet with reason: T365994 | [production] | 
            
  | 14:17 | <arnaudb@cumin1002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on db1154.eqiad.wmnet with reason: T365994 | [production] | 
            
  | 14:16 | <arnaudb@cumin1002> | START - Cookbook sre.hosts.downtime for 0:45:00 on db1154.eqiad.wmnet with reason: T365994 | [production] | 
            
  | 14:11 | <klausman@deploy1002> | helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' . | [production] | 
            
  | 14:10 | <jclark@cumin1002> | START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye | [production] | 
            
  | 14:09 | <klausman@deploy1002> | helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' . | [production] | 
            
  | 14:09 | <klausman@deploy1002> | helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' . | [production] | 
            
  | 14:09 | <bking@deploy1002> | helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply | [production] | 
            
  | 14:08 | <klausman@deploy1002> | helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' . | [production] | 
            
  | 14:07 | <jclark@cumin1002> | END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parsoidtest1001.eqiad.wmnet with OS bullseye | [production] | 
            
  | 14:04 | <topranks> | rebooting lsw1-e2-eqiad to install updated JunOS version T365994 | [production] | 
            
  | 14:01 | <cmooney@cumin1002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on 22 hosts with reason: JunOS upgrade lsw1-e2-eqiad | [production] | 
            
  | 14:00 | <cmooney@cumin1002> | START - Cookbook sre.hosts.downtime for 0:40:00 on 22 hosts with reason: JunOS upgrade lsw1-e2-eqiad | [production] | 
            
  | 13:59 | <bking@cumin2002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on elastic[1091-1092].eqiad.wmnet,wdqs[1018,1020].eqiad.wmnet with reason: T348977 | [production] | 
            
  | 13:59 | <bking@cumin2002> | START - Cookbook sre.hosts.downtime for 4:00:00 on elastic[1091-1092].eqiad.wmnet,wdqs[1018,1020].eqiad.wmnet with reason: T348977 | [production] | 
            
  | 13:58 | <cmooney@cumin1002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e2-eqiad,lsw1-e2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e2-eqiad | [production] | 
            
  | 13:58 | <cmooney@cumin1002> | START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e2-eqiad,lsw1-e2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e2-eqiad | [production] | 
            
  | 13:57 | <jayme@cumin1002> | conftool action : set/pooled=no; selector: name=(wikikube-worker1007.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet|kubernetes1060.eqiad.wmnet) | [production] | 
            
  | 13:56 | <bking@cumin2002> | END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1091*,elastic1092* for T348977 - bking@cumin2002 | [production] | 
            
  | 13:56 | <bking@cumin2002> | START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1091*,elastic1092* for T348977 - bking@cumin2002 | [production] |