| 
      
        2023-02-21
      
     | 
  
    
  | 11:11 | 
  <jayme@cumin1001> | 
  START - Cookbook sre.hosts.downtime for 2:00:00 on kubetcd2006.codfw.wmnet with reason: host reimage | 
  [production] | 
            
  | 11:01 | 
  <jiji@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1003.eqiad.wmnet with reason: host reimage | 
  [production] | 
            
  | 11:00 | 
  <jayme@cumin1001> | 
  START - Cookbook sre.ganeti.reimage for host kubetcd2006.codfw.wmnet with OS bullseye | 
  [production] | 
            
  | 10:59 | 
  <jayme@cumin1001> | 
  START - Cookbook sre.ganeti.reimage for host kubetcd2005.codfw.wmnet with OS bullseye | 
  [production] | 
            
  | 10:59 | 
  <root@cumin1001> | 
  START - Cookbook sre.ganeti.reimage for host kubetcd2004.codfw.wmnet with OS bullseye | 
  [production] | 
            
  | 10:58 | 
  <ladsgroup@cumin1001> | 
  dbctl commit (dc=all): 'db2161 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P44699 and previous config saved to /var/cache/conftool/dbconfig/20230221-105823-root.json | 
  [production] | 
            
  | 10:57 | 
  <ladsgroup@cumin1001> | 
  dbctl commit (dc=all): 'Depool db2161 T330134', diff saved to https://phabricator.wikimedia.org/P44698 and previous config saved to /var/cache/conftool/dbconfig/20230221-105714-ladsgroup.json | 
  [production] | 
            
  | 10:56 | 
  <jiji@cumin1001> | 
  START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1003.eqiad.wmnet with reason: host reimage | 
  [production] | 
            
  | 10:55 | 
  <jayme@cumin1001> | 
  START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: T329664 | 
  [production] | 
            
  | 10:55 | 
  <ladsgroup@cumin1001> | 
  dbctl commit (dc=all): 'Promote db2165 to s8 primary T330134', diff saved to https://phabricator.wikimedia.org/P44697 and previous config saved to /var/cache/conftool/dbconfig/20230221-105503-ladsgroup.json | 
  [production] | 
            
  | 10:54 | 
  <Amir1> | 
  Starting s8 codfw failover from db2161 to db2165 - T330134 | 
  [production] | 
            
  | 10:50 | 
  <nfraison@cumin1001> | 
  START - Cookbook sre.hosts.reimage for host an-presto1001.eqiad.wmnet with OS bullseye | 
  [production] | 
            
  | 10:49 | 
  <isaranto@deploy1002> | 
  helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' . | 
  [production] | 
            
  | 10:46 | 
  <jayme@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 23 hosts with reason: Reinitialize wikikube codfw with k8s 1.23 | 
  [production] | 
            
  | 10:46 | 
  <jayme@cumin1001> | 
  START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 23 hosts with reason: Reinitialize wikikube codfw with k8s 1.23 | 
  [production] | 
            
  | 10:43 | 
  <jiji@cumin1001> | 
  START - Cookbook sre.hosts.reimage for host mc-gp1003.eqiad.wmnet with OS bullseye | 
  [production] | 
            
  | 10:39 | 
  <elukey@deploy1002> | 
  helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . | 
  [production] | 
            
  | 10:37 | 
  <elukey@deploy1002> | 
  helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. | 
  [production] | 
            
  | 10:37 | 
  <elukey@deploy1002> | 
  helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. | 
  [production] | 
            
  | 10:33 | 
  <elukey@deploy1002> | 
  helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. | 
  [production] | 
            
  | 10:30 | 
  <ladsgroup@cumin1001> | 
  dbctl commit (dc=all): 'Set db2165 with weight 0 T330134', diff saved to https://phabricator.wikimedia.org/P44696 and previous config saved to /var/cache/conftool/dbconfig/20230221-103053-ladsgroup.json | 
  [production] | 
            
  | 10:30 | 
  <ladsgroup@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s8 T330134 | 
  [production] | 
            
  | 10:29 | 
  <ladsgroup@cumin1001> | 
  START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s8 T330134 | 
  [production] | 
            
  | 10:29 | 
  <elukey@deploy1002> | 
  helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. | 
  [production] | 
            
  | 09:59 | 
  <elukey@deploy1002> | 
  helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. | 
  [production] | 
            
  | 09:53 | 
  <jayme@cumin1001> | 
  END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99) depool 2 services in codfw: T329664 | 
  [production] | 
            
  | 09:48 | 
  <jayme@cumin1001> | 
  START - Cookbook sre.discovery.service-route depool 2 services in codfw: T329664 | 
  [production] | 
            
  | 09:48 | 
  <elukey@deploy1002> | 
  helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. | 
  [production] | 
            
  | 09:46 | 
  <elukey@deploy1002> | 
  helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. | 
  [production] | 
            
  | 09:36 | 
  <elukey@deploy1002> | 
  helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. | 
  [production] | 
            
  | 09:31 | 
  <jayme@cumin1001> | 
  END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all active/active services in codfw: maintenance | 
  [production] | 
            
  | 09:24 | 
  <filippo@cumin1001> | 
  conftool action : set/pooled=no; selector: name=prometheus2005.codfw.wmnet | 
  [production] | 
            
  | 09:24 | 
  <filippo@cumin1001> | 
  conftool action : set/pooled=no; selector: name=thanos-fe2002.codfw.wmnet | 
  [production] | 
            
  | 09:14 | 
  <vgutierrez> | 
  testing HAProxy 2.6.9 in cp4052 and cp4044 | 
  [production] | 
            
  | 09:13 | 
  <jayme@cumin1001> | 
  START - Cookbook sre.discovery.datacenter depool all active/active services in codfw: maintenance | 
  [production] | 
            
  | 09:12 | 
  <hashar@deploy1002> | 
  Pruned MediaWiki: 1.40.0-wmf.22 (duration: 02m 16s) | 
  [production] | 
            
  | 09:12 | 
  <vgutierrez> | 
  update thirdparty/haproxy26 to version 2.6.9 for bullseye and buster (apt.wm.o) | 
  [production] | 
            
  | 09:10 | 
  <hashar@deploy1002> | 
  Finished scap: testwikis wikis to 1.40.0-wmf.24  refs T325587 (duration: 45m 58s) | 
  [production] | 
            
  | 08:49 | 
  <moritzm> | 
  installing clamav security updates | 
  [production] | 
            
  | 08:24 | 
  <hashar@deploy1002> | 
  Started scap: testwikis wikis to 1.40.0-wmf.24  refs T325587 | 
  [production] | 
            
  | 08:21 | 
  <kartik@deploy1002> | 
  Finished scap: Backport for [[gerrit:890482|Section Translation: Fix language code for Cantonese Wikipedia (T304865)]] (duration: 16m 36s) | 
  [production] | 
            
  | 08:09 | 
  <kartik@deploy1002> | 
  kartik: Backport for [[gerrit:890482|Section Translation: Fix language code for Cantonese Wikipedia (T304865)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet | 
  [production] | 
            
  | 08:04 | 
  <kartik@deploy1002> | 
  Started scap: Backport for [[gerrit:890482|Section Translation: Fix language code for Cantonese Wikipedia (T304865)]] | 
  [production] | 
            
  | 07:49 | 
  <XioNoX> | 
  Staging the new Junos version on the codfw row B switches - T327991 | 
  [production] | 
            
  | 01:06 | 
  <urbanecm@deploy1002> | 
  Finished scap: Backport for [[gerrit:890173|cswikibooks: Enable visualeditor for all users (T330015)]] (duration: 08m 47s) | 
  [production] | 
            
  | 00:59 | 
  <urbanecm@deploy1002> | 
  urbanecm: Backport for [[gerrit:890173|cswikibooks: Enable visualeditor for all users (T330015)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet | 
  [production] | 
            
  | 00:57 | 
  <urbanecm@deploy1002> | 
  Started scap: Backport for [[gerrit:890173|cswikibooks: Enable visualeditor for all users (T330015)]] | 
  [production] |