| 2023-01-17
      
    
  | 15:31 | <ladsgroup@cumin1001> | START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance | [production] | 
            
  | 15:26 | <ladsgroup@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance | [production] | 
            
  | 15:26 | <ladsgroup@cumin1001> | START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance | [production] | 
            
  | 14:56 | <urandom> | truncating hints for Cassandra nodes in codfw row b -- T327001 | [production] | 
            
  | 14:52 | <urandom> | disabling Cassandra hinted-handoff for codfw  -- T327001 | [production] | 
            
  | 14:27 | <jgiannelos@deploy1002> | helmfile [staging] DONE helmfile.d/services/proton: apply | [production] | 
            
  | 14:26 | <jgiannelos@deploy1002> | helmfile [staging] START helmfile.d/services/proton: apply | [production] | 
            
  | 14:12 | <_joe_> | try to restart cassandra-a on aqs2005 | [production] | 
            
  | 13:37 | <jiji@cumin1001> | conftool action : set/pooled=false; selector: dnsdisc=recommendation-api,name=codfw | [production] | 
            
  | 13:35 | <mvernon@cumin1001> | conftool action : set/pooled=false; selector: dnsdisc=thanos-query,name=codfw | [production] | 
            
  | 13:35 | <mvernon@cumin1001> | conftool action : set/pooled=false; selector: dnsdisc=thanos-swift,name=codfw | [production] | 
            
  | 13:27 | <jynus> | restarting manually replication on es2020, may require data check afterwards | [production] | 
            
  | 13:26 | <_joe_> | depooling all services in codfw | [production] | 
            
  | 13:19 | <oblivian@cumin1001> | END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool mobileapps in codfw: maintenance | [production] | 
            
  | 13:15 | <mvernon@cumin1001> | conftool action : set/pooled=false; selector: dnsdisc=swift,name=codfw | [production] | 
            
  | 13:14 | <oblivian@cumin1001> | START - Cookbook sre.discovery.service-route depool mobileapps in codfw: maintenance | [production] | 
            
  | 13:13 | <oblivian@cumin1001> | END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check citoid: maintenance | [production] | 
            
  | 13:13 | <oblivian@cumin1001> | START - Cookbook sre.discovery.service-route check citoid: maintenance | [production] | 
            
  | 13:08 | <jelto@cumin1001> | END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) | [production] | 
            
  | 13:01 | <oblivian@puppetmaster1001> | conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=codfw | [production] | 
            
  | 13:01 | <oblivian@puppetmaster1001> | conftool action : set/pooled=true; selector: dnsdisc=restbase-async,name=.* | [production] | 
            
  | 12:35 | <jelto@cumin1001> | START - Cookbook sre.gitlab.upgrade | [production] | 
            
  | 12:35 | <moritzm> | installing ipython security updates | [production] | 
            
  | 11:32 | <jiji@cumin1001> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1048.eqiad.wmnet with OS bullseye | [production] | 
            
  | 11:18 | <jiji@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1048.eqiad.wmnet with reason: host reimage | [production] | 
            
  | 11:16 | <jiji@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on mc1048.eqiad.wmnet with reason: host reimage | [production] | 
            
  | 11:08 | <volans> | upgraded cumin on cumin2002 to 4.2.0-1+deb11u1 | [production] | 
            
  | 11:04 | <jiji@cumin1001> | START - Cookbook sre.hosts.reimage for host mc1048.eqiad.wmnet with OS bullseye | [production] | 
            
  | 10:16 | <godog> | restart opensearch_2@production-elk7-eqiad.service on logstash102[34] | [production] | 
            
  | 10:12 | <jnuche@deploy1002> | scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org for details) | [production] | 
            
  | 10:11 | <jnuche@deploy1002> | Finished scap: testwikis wikis to 1.40.0-wmf.19  refs T325582 (duration: 42m 26s) | [production] | 
            
  | 09:42 | <aqu@deploy1002> | Finished deploy [airflow-dags/analytics_test@9568478]: (no justification provided) (duration: 00m 12s) | [production] | 
            
  | 09:42 | <aqu@deploy1002> | Started deploy [airflow-dags/analytics_test@9568478]: (no justification provided) | [production] | 
            
  | 09:28 | <jnuche@deploy1002> | Started scap: testwikis wikis to 1.40.0-wmf.19  refs T325582 | [production] | 
            
  | 09:26 | <jnuche@deploy1002> | scap failed: PermissionError [Errno 13] Permission denied: '/home/jnuche/scap-image-build-and-push-log' (duration: 00m 50s) | [production] | 
            
  | 09:26 | <jnuche@deploy1002> | Started scap: testwikis wikis to 1.40.0-wmf.19  refs T325582 | [production] | 
            
  | 08:49 | <ladsgroup@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance | [production] | 
            
  | 08:49 | <ladsgroup@cumin1001> | START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance | [production] | 
            
  | 08:47 | <ladsgroup@deploy1002> | Finished scap: Backport for [[gerrit:879652|Start writing to cul_reason_id and cul_reason_plaintext_id on testwiki (T233004)]] (duration: 13m 50s) | [production] | 
            
  | 08:35 | <ladsgroup@deploy1002> | ladsgroup and dreamyjazz: Backport for [[gerrit:879652|Start writing to cul_reason_id and cul_reason_plaintext_id on testwiki (T233004)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet | [production] | 
            
  | 08:33 | <ladsgroup@deploy1002> | Started scap: Backport for [[gerrit:879652|Start writing to cul_reason_id and cul_reason_plaintext_id on testwiki (T233004)]] | [production] | 
            
  | 08:29 | <kartik@deploy1002> | Finished scap: Backport for [[gerrit:879998|testwiki: Use Parsoid in Mediawiki Core for Content Translation (T323667)]] (duration: 20m 56s) | [production] | 
            
  | 08:26 | <zabe> | zabe@mwmaint1002:~$ mwscript extensions/Flow/maintenance/FlowFixInconsistentBoards.php --wiki=zhwiki --namespaceName='USER_TALK' # T327146 | [production] | 
            
  | 08:13 | <kartik@deploy1002> | kartik and kartik: Backport for [[gerrit:879998|testwiki: Use Parsoid in Mediawiki Core for Content Translation (T323667)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet | [production] | 
            
  | 08:08 | <kartik@deploy1002> | Started scap: Backport for [[gerrit:879998|testwiki: Use Parsoid in Mediawiki Core for Content Translation (T323667)]] | [production] | 
            
  | 07:52 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43168 and previous config saved to /var/cache/conftool/dbconfig/20230117-075222-ladsgroup.json | [production] | 
            
  | 07:37 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43167 and previous config saved to /var/cache/conftool/dbconfig/20230117-073717-ladsgroup.json | [production] | 
            
  | 07:22 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43166 and previous config saved to /var/cache/conftool/dbconfig/20230117-072212-ladsgroup.json | [production] | 
            
  | 07:16 | <ladsgroup@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance | [production] | 
            
  | 07:16 | <ladsgroup@cumin1001> | START - Cookbook sre.hosts.downtime for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance | [production] |