| 2024-04-12 |
    
  | 18:40 | <andrew@cumin1002> | START - Cookbook sre.dns.netbox | [production] | 
            
  | 18:35 | <andrew@cumin1002> | START - Cookbook sre.hosts.decommission for hosts cloudbackup2001.codfw.wmnet | [production] | 
            
  | 17:00 | <mutante> | crm2001 - on the initial puppet run adding envoy, build-envoy-config failed to build the config and the service failed due to a dependency issue. Manually ran "sudo /usr/local/sbin/build-envoy-config -c /etc/envoy/" and restarted envoyproxy.service | [production] | 
            
  | 16:19 | <btullis@cumin1002> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host matomo1003.eqiad.wmnet with OS bookworm | [production] | 
            
  | 16:16 | <elukey> | move cassandra instances on cassandra-dev to the new truststore (allowing PKI certs) - T352647 | [production] | 
            
  | 15:59 | <elukey@deploy1002> | helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . | [production] | 
            
  | 15:56 | <sukhe@cumin1002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on cp1115.eqiad.wmnet with reason: testing PXE boot issues | [production] | 
            
  | 15:56 | <sukhe@cumin1002> | START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on cp1115.eqiad.wmnet with reason: testing PXE boot issues | [production] | 
            
  | 15:55 | <elukey@deploy1002> | helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' . | [production] | 
            
  | 15:53 | <isaranto@deploy1002> | helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . | [production] | 
            
  | 15:51 | <bking@cumin2002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on elastic2090.codfw.wmnet with reason: T353878 | [production] | 
            
  | 15:51 | <bking@cumin2002> | START - Cookbook sre.hosts.downtime for 1:00:00 on elastic2090.codfw.wmnet with reason: T353878 | [production] | 
            
  | 15:51 | <elukey@deploy1002> | helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . | [production] | 
            
  | 15:50 | <elukey@deploy1002> | helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . | [production] | 
            
  | 15:50 | <bking@cumin2002> | END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic2090 for reboot to get rid of broken systemd units - bking@cumin2002 - T353878 | [production] | 
            
  | 15:50 | <bking@cumin2002> | START - Cookbook sre.elasticsearch.ban Banning hosts: elastic2090 for reboot to get rid of broken systemd units - bking@cumin2002 - T353878 | [production] | 
            
  | 15:50 | <btullis@cumin1002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on matomo1003.eqiad.wmnet with reason: host reimage | [production] | 
            
  | 15:49 | <elukey@deploy1002> | helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . | [production] | 
            
  | 15:49 | <elukey@deploy1002> | helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . | [production] | 
            
  | 15:48 | <pt1979@cumin2002> | START - Cookbook sre.hosts.dhcp for host cp1115.eqiad.wmnet | [production] | 
            
  | 15:47 | <elukey@deploy1002> | helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . | [production] | 
            
  | 15:46 | <btullis@cumin1002> | START - Cookbook sre.hosts.downtime for 2:00:00 on matomo1003.eqiad.wmnet with reason: host reimage | [production] | 
            
  | 15:46 | <elukey@deploy1002> | helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . | [production] | 
            
  | 15:32 | <btullis@cumin1002> | START - Cookbook sre.hosts.reimage for host matomo1003.eqiad.wmnet with OS bookworm | [production] | 
            
  | 15:31 | <btullis@cumin1002> | END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host matomo1003.eqiad.wmnet with OS bookworm | [production] | 
            
  | 15:23 | <ayounsi@cumin1002> | END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "magru - ayounsi@cumin1002" | [production] | 
            
  | 15:22 | <elukey@deploy1002> | helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' . | [production] | 
            
  | 15:21 | <ayounsi@cumin1002> | START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "magru - ayounsi@cumin1002" | [production] | 
            
  | 15:07 | <elukey@deploy1002> | helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' . | [production] | 
            
  | 15:03 | <elukey@deploy1002> | helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. | [production] | 
            
  | 15:03 | <ayounsi@cumin1002> | END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "magru - ayounsi@cumin1002" | [production] | 
            
  | 15:03 | <elukey@deploy1002> | helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. | [production] | 
            
  | 15:02 | <btullis@cumin1002> | START - Cookbook sre.hosts.reimage for host matomo1003.eqiad.wmnet with OS bookworm | [production] | 
            
  | 15:01 | <ayounsi@cumin1002> | START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "magru - ayounsi@cumin1002" | [production] | 
            
  | 14:59 | <btullis@cumin1002> | END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host matomo1003.eqiad.wmnet with OS bookworm | [production] | 
            
  | 14:22 | <hashar@deploy1002> | Finished scap: Backport for [[gerrit:1018692|Parser::statelessFetchTemplate: don't add interwiki redirects to dependencies (T362221)]] (duration: 16m 29s) | [production] | 
            
  | 14:19 | <elukey@deploy1002> | helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . | [production] | 
            
  | 14:18 | <elukey@deploy1002> | helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. | [production] | 
            
  | 14:18 | <elukey@deploy1002> | helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. | [production] | 
            
  | 14:17 | <btullis@cumin1002> | START - Cookbook sre.hosts.reimage for host matomo1003.eqiad.wmnet with OS bookworm | [production] | 
            
  | 14:09 | <hashar@deploy1002> | hashar and jforrester: Continuing with sync | [production] | 
            
  | 14:08 | <hashar@deploy1002> | hashar and jforrester: Backport for [[gerrit:1018692|Parser::statelessFetchTemplate: don't add interwiki redirects to dependencies (T362221)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) | [production] | 
            
  | 14:08 | <sukhe> | depool cp1115 for PXE boot issue testing: T350179 | [production] | 
            
  | 14:07 | <sukhe@puppetmaster1001> | conftool action : set/pooled=no; selector: name=cp1115.eqiad.wmnet,service=(cdn|ats-be) | [production] | 
            
  | 14:05 | <hashar@deploy1002> | Started scap: Backport for [[gerrit:1018692|Parser::statelessFetchTemplate: don't add interwiki redirects to dependencies (T362221)]] | [production] | 
            
  | 12:53 | <jayme> | updated rsyslog to 8.2404.0-1~bpo11+1 on staging-codfw and staging-eqiad k8s clusters - T357616 | [production] | 
            
  | 12:20 | <marostegui@cumin1002> | dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P60466 and previous config saved to /var/cache/conftool/dbconfig/20240412-122045-marostegui.json | [production] | 
            
  | 12:05 | <marostegui@cumin1002> | dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P60464 and previous config saved to /var/cache/conftool/dbconfig/20240412-120537-marostegui.json | [production] | 
            
  | 12:02 | <btullis@cumin1002> | END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host matomo1003.eqiad.wmnet with OS bookworm | [production] | 
            
  | 11:50 | <marostegui@cumin1002> | dbctl commit (dc=all): 'Repooling after maintenance db1249 (T356166)', diff saved to https://phabricator.wikimedia.org/P60463 and previous config saved to /var/cache/conftool/dbconfig/20240412-115029-marostegui.json | [production] |