| 2024-05-28 |
  | 15:13 | <akosiaris@deploy1002> | helmfile [eqiad] DONE helmfile.d/services/mw-api-int: sync | [production] | 
            
  | 15:12 | <hnowlan@cumin1002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2002.codfw.wmnet with reason: host reimage | [production] | 
            
  | 15:12 | <akosiaris@deploy1002> | helmfile [eqiad] START helmfile.d/services/mw-api-int: sync | [production] | 
            
  | 15:09 | <hnowlan@cumin1002> | START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2002.codfw.wmnet with reason: host reimage | [production] | 
            
  | 15:07 | <arnaudb@cumin1002> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1207.eqiad.wmnet with OS bookworm | [production] | 
            
  | 15:06 | <akosiaris@deploy1002> | helmfile [eqiad] DONE helmfile.d/services/mw-api-int: sync | [production] | 
            
  | 15:05 | <jmm@cumin2002> | START - Cookbook sre.puppet.migrate-host for host db1186.eqiad.wmnet | [production] | 
            
  | 15:05 | <akosiaris@deploy1002> | helmfile [eqiad] START helmfile.d/services/mw-api-int: sync | [production] | 
            
  | 14:56 | <akosiaris> | migrate kubemaster1002 to ganeti1037 | [production] | 
            
  | 14:54 | <jmm@cumin2002> | END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db1184.eqiad.wmnet | [production] | 
            
  | 14:50 | <hnowlan@cumin1002> | END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f5406426910> | [production] | 
            
  | 14:50 | <hnowlan@cumin1002> | END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2002 | [production] | 
            
  | 14:49 | <hnowlan@cumin1002> | START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2002 | [production] | 
            
  | 14:49 | <hnowlan@cumin1002> | END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2002.codfw.wmnet 223.16.192.10.in-addr.arpa 3.2.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors | [production] | 
            
  | 14:49 | <hnowlan@cumin1002> | START - Cookbook sre.dns.wipe-cache wikikube-worker2002.codfw.wmnet 223.16.192.10.in-addr.arpa 3.2.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors | [production] | 
            
  | 14:49 | <hnowlan@cumin1002> | END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | [production] | 
            
  | 14:49 | <hnowlan@cumin1002> | END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2002 - hnowlan@cumin1002" | [production] | 
            
  | 14:48 | <hnowlan@cumin1002> | START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2002 - hnowlan@cumin1002" | [production] | 
            
  | 14:46 | <arnaudb@cumin1002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1207.eqiad.wmnet with reason: host reimage | [production] | 
            
  | 14:44 | <hnowlan@cumin1002> | START - Cookbook sre.dns.netbox | [production] | 
            
  | 14:44 | <akosiaris> | gnt-instance replace-disks for kubemaster1002, set ganeti1037 as a secondary | [production] | 
            
  | 14:43 | <arnaudb@cumin1002> | START - Cookbook sre.hosts.downtime for 2:00:00 on db1207.eqiad.wmnet with reason: host reimage | [production] | 
            
  | 14:37 | <akosiaris> | reboot kubemaster1001 with 8 vcpus for consistency with kubemaster1002. | [production] | 
            
  | 14:37 | <akosiaris> | repool kubemaster1001 with 8 vcpus for consistency with kubemaster1002. | [production] | 
            
  | 14:31 | <akosiaris@deploy1002> | helmfile [eqiad] DONE helmfile.d/services/mw-api-int: sync | [production] | 
            
  | 14:30 | <akosiaris> | repool kubemaster1001, testing something | [production] | 
            
  | 14:29 | <akosiaris@cumin1002> | conftool action : set/pooled=yes; selector: service=kubemaster,dc=eqiad,cluster=kubernetes,name=kubemaster1001.eqiad.wmnet | [production] | 
            
  | 14:29 | <akosiaris> | depool kubemaster1001, its CPU is saturated after a test roll restart | [production] | 
            
  | 14:29 | <arnaudb@cumin1002> | START - Cookbook sre.hosts.reimage for host db1207.eqiad.wmnet with OS bookworm | [production] | 
            
  | 14:28 | <arnaudb@cumin1002> | END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host db1207.eqiad.wmnet with OS bookworm | [production] | 
            
  | 14:28 | <akosiaris@cumin1002> | conftool action : set/pooled=no; selector: service=kubemaster,dc=eqiad,cluster=kubernetes,name=kubemaster1001.eqiad.wmnet | [production] | 
            
  | 14:27 | <akosiaris@deploy1002> | helmfile [eqiad] START helmfile.d/services/mw-api-int: sync | [production] | 
            
  | 14:26 | <jmm@cumin2002> | START - Cookbook sre.puppet.migrate-host for host db1184.eqiad.wmnet | [production] | 
            
  | 14:25 | <effie> | enabling puppet on wikikube-ctrl100[1-2]* | [production] | 
            
  | 14:24 | <ejegg> | fundraising civicrm upgraded from e2dc8f4e to 7e998894 | [production] | 
            
  | 14:24 | <arnaudb@cumin1002> | dbctl commit (dc=all): 'db1218 (re)pooling @ 100%: post reimage repool', diff saved to https://phabricator.wikimedia.org/P63444 and previous config saved to /var/cache/conftool/dbconfig/20240528-142431-arnaudb.json | [production] | 
            
  | 14:21 | <akosiaris@cumin1002> | conftool action : set/weight=10; selector: service=kubemaster,dc=eqiad,cluster=kubernetes,name=kubemaster1002.eqiad.wmnet | [production] | 
            
  | 14:19 | <akosiaris> | add another 4 vcpus to kubemaster1002 | [production] | 
            
  | 14:17 | <andrew@cloudcumin1001> | END (FAIL) - Cookbook wmcs.openstack.restart_openstack (exit_code=99) | [admin] | 
            
  | 14:16 | <andrew@cloudcumin1001> | START - Cookbook wmcs.openstack.restart_openstack | [admin] | 
            
  | 14:11 | <akosiaris> | restart kube-apiserver on kubemaster1002 | [production] | 
            
  | 14:09 | <akosiaris@deploy1002> | helmfile [eqiad] DONE helmfile.d/services/mw-api-int: sync | [production] | 
            
  | 14:09 | <arnaudb@cumin1002> | dbctl commit (dc=all): 'db1218 (re)pooling @ 75%: post reimage repool', diff saved to https://phabricator.wikimedia.org/P63442 and previous config saved to /var/cache/conftool/dbconfig/20240528-140925-arnaudb.json | [production] | 
            
  | 14:08 | <akosiaris@cumin1002> | conftool action : set/weight=1; selector: service=kubemaster,dc=eqiad,cluster=kubernetes,name=kubemaster1002.eqiad.wmnet | [production] | 
            
  | 14:07 | <andrew@cloudcumin1001> | END (FAIL) - Cookbook wmcs.openstack.restart_openstack (exit_code=99) | [admin] | 
            
  | 14:07 | <akosiaris@cumin1002> | conftool action : set/weight=5; selector: service=kubemaster,dc=eqiad,cluster=kubernetes,name=kubemaster1002.eqiad.wmnet | [production] | 
            
  | 14:06 | <andrew@cloudcumin1001> | START - Cookbook wmcs.openstack.restart_openstack | [admin] | 
            
  | 14:04 | <akosiaris> | roll restart mw-api-int pods | [production] | 
            
  | 14:03 | <akosiaris@deploy1002> | helmfile [eqiad] START helmfile.d/services/mw-api-int: sync | [production] | 
            
  | 14:03 | <akosiaris@deploy1002> | helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply | [production] |