| 2025-01-26 |
    
  | 20:30 | <marostegui@cumin1002> | DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1241.eqiad.wmnet with reason: Index rebuild + upgrade | [production] | 
            
  | 17:43 | <andrew@cumin1002> | START - Cookbook sre.hosts.reimage for host cloudcephosd1013.eqiad.wmnet with OS bullseye | [production] | 
            
  | 17:43 | <andrew@cumin1002> | END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cloudcephosd1013.eqiad.wmnet'] | [production] | 
            
  | 17:43 | <andrew@cumin1002> | START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1013.eqiad.wmnet'] | [production] | 
            
  | 17:37 | <andrew@cumin1002> | END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1013.eqiad.wmnet'] | [production] | 
            
  | 17:18 | <andrew@cumin1002> | START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1013.eqiad.wmnet'] | [production] | 
            
  | 17:18 | <andrew@cumin1002> | END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1013.eqiad.wmnet'] | [production] | 
            
  | 17:18 | <andrew@cumin1002> | START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1013.eqiad.wmnet'] | [production] | 
            
  | 17:18 | <andrew@cumin1002> | END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudcephosd1013.eqiad.wmnet with OS bullseye | [production] | 
            
  | 16:44 | <andrew@cumin1002> | START - Cookbook sre.hosts.reimage for host cloudcephosd1013.eqiad.wmnet with OS bullseye | [production] | 
            
  | 16:44 | <andrew@cumin1002> | END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1013.eqiad.wmnet with OS bullseye | [production] | 
            
  | 15:28 | <andrew@cumin1002> | START - Cookbook sre.hosts.reimage for host cloudcephosd1013.eqiad.wmnet with OS bullseye | [production] | 
            
  | 15:23 | <andrew@cumin1002> | END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1013.eqiad.wmnet'] | [production] | 
            
  | 15:23 | <andrew@cumin1002> | START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1013.eqiad.wmnet'] | [production] | 
            
  | 15:23 | <andrew@cumin1002> | END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1013.eqiad.wmnet'] | [production] | 
            
  | 15:23 | <andrew@cumin1002> | START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1013.eqiad.wmnet'] | [production] | 
            
  | 15:22 | <andrew@cumin1002> | END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1013.eqiad.wmnet with OS bullseye | [production] | 
            
  | 15:22 | <andrew@cumin1002> | END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1013.eqiad.wmnet'] | [production] | 
            
  | 15:22 | <andrew@cumin1002> | START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1013.eqiad.wmnet'] | [production] | 
            
  | 15:22 | <andrew@cumin1002> | END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1013.eqiad.wmnet'] | [production] | 
            
  | 15:22 | <andrew@cumin1002> | START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1013.eqiad.wmnet'] | [production] | 
            
  | 15:21 | <andrew@cumin1002> | END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1013.eqiad.wmnet'] | [production] | 
            
  | 15:21 | <andrew@cumin1002> | START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1013.eqiad.wmnet'] | [production] | 
            
  | 08:38 | <taavi@cloudcumin1001> | END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-109.tools.eqiad1.wikimedia.cloud | [tools] | 
            
  | 08:37 | <taavi@cloudcumin1001> | START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-109.tools.eqiad1.wikimedia.cloud | [tools] | 
            
  | 08:37 | <taavi@cloudcumin1001> | END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster | [tools] | 
            
  | 08:37 | <taavi@cloudcumin1001> | Added a new k8s worker-nfs tools-k8s-worker-nfs-79.tools.eqiad1.wikimedia.cloud to the cluster | [tools] | 
            
  | 08:27 | <taavi@cloudcumin1001> | START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster (T384790) | [tools] | 
            
  | 08:26 | <taavi@cloudcumin1001> | END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster | [tools] | 
            
  | 08:26 | <taavi@cloudcumin1001> | Added a new k8s worker-nfs tools-k8s-worker-nfs-78.tools.eqiad1.wikimedia.cloud to the cluster | [tools] | 
            
  | 08:16 | <taavi@cloudcumin1001> | START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster (T384790) | [tools] | 
            
  | 08:16 | <taavi@cloudcumin1001> | END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster | [tools] | 
            
  | 08:16 | <taavi@cloudcumin1001> | Added a new k8s worker-nfs tools-k8s-worker-nfs-77.tools.eqiad1.wikimedia.cloud to the cluster | [tools] | 
            
  | 08:06 | <taavi@cloudcumin1001> | START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster (T384790) | [tools] | 
            
  | 08:06 | <taavi@cloudcumin1001> | END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster | [tools] | 
            
  | 08:06 | <taavi@cloudcumin1001> | Added a new k8s worker tools-k8s-worker-110.tools.eqiad1.wikimedia.cloud to the cluster | [tools] | 
            
  | 07:56 | <taavi@cloudcumin1001> | START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster (T384790) | [tools] | 
            
  | 07:56 | <taavi@cloudcumin1001> | END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster | [tools] | 
            
  | 07:56 | <taavi@cloudcumin1001> | Added a new k8s worker tools-k8s-worker-109.tools.eqiad1.wikimedia.cloud to the cluster | [tools] | 
            
  | 07:44 | <taavi@cloudcumin1001> | START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster (T384790) | [tools] | 
            
  | 07:38 | <taavi@cloudcumin1001> | END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-55 | [tools] | 
            
  | 07:32 | <taavi@cloudcumin1001> | START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-55 | [tools] | 
            
  
    | 2025-01-25 |
    
  | 21:43 | <wmbot~bd808@tools-bastion-12> | `toolforge jobs restart ircservserv` -- bot was absent from expected channels | [tools.ircservserv] | 
            
  | 18:20 | <taavi@cloudcumin1001> | END (PASS) - Cookbook wmcs.vps.add_user_to_project (exit_code=0) for user 'yochayco' in role 'reader' | [bastion] | 
            
  | 18:20 | <taavi@cloudcumin1001> | START - Cookbook wmcs.vps.add_user_to_project for user 'yochayco' in role 'reader' | [bastion] | 
            
  | 16:30 | <cmooney@cumin1002> | DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow1002.eqiad.wmnet with reason: disabling gnmic in systemd | [production] | 
            
  | 15:36 | <cmooney@cumin1002> | DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on netflow1002.eqiad.wmnet with reason: disabling gnmic in systemd | [production] | 
            
  | 11:51 | <godog> | bounce thanos-query on titan100* | [production] | 
            
  | 11:14 | <cmooney@cumin1002> | DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cr1-magru,cr[1-2]-magru IPv6 with reason: IBGP instability from cr1 to cr2 in magru causing ping failures from alert1002 | [production] | 
            
  | 11:02 | <topranks> | bouncing IBGP session from cr1-magru to cr2-magru manually to reset | [production] |