| 2021-12-14 |
    
  | 17:23 | <cmjohnson@cumin1001> | START - Cookbook sre.dns.netbox | [production] | 
            
  | 17:23 | <mutante> | elastic1043 is down and alerting since > 6h | [production] | 
            
  | 17:21 | <mutante> | icinga - re-enabling active monitoring checks on mx2001 (T297128) | [production] | 
            
  | 17:18 | <btullis@cumin1001> | START - Cookbook sre.hosts.reboot-single for host aqs1012.eqiad.wmnet | [production] | 
            
  | 17:15 | <cmjohnson@cumin1001> | END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | [production] | 
            
  | 17:15 | <oblivian@deploy1002> | helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production] | 
            
  | 17:12 | <btullis@cumin1001> | END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs1011.eqiad.wmnet | [production] | 
            
  | 17:10 | <cmjohnson@cumin1001> | START - Cookbook sre.dns.netbox | [production] | 
            
  | 17:06 | <hnowlan@puppetmaster1001> | conftool action : set/weight=10:pooled=yes; selector: name=restbase2026.codfw.wmnet | [production] | 
            
  | 16:57 | <cmjohnson@cumin1001> | END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | [production] | 
            
  | 16:57 | <jhathaway@cumin1001> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mirror1001.wikimedia.org with OS bullseye | [production] | 
            
  | 16:51 | <cmjohnson@cumin1001> | START - Cookbook sre.dns.netbox | [production] | 
            
  | 16:43 | <Amir1> | rolling restart of php-fpm on all mediawiki hosts (T297517 T297667) | [production] | 
            
  | 16:33 | <jhathaway@cumin1001> | START - Cookbook sre.hosts.reimage for host mirror1001.wikimedia.org with OS bullseye | [production] | 
            
  | 16:30 | <jhathaway@cumin1001> | END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mirror1001.wikimedia.org with OS bullseye | [production] | 
            
  | 16:30 | <jhathaway@cumin1001> | START - Cookbook sre.hosts.reimage for host mirror1001.wikimedia.org with OS bullseye | [production] | 
            
  | 16:24 | <jhathaway@cumin1001> | END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mirror1001.wikimedia.org with OS bullseye | [production] | 
            
  | 16:24 | <jhathaway@cumin1001> | START - Cookbook sre.hosts.reimage for host mirror1001.wikimedia.org with OS bullseye | [production] | 
            
  | 16:21 | <jhathaway@cumin1001> | END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mirror1001.wikimedia.org with OS bullseye | [production] | 
            
  | 16:21 | <jhathaway@cumin1001> | START - Cookbook sre.hosts.reimage for host mirror1001.wikimedia.org with OS bullseye | [production] | 
            
  | 16:20 | <accraze@deploy1002> | helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . | [production] | 
            
  | 16:00 | <btullis@cumin1001> | START - Cookbook sre.hosts.reboot-single for host aqs1011.eqiad.wmnet | [production] | 
            
  | 15:59 | <btullis@cumin1001> | END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs1010.eqiad.wmnet | [production] | 
            
  | 15:54 | <ladsgroup@deploy1002> | Synchronized php-1.38.0-wmf.12/includes/cache/LinkCache.php: Backport: [[gerrit:747073|cache: Add four fields to LinkCache::getSelectFields (T297669)]] (duration: 00m 57s) | [production] | 
            
  | 15:53 | <btullis@cumin1001> | START - Cookbook sre.hosts.reboot-single for host aqs1010.eqiad.wmnet | [production] | 
            
  | 15:51 | <mwdebug-deploy@deploy1002> | helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production] | 
            
  | 15:50 | <moritzm> | drain primary/secondary instances off ganeti2023 T296622 | [production] | 
            
  | 15:50 | <mwdebug-deploy@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production] | 
            
  | 15:49 | <btullis@cumin1001> | END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host aqs1010.eqiad.wmnet | [production] | 
            
  | 15:49 | <btullis@cumin1001> | START - Cookbook sre.hosts.reboot-single for host aqs1010.eqiad.wmnet | [production] | 
            
  | 15:48 | <jmm@cumin2002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti2018.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage | [production] | 
            
  | 15:47 | <jmm@cumin2002> | START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti2018.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage | [production] | 
            
  | 15:42 | <oblivian@deploy1002> | helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production] | 
            
  | 15:21 | <hashar@deploy1002> | Finished scap: Push wmf.13 without promoting any wikis (duration: 29m 31s) | [production] | 
            
  | 15:15 | <bblack@cumin1001> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1019.eqiad.wmnet with OS buster | [production] | 
            
  | 15:13 | <bblack@cumin1001> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1018.eqiad.wmnet with OS buster | [production] | 
            
  | 15:09 | <vgutierrez@cumin1001> | END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host cp4025.ulsfo.wmnet with OS buster | [production] | 
            
  | 14:52 | <hashar@deploy1002> | Started scap: Push wmf.13 without promoting any wikis | [production] | 
            
  | 14:49 | <bblack@cumin1001> | START - Cookbook sre.hosts.reimage for host lvs1019.eqiad.wmnet with OS buster | [production] | 
            
  | 14:47 | <bblack@cumin1001> | START - Cookbook sre.hosts.reimage for host lvs1018.eqiad.wmnet with OS buster | [production] | 
            
  | 14:16 | <ladsgroup@deploy1002> | Synchronized php-1.38.0-wmf.12/includes/OutputPage.php: Backport: [[gerrit:747068|Reuse the query result in addCategoryLinks instead of relying on cache (T297669)]] (duration: 00m 57s) | [production] | 
            
  | 14:13 | <mwdebug-deploy@deploy1002> | helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production] | 
            
  | 14:12 | <mwdebug-deploy@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production] | 
            
  | 14:02 | <vgutierrez@cumin1001> | START - Cookbook sre.hosts.reimage for host cp4025.ulsfo.wmnet with OS buster | [production] | 
            
  | 14:01 | <mwdebug-deploy@deploy1002> | helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production] | 
            
  | 13:59 | <mwdebug-deploy@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production] | 
            
  | 13:56 | <marostegui@cumin1001> | dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T277354)', diff saved to https://phabricator.wikimedia.org/P18229 and previous config saved to /var/cache/conftool/dbconfig/20211214-135601-marostegui.json | [production] | 
            
  | 13:55 | <Lucas_WMDE> | Deployed patch for T297570 | [production] | 
            
  | 13:51 | <vgutierrez> | depool cp4025 to be reimaged as cache::upload_envoy - T271421 | [production] | 
            
  | 13:40 | <marostegui@cumin1001> | dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P18228 and previous config saved to /var/cache/conftool/dbconfig/20211214-134056-marostegui.json | [production] |