| 2021-09-23 |
  | 22:58 | <foks> | creating `mcdc2021_edits` table on each wiki for elections voterlist https://phabricator.wikimedia.org/T291668 | [production] | 
            
  | 22:37 | <mwdebug-deploy@deploy1002> | helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production] | 
            
  | 22:34 | <mwdebug-deploy@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production] | 
            
  | 22:33 | <reedy@deploy1002> | Synchronized php-1.38.0-wmf.1/extensions/SecurePoll/cli/wm-scripts/: T291668 (duration: 00m 57s) | [production] | 
            
  | 22:27 | <ryankemper> | T280001 `ryankemper@cumin1001:~$ sudo cumin 'P{puppetmaster*}' 'sudo rm -fv /var/run/confd-template/.wcqs*'` complete, forcing recheck | [production] | 
            
  | 22:27 | <ryankemper> | T280001 The pooling of the `wcqs*` hosts has gotten `/srv/config-master/pybal/${DC}/wcqs` to render, but we need to clear away the stale error files to get rid of the associated warnings `Stale template error files present for '/srv/config-master/pybal/${DC}/wcqs'` => `sudo rm -fv /var/run/confd-template/.wcqs*` | [production] | 
            
  | 22:20 | <mwdebug-deploy@deploy1002> | helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production] | 
            
  | 22:18 | <ryankemper> | T280001 `ryankemper@puppetmaster1001:/srv$ sudo confctl select 'name=wcqs.*' set/pooled=yes:weight=10` | [production] | 
            
  | 22:17 | <ryankemper@puppetmaster1001> | conftool action : set/pooled=yes:weight=10; selector: name=wcqs.* | [production] | 
            
  | 22:17 | <mwdebug-deploy@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production] | 
            
  | 22:13 | <ryankemper> | T280001 [codfw] `root@lvs2010:/home/ryankemper# ipvsadm -Dt 10.2.2.67:443` and `root@lvs2009:/home/ryankemper# ipvsadm -Dt 10.2.2.67:443` | [production] | 
            
  | 22:13 | <ryankemper> | T280001 [eqiad] `root@lvs1016:/home/ryankemper# ipvsadm -Dt 10.2.1.67:443` and `root@lvs1015:/home/ryankemper# ipvsadm -Dt 10.2.1.67:443` | [production] | 
            
  | 22:06 | <ryankemper> | T280001 Restarted pybal on low-traffic primaries: `ryankemper@cumin1001:~$ sudo cumin 'P{lvs2009*,lvs1015*}' 'sudo systemctl restart pybal'` | [production] | 
            
  | 22:06 | <ryankemper> | T280001 Waited 120s and checked https://icinga.wikimedia.org/alerts, proceeding to primary low-traffic hosts `lvs2009` and `lvs1015` | [production] | 
            
  | 22:05 | <ryankemper> | T280001 [Cleanup required] `TCP  10.2.1.67:443 wrr` shows up on `ryankemper@lvs1016:~$ sudo ipvsadm -L -n` and `TCP  10.2.2.67:443 wrr` shows up on `ryankemper@lvs2010:~$ sudo ipvsadm -L -n` (erroneous) | [production] | 
            
  | 22:05 | <ryankemper> | T280001 [Sanity check] `TCP  10.2.2.67:443 wrr` shows up on `ryankemper@lvs1016:~$ sudo ipvsadm -L -n` and `TCP  10.2.1.67:443 wrr` shows up on `ryankemper@lvs2010:~$ sudo ipvsadm -L -n` as expected | [production] | 
            
  | 22:04 | <ryankemper> | T280001 Restarted pybal on low-traffic backups: `ryankemper@cumin1001:~$ sudo cumin 'P{lvs2010*,lvs1016*}' 'sudo systemctl restart pybal'` | [production] | 
            
  | 22:03 | <ryankemper> | T280001 Restarting pybal on low-traffic backups `lvs2010` and `lvs1016`... | [production] | 
            
  | 22:03 | <ryankemper> | T280001 Ran puppet on all lvs hosts: `ryankemper@cumin1001:~$ sudo cumin 'O:lvs::balancer' 'sudo run-puppet-agent'` | [production] | 
            
  | 22:00 | <ryankemper> | T280001 Running puppet on all lvs hosts: `ryankemper@cumin1001:~$ sudo cumin 'O:lvs::balancer' 'sudo run-puppet-agent'`... | [production] | 
            
  | 21:59 | <ryankemper> | T280001 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/723315, ran puppet agent on `wcqs*` to fix `local lo:LVS destination IPs` | [production] | 
            
  | 21:59 | <ryankemper> | T280001 Swapped the netbox IPAM addresses back, after erroneously swapping them earlier. `sre.dns.netbox` cookbook run complete as well | [production] | 
            
  | 21:57 | <ryankemper@cumin1001> | END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | [production] | 
            
  | 21:53 | <ryankemper@cumin1001> | START - Cookbook sre.dns.netbox | [production] | 
            
  | 21:43 | <bd808@deploy1002> | helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' . | [production] | 
            
  | 21:43 | <foks> | altering some rows in the `securepoll_elections` table on metawiki | [production] | 
            
  | 21:36 | <ryankemper> | T280001 `sre.dns.netbox` run complete, netbox IP mixup *should* be resolved | [production] | 
            
  | 21:33 | <ryankemper@cumin1001> | END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | [production] | 
            
  | 21:26 | <ryankemper> | T280001 `ryankemper@cumin1001:~$ sudo -i cookbook sre.dns.netbox -t T280001 'Fix swapped wcqs.svc.[eqiad,codfw].wmnet'` in progress (note: no `sudo authdns-update` will be necessary because that's just for `operations/dns` repo changes; we only need to run the netbox cookbook) | [production] | 
            
  | 21:24 | <ryankemper@cumin1001> | START - Cookbook sre.dns.netbox | [production] | 
            
  | 21:23 | <ryankemper> | T280001 Swapped IPs of https://netbox.wikimedia.org/ipam/ip-addresses/9062/ and https://netbox.wikimedia.org/ipam/ip-addresses/9063; this should fix the issue where eqiad and codfw were swapped in netbox (my error)...still need to run netbox cookbook and possibly a manual `sudo authdns-update` | [production] | 
            
  | 21:19 | <ryankemper> | The pybal side of the changes looks good, but I made a mistake with the assigning of IPs in netbox; `wcqs.svc.eqiad.wmnet` is routing to where codfw should go and vice versa. Fixing... | [production] | 
            
  | 21:05 | <ryankemper> | T280001 Restarted pybal on low-traffic primaries: `ryankemper@cumin1001:~$ sudo cumin 'P{lvs2009*,lvs1015*}' 'sudo systemctl restart pybal'` | [production] | 
            
  | 21:04 | <ryankemper> | T280001 Restarting pybal on low-traffic primaries `lvs2009` and `lvs1015`... | [production] | 
            
  | 21:04 | <ryankemper> | T280001 Waited 120s and checked https://icinga.wikimedia.org/alerts, proceeding to primary low-traffic hosts `lvs2009` and `lvs1015` | [production] | 
            
  | 21:00 | <ryankemper> | T280001 Sanity check of `sudo ipvsadm -L -n` on low-traffic backups `lvs2010` and `lvs1016` looks good, proceeding | [production] | 
            
  | 21:00 | <ryankemper> | T280001 Sanity check of `sudo ipvsadm -L -n` on backup  `lvs2010` and `lvs1016` looks good, proceeding | [production] | 
            
  | 21:00 | <ryankemper> | T280001 `TCP  10.2.1.67:443 wrr` shows up on `ryankemper@lvs1016:~$ sudo ipvsadm -L -n ` and `TCP  10.2.2.67:443 wrr` shows up on `ryankemper@lvs2010:~$ sudo ipvsadm -L -n` as expected | [production] | 
            
  | 20:58 | <brennen> | canceling backport training window for 2021-09-23 | [production] | 
            
  | 20:54 | <ryankemper> | T280001 Restarted pybal on backup low-traffic hosts: `ryankemper@cumin1001:~$ sudo cumin 'P{lvs2010*,lvs1016*}' 'sudo systemctl restart pybal'` | [production] | 
            
  | 20:53 | <ryankemper> | T280001 Restarting pybal on backup low-traffic hosts `lvs2010` and `lvs1016`... | [production] | 
            
  | 20:53 | <ryankemper> | T280001 Ran puppet on all lvs hosts => `ryankemper@cumin1001:~$ sudo cumin 'O:lvs::balancer' 'sudo run-puppet-agent'` | [production] | 
            
  | 20:47 | <ryankemper> | T280001 Merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/723254 to proceed with `lvs_setup` state change; will be restarting low-traffic lvs hosts shortly | [production] | 
            
  | 20:04 | <dduvall> | 1.38.0-wmf.1 promoted to all wikis. no new errors or rising rates (T281165) | [production] | 
            
  | 20:02 | <mwdebug-deploy@deploy1002> | helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production] | 
            
  | 19:58 | <mwdebug-deploy@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production] | 
            
  | 19:50 | <dduvall@deploy1002> | rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.1 | [production] | 
            
  | 19:40 | <kostajh> | UTC morning backport window done | [production] | 
            
  | 19:40 | <mwdebug-deploy@deploy1002> | helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production] | 
            
  | 19:39 | <kharlan@deploy1002> | Synchronized php-1.38.0-wmf.1/extensions/GrowthExperiments/includes/HomepageHooks.php: Backport: [[gerrit:723194|Suggested Edits: Update editor preference for tasks that shouldn't open the editor by default (T291020)]] (duration: 01m 05s) | [production] |