2021-09-23
|
22:33 |
<reedy@deploy1002> |
Synchronized php-1.38.0-wmf.1/extensions/SecurePoll/cli/wm-scripts/: T291668 (duration: 00m 57s) |
[production] |
22:27 |
<ryankemper> |
T280001 `ryankemper@cumin1001:~$ sudo cumin 'P{puppetmaster*}' 'sudo rm -fv /var/run/confd-template/.wcqs*'` complete, forcing recheck |
[production] |
22:27 |
<ryankemper> |
T280001 Pooling the `wcqs*` hosts got `/srv/config-master/pybal/${DC}/wcqs` to render, but we still need to clear the stale error files to get rid of the associated warnings `Stale template error files present for '/srv/config-master/pybal/${DC}/wcqs'` => `sudo rm -fv /var/run/confd-template/.wcqs*` |
[production] |
22:20 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
22:18 |
<ryankemper> |
T280001 `ryankemper@puppetmaster1001:/srv$ sudo confctl select 'name=wcqs.*' set/pooled=yes:weight=10` |
[production] |
22:17 |
<ryankemper@puppetmaster1001> |
conftool action : set/pooled=yes:weight=10; selector: name=wcqs.* |
[production] |
22:17 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
22:13 |
<ryankemper> |
T280001 [codfw] `root@lvs2010:/home/ryankemper# ipvsadm -Dt 10.2.2.67:443` and `root@lvs2009:/home/ryankemper# ipvsadm -Dt 10.2.2.67:443` |
[production] |
22:13 |
<ryankemper> |
T280001 [eqiad] `root@lvs1016:/home/ryankemper# ipvsadm -Dt 10.2.1.67:443` and `root@lvs1015:/home/ryankemper# ipvsadm -Dt 10.2.1.67:443` |
[production] |
22:06 |
<ryankemper> |
T280001 Restarted pybal on low-traffic primaries: `ryankemper@cumin1001:~$ sudo cumin 'P{lvs2009*,lvs1015*}' 'sudo systemctl restart pybal'` |
[production] |
22:06 |
<ryankemper> |
T280001 Waited 120s and checked https://icinga.wikimedia.org/alerts, proceeding to primary low-traffic hosts `lvs2009` and `lvs1015` |
[production] |
22:05 |
<ryankemper> |
T280001 [Cleanup required] `TCP 10.2.1.67:443 wrr` shows up on `ryankemper@lvs1016:~$ sudo ipvsadm -L -n` and `TCP 10.2.2.67:443 wrr` shows up on `ryankemper@lvs2010:~$ sudo ipvsadm -L -n` (erroneous) |
[production] |
22:05 |
<ryankemper> |
T280001 [Sanity check] `TCP 10.2.2.67:443 wrr` shows up on `ryankemper@lvs1016:~$ sudo ipvsadm -L -n` and `TCP 10.2.1.67:443 wrr` shows up on `ryankemper@lvs2010:~$ sudo ipvsadm -L -n` as expected |
[production] |
22:04 |
<ryankemper> |
T280001 Restarted pybal on low-traffic backups: `ryankemper@cumin1001:~$ sudo cumin 'P{lvs2010*,lvs1016*}' 'sudo systemctl restart pybal'` |
[production] |
22:03 |
<ryankemper> |
T280001 Restarting pybal on low-traffic backups `lvs2010` and `lvs1016`... |
[production] |
22:03 |
<ryankemper> |
T280001 Ran puppet on all lvs hosts: `ryankemper@cumin1001:~$ sudo cumin 'O:lvs::balancer' 'sudo run-puppet-agent'` |
[production] |
22:00 |
<ryankemper> |
T280001 Running puppet on all lvs hosts: `ryankemper@cumin1001:~$ sudo cumin 'O:lvs::balancer' 'sudo run-puppet-agent'`... |
[production] |
21:59 |
<ryankemper> |
T280001 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/723315, ran puppet agent on `wcqs*` to fix `local lo:LVS destination IPs` |
[production] |
21:59 |
<ryankemper> |
T280001 Swapped the netbox IPAM addresses back, after erroneously swapping them earlier. `sre.dns.netbox` cookbook run complete as well |
[production] |
21:57 |
<ryankemper@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
21:53 |
<ryankemper@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
21:43 |
<bd808@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' . |
[production] |
21:43 |
<foks> |
altering some rows in the `securepoll_elections` table on metawiki |
[production] |
21:36 |
<ryankemper> |
T280001 `sre.dns.netbox` run complete, netbox IP mixup *should* be resolved |
[production] |
21:33 |
<ryankemper@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
21:26 |
<ryankemper> |
T280001 `ryankemper@cumin1001:~$ sudo -i cookbook sre.dns.netbox -t T280001 'Fix swapped wcqs.svc.[eqiad,codfw].wmnet'` in progress (note: no `sudo authdns-update` will be necessary because that's just for `operations/dns` repo changes; we only need to run the netbox cookbook) |
[production] |
21:24 |
<ryankemper@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
21:23 |
<ryankemper> |
T280001 Swapped IPs of https://netbox.wikimedia.org/ipam/ip-addresses/9062/ and https://netbox.wikimedia.org/ipam/ip-addresses/9063; this should fix the issue where eqiad and codfw were swapped in netbox (my error). Still need to run the netbox cookbook and possibly a manual `sudo authdns-update` |
[production] |
21:19 |
<ryankemper> |
The pybal side of the changes looks good, but I made a mistake assigning the IPs in netbox; `wcqs.svc.eqiad.wmnet` is routing to where codfw should go and vice versa. Fixing... |
[production] |
21:05 |
<ryankemper> |
T280001 Restarted pybal on low-traffic primaries: `ryankemper@cumin1001:~$ sudo cumin 'P{lvs2009*,lvs1015*}' 'sudo systemctl restart pybal'` |
[production] |
21:04 |
<ryankemper> |
T280001 Restarting pybal on low-traffic primaries `lvs2009` and `lvs1015`... |
[production] |
21:04 |
<ryankemper> |
T280001 Waited 120s and checked https://icinga.wikimedia.org/alerts, proceeding to primary low-traffic hosts `lvs2009` and `lvs1015` |
[production] |
21:00 |
<ryankemper> |
T280001 Sanity check of `sudo ipvsadm -L -n` on low-traffic backups `lvs2010` and `lvs1016` looks good, proceeding |
[production] |
21:00 |
<ryankemper> |
T280001 Sanity check of `sudo ipvsadm -L -n` on backup `lvs2010` and `lvs1016` looks good, proceeding |
[production] |
21:00 |
<ryankemper> |
T280001 `TCP 10.2.1.67:443 wrr` shows up on `ryankemper@lvs1016:~$ sudo ipvsadm -L -n` and `TCP 10.2.2.67:443 wrr` shows up on `ryankemper@lvs2010:~$ sudo ipvsadm -L -n` as expected |
[production] |
20:58 |
<brennen> |
canceling backport training window for 2021-09-23 |
[production] |
20:54 |
<ryankemper> |
T280001 Restarted pybal on backup low-traffic hosts: `ryankemper@cumin1001:~$ sudo cumin 'P{lvs2010*,lvs1016*}' 'sudo systemctl restart pybal'` |
[production] |
20:53 |
<ryankemper> |
T280001 Restarting pybal on backup low-traffic hosts `lvs2010` and `lvs1016`... |
[production] |
20:53 |
<ryankemper> |
T280001 Ran puppet on all lvs hosts => `ryankemper@cumin1001:~$ sudo cumin 'O:lvs::balancer' 'sudo run-puppet-agent'` |
[production] |
20:47 |
<ryankemper> |
T280001 Merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/723254 to proceed with `lvs_setup` state change; will be restarting low-traffic lvs hosts shortly |
[production] |
20:04 |
<dduvall> |
1.38.0-wmf.1 promoted to all wikis. no new errors or rising rates (T281165) |
[production] |
20:02 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
19:58 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
19:50 |
<dduvall@deploy1002> |
rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.1 |
[production] |
19:40 |
<kostajh> |
UTC morning backport window done |
[production] |
19:40 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
19:39 |
<kharlan@deploy1002> |
Synchronized php-1.38.0-wmf.1/extensions/GrowthExperiments/includes/HomepageHooks.php: Backport: [[gerrit:723194|Suggested Edits: Update editor preference for tasks that shouldn't open the editor by default (T291020)]] (duration: 01m 05s) |
[production] |
19:36 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
19:13 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
19:10 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |