2021-09-23
21:26 <ryankemper> T280001 `ryankemper@cumin1001:~$ sudo -i cookbook sre.dns.netbox -t T280001 'Fix swapped wcqs.svc.[eqiad,codfw].wmnet'` in progress (note: no `sudo authdns-update` will be necessary because that's just for `operations/dns` repo changes; we only need to run the netbox cookbook) [production]
21:24 <ryankemper@cumin1001> START - Cookbook sre.dns.netbox [production]
21:23 <ryankemper> T280001 Swapped IPs of https://netbox.wikimedia.org/ipam/ip-addresses/9062/ and https://netbox.wikimedia.org/ipam/ip-addresses/9063; this should fix the issue where eqiad and codfw were swapped in netbox (my error)...still need to run netbox cookbook and possibly a manual `sudo authdns-update` [production]
21:19 <ryankemper> The pybal side of the changes looks good, but I made a mistake with the assigning of IPs in netbox; `wcqs.svc.eqiad.wmnet` is routing to where codfw should go and vice versa. Fixing... [production]
21:05 <ryankemper> T280001 Restarted pybal on low-traffic primaries: `ryankemper@cumin1001:~$ sudo cumin 'P{lvs2009*,lvs1015*}' 'sudo systemctl restart pybal'` [production]
21:04 <ryankemper> T280001 Restarting pybal on low-traffic primaries `lvs2009` and `lvs1015`... [production]
21:04 <ryankemper> T280001 Waited 120s and checked https://icinga.wikimedia.org/alerts, proceeding to primary low-traffic hosts `lvs2009` and `lvs1015` [production]
21:00 <ryankemper> T280001 Sanity check of `sudo ipvsadm -L -n` on low-traffic backups `lvs2010` and `lvs1016` looks good, proceeding [production]
21:00 <ryankemper> T280001 Sanity check of `sudo ipvsadm -L -n` on backup `lvs2010` and `lvs1016` looks good, proceeding [production]
21:00 <ryankemper> T280001 `TCP 10.2.1.67:443 wrr` shows up on `ryankemper@lvs1016:~$ sudo ipvsadm -L -n ` and `TCP 10.2.2.67:443 wrr` shows up on `ryankemper@lvs2010:~$ sudo ipvsadm -L -n` as expected [production]
20:58 <brennen> canceling backport training window for 2021-09-23 [production]
20:54 <ryankemper> T280001 Restarted pybal on backup low-traffic hosts: `ryankemper@cumin1001:~$ sudo cumin 'P{lvs2010*,lvs1016*}' 'sudo systemctl restart pybal'` [production]
20:53 <ryankemper> T280001 Restarting pybal on backup low-traffic hosts `lvs2010` and `lvs1016`... [production]
20:53 <ryankemper> T280001 Ran puppet on all lvs hosts => `ryankemper@cumin1001:~$ sudo cumin 'O:lvs::balancer' 'sudo run-puppet-agent'` [production]
20:47 <ryankemper> T280001 Merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/723254 to proceed with `lvs_setup` state change; will be restarting low-traffic lvs hosts shortly [production]
20:04 <dduvall> 1.38.0-wmf.1 promoted to all wikis. no new errors or rising rates (T281165) [production]
20:02 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
19:58 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
19:50 <dduvall@deploy1002> rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.1 [production]
19:40 <kostajh> UTC morning backport window done [production]
19:40 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
19:39 <kharlan@deploy1002> Synchronized php-1.38.0-wmf.1/extensions/GrowthExperiments/includes/HomepageHooks.php: Backport: [[gerrit:723194|Suggested Edits: Update editor preference for tasks that shouldn't open the editor by default (T291020)]] (duration: 01m 05s) [production]
19:36 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
19:13 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
19:10 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
19:02 <krinkle@deploy1002> Synchronized wmf-config/InitialiseSettings.php: I3323ce3d4446a2 (duration: 01m 07s) [production]
18:58 <ryankemper> T280001 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/721089 to see if it resolves the `confd` error that popped up [production]
18:57 <krinkle@deploy1002> Synchronized wmf-config/logging.php: I2cd81a5165ea14c (duration: 01m 05s) [production]
18:51 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
18:48 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
18:02 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
17:56 <cmjohnson@cumin1001> START - Cookbook sre.dns.netbox [production]
17:31 <volans@cumin2002> END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet [production]
17:30 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
17:28 <cmjohnson@cumin1001> START - Cookbook sre.dns.netbox [production]
17:22 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
17:19 <cmjohnson@cumin1001> START - Cookbook sre.dns.netbox [production]
17:18 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
17:13 <cmjohnson@cumin1001> START - Cookbook sre.dns.netbox [production]
17:06 <volans@cumin2002> START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet [production]
17:01 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
16:59 <volans> uploaded spicerack_1.0.1 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia [production]
16:55 <cmjohnson@cumin1001> START - Cookbook sre.dns.netbox [production]
16:38 <ryankemper> T280001 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/713959, running puppet on `*w*qs*` (i.e. wcqs and wdqs) [production]
16:13 <elukey> reboot an-worker1096 to see if megacli status for a new disk changes - T290805 [production]
16:09 <brennen> gitlab1001: reverting [[gerrit:714382|gitlab cas: uid instead of CN; add nickname_key]] for T288392, as existing user logins are broken. [production]
15:54 <Lucas_WMDE> lucaswerkmeister-wmde@mwmaint1002:~$ echo 'https://query.wikidata.org/querybuilder/' | mwscript purgeList.php # T285761 [production]
15:54 <brennen> gitlab1001: brief downtime to apply [[gerrit:714382|gitlab cas: uid instead of CN; add nickname_key]] for T288392 [production]
15:12 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
15:09 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]