951-1000 of 10000 results (28ms)
2021-09-24 §
06:44 <elukey@cumin1001> START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001 [production]
06:41 <elukey@cumin1001> END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons. - elukey@cumin1001 [production]
06:30 <elukey@cumin1001> START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons. - elukey@cumin1001 [production]
06:26 <elukey> restart archiva on archiva1002 to pick up new openjdk upgrades [production]
06:11 <marostegui@cumin1001> dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: After fixing some indexes T291584', diff saved to https://phabricator.wikimedia.org/P17324 and previous config saved to /var/cache/conftool/dbconfig/20210924-061105-root.json [production]
05:56 <marostegui@cumin1001> dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: After fixing some indexes T291584', diff saved to https://phabricator.wikimedia.org/P17323 and previous config saved to /var/cache/conftool/dbconfig/20210924-055601-root.json [production]
05:40 <marostegui@cumin1001> dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: After fixing some indexes T291584', diff saved to https://phabricator.wikimedia.org/P17322 and previous config saved to /var/cache/conftool/dbconfig/20210924-054057-root.json [production]
05:25 <marostegui@cumin1001> dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: After fixing some indexes T291584', diff saved to https://phabricator.wikimedia.org/P17321 and previous config saved to /var/cache/conftool/dbconfig/20210924-052554-root.json [production]
05:10 <marostegui@cumin1001> dbctl commit (dc=all): 'db1177 (re)pooling @ 10%: After fixing some indexes T291584', diff saved to https://phabricator.wikimedia.org/P17320 and previous config saved to /var/cache/conftool/dbconfig/20210924-051050-root.json [production]
05:07 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1177 T291584', diff saved to https://phabricator.wikimedia.org/P17319 and previous config saved to /var/cache/conftool/dbconfig/20210924-050739-marostegui.json [production]
01:27 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
01:23 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
01:16 <krinkle@deploy1002> Synchronized wmf-config/profiler.php: I25f4b70b9d4b (duration: 00m 57s) [production]
00:39 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
00:39 <legoktm@deploy1002> Synchronized php-1.38.0-wmf.1/resources/src/mediawiki.searchSuggest/searchSuggest.js: Hiding fallback button depends on HTML order (T291272) (duration: 00m 57s) [production]
00:36 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
2021-09-23 §
23:38 <foks> running wm-scripts/mcdc2021/populateEditCount.php on each wiki (s1 thru s8 simultaneously) https://phabricator.wikimedia.org/T291668 [production]
22:58 <bd808@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' . [production]
22:58 <foks> creating `mcdc2021_edits` table on each wiki for elections voterlist https://phabricator.wikimedia.org/T291668 [production]
22:37 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
22:34 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
22:33 <reedy@deploy1002> Synchronized php-1.38.0-wmf.1/extensions/SecurePoll/cli/wm-scripts/: T291668 (duration: 00m 57s) [production]
22:27 <ryankemper> T280001 `ryankemper@cumin1001:~$ sudo cumin 'P{puppetmaster*}' 'sudo rm -fv /var/run/confd-template/.wcqs*'` complete, forcing recheck [production]
22:27 <ryankemper> T280001 The pooling of the `wcqs*` hosts has gotten `/srv/config-master/pybal/${DC}/wcqs` to render, but we need to clear away the stale error files to get rid of the associated warnings `Stale template error files present for '/srv/config-master/pybal/${DC}/wcqs'` => `sudo rm -fv /var/run/confd-template/.wcqs*` [production]
22:20 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
22:18 <ryankemper> T280001 `ryankemper@puppetmaster1001:/srv$ sudo confctl select 'name=wcqs.*' set/pooled=yes:weight=10` [production]
22:17 <ryankemper@puppetmaster1001> conftool action : set/pooled=yes:weight=10; selector: name=wcqs.* [production]
22:17 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
22:13 <ryankemper> T280001 [codfw] `root@lvs2010:/home/ryankemper# ipvsadm -Dt 10.2.2.67:443` and `root@lvs2009:/home/ryankemper# ipvsadm -Dt 10.2.2.67:443` [production]
22:13 <ryankemper> T280001 [eqiad] `root@lvs1016:/home/ryankemper# ipvsadm -Dt 10.2.1.67:443` and `root@lvs1015:/home/ryankemper# ipvsadm -Dt 10.2.1.67:443` [production]
22:06 <ryankemper> T280001 Restarted pybal on low-traffic primaries: `ryankemper@cumin1001:~$ sudo cumin 'P{lvs2009*,lvs1015*}' 'sudo systemctl restart pybal'` [production]
22:06 <ryankemper> T280001 Waited 120s and checked https://icinga.wikimedia.org/alerts, proceeding to primary low-traffic hosts `lvs2009` and `lvs1015` [production]
22:05 <ryankemper> T280001 [Cleanup required] `TCP 10.2.1.67:443 wrr` shows up on `ryankemper@lvs1016:~$ sudo ipvsadm -L -n` and `TCP 10.2.2.67:443 wrr` shows up on `ryankemper@lvs2010:~$ sudo ipvsadm -L -n` (erroneous) [production]
22:05 <ryankemper> T280001 [Sanity check] `TCP 10.2.2.67:443 wrr` shows up on `ryankemper@lvs1016:~$ sudo ipvsadm -L -n` and `TCP 10.2.1.67:443 wrr` shows up on `ryankemper@lvs2010:~$ sudo ipvsadm -L -n` as expected [production]
22:04 <ryankemper> T280001 Restarted pybal on low-traffic backups: `ryankemper@cumin1001:~$ sudo cumin 'P{lvs2010*,lvs1016*}' 'sudo systemctl restart pybal'` [production]
22:03 <ryankemper> T280001 Restarting pybal on low-traffic backups `lvs2010` and `lvs1016`... [production]
22:03 <ryankemper> T280001 Ran puppet on all lvs hosts: `ryankemper@cumin1001:~$ sudo cumin 'O:lvs::balancer' 'sudo run-puppet-agent'` [production]
22:00 <ryankemper> T280001 Running puppet on all lvs hosts: `ryankemper@cumin1001:~$ sudo cumin 'O:lvs::balancer' 'sudo run-puppet-agent'`... [production]
21:59 <ryankemper> T280001 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/723315, ran puppet agent on `wcqs*` to fix `local lo:LVS destination IPs` [production]
21:59 <ryankemper> T280001 Swapped the netbox IPAM addresses back, after erroneously swapping them earlier. `sre.dns.netbox` cookbook run complete as well [production]
21:57 <ryankemper@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
21:53 <ryankemper@cumin1001> START - Cookbook sre.dns.netbox [production]
21:43 <bd808@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' . [production]
21:43 <foks> altering some rows in the `securepoll_elections` table on metawiki [production]
21:36 <ryankemper> T280001 `sre.dns.netbox` run complete, netbox IP mixup *should* be resolved [production]
21:33 <ryankemper@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
21:26 <ryankemper> T280001 `ryankemper@cumin1001:~$ sudo -i cookbook sre.dns.netbox -t T280001 'Fix swapped wcqs.svc.[eqiad,codfw].wmnet'` in progress (note: no `sudo authdns-update` will be necessary because that's just for `operations/dns` repo changes; we only need to run the netbox cookbook) [production]
21:24 <ryankemper@cumin1001> START - Cookbook sre.dns.netbox [production]
21:23 <ryankemper> T280001 Swapped IPs of https://netbox.wikimedia.org/ipam/ip-addresses/9062/ and https://netbox.wikimedia.org/ipam/ip-addresses/9063; this should fix the issue where eqiad and codfw were swapped in netbox (my error)...still need to run netbox cookbook and possibly a manual `sudo authdns-update` [production]
21:19 <ryankemper> The pybal side of the changes looks good, but I made a mistake with the assigning of IPs in netbox; `wcqs.svc.eqiad.wmnet` is routing to where codfw should go and vice versa. Fixing... [production]