2025-05-01
|
22:27 |
<thcipriani> |
mwscript-k8s -- resetAuthenticationThrottle.php --wiki=aawiki --signup --ip=<istanbul ips> (x17) |
[production] |
22:09 |
<dzahn@deploy1003> |
Finished scap sync-world: Backport for [[gerrit:1140543|Add another throttle rule for Istanbul Hackathon 2025 (T382309)]] (duration: 14m 32s) |
[production] |
22:02 |
<dzahn@deploy1003> |
dzahn: Continuing with sync |
[production] |
22:00 |
<dzahn@deploy1003> |
dzahn: Backport for [[gerrit:1140543|Add another throttle rule for Istanbul Hackathon 2025 (T382309)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) |
[production] |
21:54 |
<dzahn@deploy1003> |
Started scap sync-world: Backport for [[gerrit:1140543|Add another throttle rule for Istanbul Hackathon 2025 (T382309)]] |
[production] |
21:40 |
<dzahn@deploy1003> |
Finished scap sync-world: Backport for [[gerrit:1140539|Add throttle rule for Istanbul Hackathon 2025 (T382309)]] (duration: 25m 16s) |
[production] |
21:34 |
<dzahn@deploy1003> |
dzahn: Continuing with sync |
[production] |
21:20 |
<dzahn@deploy1003> |
dzahn: Backport for [[gerrit:1140539|Add throttle rule for Istanbul Hackathon 2025 (T382309)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) |
[production] |
21:15 |
<dzahn@deploy1003> |
Started scap sync-world: Backport for [[gerrit:1140539|Add throttle rule for Istanbul Hackathon 2025 (T382309)]] |
[production] |
21:03 |
<ryankemper> |
T376151 [wdqs-internal lvs teardown] Declaring this officially done. No more irc log spam from me today :) |
[production] |
21:01 |
<ryankemper@cumin2002> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
21:01 |
<ryankemper@cumin2002> |
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove VIPs for wdqs-internal - ryankemper@cumin2002" |
[production] |
21:01 |
<ryankemper@cumin2002> |
START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove VIPs for wdqs-internal - ryankemper@cumin2002" |
[production] |
21:01 |
<ryankemper> |
T376151 [wdqs-internal lvs teardown] `sudo etcdctl -C https://conf1007.eqiad.wmnet:4001 --username root rmdir /conftool/v1/pools/codfw/wdqs-internal/wdqs` && `sudo etcdctl -C https://conf1007.eqiad.wmnet:4001 --username root rmdir /conftool/v1/pools/codfw/wdqs-internal/` |
[production] |
21:01 |
<ryankemper> |
T376151 [wdqs-internal lvs teardown] `sudo etcdctl -C https://conf1007.eqiad.wmnet:4001 --username root rmdir /conftool/v1/pools/eqiad/wdqs-internal/wdqs` && `sudo etcdctl -C https://conf1007.eqiad.wmnet:4001 --username root rmdir /conftool/v1/pools/eqiad/wdqs-internal/` |
[production] |
20:54 |
<ryankemper> |
T376151 [wdqs-internal lvs teardown] `sudo rm -fv /srv/config-master/pybal/eqiad/wdqs-internal && sudo rm -fv /srv/config-master/pybal/codfw/wdqs-internal` on `config-master[1,2]001` |
[production] |
20:53 |
<ryankemper@cumin2002> |
START - Cookbook sre.dns.netbox |
[production] |
20:50 |
<ryankemper> |
T376151 [wdqs-internal lvs teardown] Surrendered `10.2.2.41/32` (eqiad wdqs-internal vip) and `10.2.1.41/32` (codfw wdqs-internal vip) from netbox interface |
[production] |
20:48 |
<ryankemper@dns1004> |
END - running authdns-update |
[production] |
20:46 |
<ryankemper@dns1004> |
START - running authdns-update |
[production] |
20:45 |
<jhuneidi@deploy1003> |
Finished scap sync-world: Backport for [[gerrit:1140229|Check for content validity before extracting license (T389125)]], [[gerrit:1140228|Fix localization for validation errors checking tabular data (T389126)]] (duration: 30m 35s) |
[production] |
20:40 |
<sukhe> |
restart pybal on lvs1020 |
[production] |
20:35 |
<jhuneidi@deploy1003> |
bvibber, jhuneidi: Continuing with sync |
[production] |
20:33 |
<jhuneidi@deploy1003> |
bvibber, jhuneidi: Backport for [[gerrit:1140229|Check for content validity before extracting license (T389125)]], [[gerrit:1140228|Fix localization for validation errors checking tabular data (T389126)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) |
[production] |
20:32 |
<sukhe> |
sudo cumin 'O:config_master' 'run-puppet-agent' |
[production] |
20:14 |
<jhuneidi@deploy1003> |
Started scap sync-world: Backport for [[gerrit:1140229|Check for content validity before extracting license (T389125)]], [[gerrit:1140228|Fix localization for validation errors checking tabular data (T389126)]] |
[production] |
19:37 |
<sukhe> |
no pending Netbox changes |
[production] |
19:37 |
<sukhe@cumin1002> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
19:34 |
<sukhe> |
[correction] running sre.dns.netbox to ensure no pending changes (NOT in dry-run) |
[production] |
19:34 |
<sukhe> |
running sre.dns.netbox to ensure no pending changes |
[production] |
19:34 |
<sukhe@cumin1002> |
START - Cookbook sre.dns.netbox |
[production] |
19:33 |
<dduvall> |
re-ran scap sync to fix mw-jobrunner codfw deployments following failed helmfile apply and verified correct image ref manually (T386222) |
[production] |
19:30 |
<dduvall@deploy1003> |
Finished scap sync-world: retrying sync-world following spurious helmfile apply error (mw-jobrunner codfw) (duration: 11m 24s) |
[production] |
19:20 |
<sukhe> |
sukhe@netbox1003:~$ sudo systemctl start uwsgi-netbox.service: service was OOM'ed, restarting |
[production] |
19:18 |
<dduvall@deploy1003> |
Started scap sync-world: retrying sync-world following spurious helmfile apply error (mw-jobrunner codfw) |
[production] |
19:16 |
<jhathaway@dns1004> |
END - running authdns-update |
[production] |
19:14 |
<jhathaway@dns1004> |
START - running authdns-update |
[production] |
19:09 |
<ryankemper> |
T376151 [wdqs-internal lvs teardown] running puppet across `A:wdqs-internal` now that pybal has been restarted |
[production] |
19:09 |
<dduvall> |
deployment of mw-jobrunner-main for codfw failed during scap train (group2) (T386222) |
[production] |
19:09 |
<ryankemper> |
T376151 [wdqs-internal lvs teardown -> pybal rolling restart] all IPVS diff check alerts have recovered, rolling restart complete |
[production] |
19:06 |
<dduvall> |
helm error during group2 deployment "Get "https://kubemaster.svc.codfw.wmnet:6443/api/v1/namespaces/mw-jobrunner/services/mediawiki-main-tls-service": dial tcp 10.2.1.8:6443: connect: no route to host - error from a previous attempt: read tcp 10.64.16.93:41894->10.2.1.8:6443: read: connection reset by peer" |
[production] |
19:04 |
<ryankemper> |
T376151 [wdqs-internal lvs teardown -> pybal rolling restart] `ipvsadm --delete-service --tcp-service 10.2.2.41:80` on `lvs1019` and `lvs1020` |
[production] |
19:03 |
<ryankemper> |
T376151 [wdqs-internal lvs teardown -> pybal rolling restart] `ipvsadm --delete-service --tcp-service 10.2.1.41:80` on `A:lvs-secondary-codfw OR A:lvs-low-traffic-codfw` (lvs2013, lvs2014) |
[production] |
18:59 |
<ryankemper> |
T376151 [wdqs-internal lvs teardown -> pybal rolling restart] Restarted pybal on `A:lvs-low-traffic-codfw` (lvs2013) |
[production] |
18:58 |
<ryankemper> |
T376151 [wdqs-internal lvs teardown -> pybal rolling restart] Restarted pybal on `A:lvs-secondary-codfw` (lvs2014), waiting 2 mins before proceeding |
[production] |
18:55 |
<ryankemper> |
T376151 [wdqs-internal lvs teardown -> pybal rolling restart] Restarted pybal on `A:lvs-low-traffic-eqiad` (lvs1019), waiting a few mins before proceeding |
[production] |
18:48 |
<ryankemper> |
T376151 [wdqs-internal lvs teardown -> pybal rolling restart] Restarted pybal on `A:lvs-secondary-eqiad`; it only restarted on `lvs1020`, because for some reason `lvs1013` doesn't have a pybal service running |
[production] |
18:44 |
<ryankemper> |
T376151 [wdqs-internal lvs teardown -> pybal rolling restart] ran puppet on `O:Lvs::balancer` after merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1136747 |
[production] |
18:32 |
<eevans@deploy1003> |
helmfile [eqiad] DONE helmfile.d/services/echostore: apply |
[production] |
18:31 |
<eevans@deploy1003> |
helmfile [eqiad] START helmfile.d/services/echostore: apply |
[production] |