2025-05-01
20:48 <ryankemper@dns1004> END - running authdns-update [production]
20:46 <ryankemper@dns1004> START - running authdns-update [production]
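For context, authdns-update is run directly on an authoritative DNS host once the corresponding operations/dns change has been merged; a minimal sketch of the invocation recorded above (host choice and behaviour summary are assumptions):
  # on an authoritative DNS host such as dns1004
  sudo authdns-update   # deploys the merged DNS changes to the authoritative servers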
20:45 <jhuneidi@deploy1003> Finished scap sync-world: Backport for [[gerrit:1140229|Check for content validity before extracting license (T389125)]], [[gerrit:1140228|Fix localization for validation errors checking tabular data (T389126)]] (duration: 30m 35s) [production]
20:40 <sukhe> restart pybal on lvs1020 [production]
20:35 <jhuneidi@deploy1003> bvibber, jhuneidi: Continuing with sync [production]
20:33 <jhuneidi@deploy1003> bvibber, jhuneidi: Backport for [[gerrit:1140229|Check for content validity before extracting license (T389125)]], [[gerrit:1140228|Fix localization for validation errors checking tabular data (T389126)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
20:32 <sukhe> sudo cumin 'O:config_master' 'run-puppet-agent' [production]
20:14 <jhuneidi@deploy1003> Started scap sync-world: Backport for [[gerrit:1140229|Check for content validity before extracting license (T389125)]], [[gerrit:1140228|Fix localization for validation errors checking tabular data (T389126)]] [production]
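A sketch of how a backport like the one above is typically driven from the deploy host; the change numbers come from the entries above, and the exact scap invocation is an assumption:
  # on the deployment host (deploy1003)
  scap backport 1140229 1140228   # cherry-picks the changes, syncs to the testservers, then prompts before the full sync-world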
19:37 <sukhe> no pending Netbox changes [production]
19:37 <sukhe@cumin1002> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
19:34 <sukhe> [correction] running sre.dns.netbox to ensure no pending changes (NOT in dry-run) [production]
19:34 <sukhe> running sre.dns.netbox to ensure no pending changes [production]
19:34 <sukhe@cumin1002> START - Cookbook sre.dns.netbox [production]
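For reference, a sketch of the cookbook invocations behind the entries above, assuming the standard cookbook runner on a cumin host; the log messages are illustrative:
  # on a cumin host such as cumin1002
  sudo cookbook --dry-run sre.dns.netbox "check for pending changes"   # preview only, commits nothing
  sudo cookbook sre.dns.netbox "ensure no pending changes"             # real run, as logged at 19:34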
19:33 <dduvall> re-ran scap sync to fix mw-jobrunner codfw deployments following failed helmfile apply and verified correct image ref manually (T386222) [production]
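One way to verify the image ref manually after a re-sync, assuming kubectl access to the codfw mw-jobrunner namespace (kubeconfig/environment setup omitted):
  # list each deployment and the image it is currently running
  kubectl -n mw-jobrunner get deploy -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[*].image}{"\n"}{end}'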
19:30 <dduvall@deploy1003> Finished scap sync-world: retrying sync-world following spurious helmfile apply error (mw-jobrunner codfw) (duration: 11m 24s) [production]
19:20 <sukhe> sukhe@netbox1003:~$ sudo systemctl start uwsgi-netbox.service: service was OOM'ed, restarting [production]
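A quick way to confirm the OOM kill before starting the unit again (host and unit name taken from the entry above):
  # on netbox1003
  journalctl -k --since "1 hour ago" | grep -i oom        # kernel OOM-killer messages
  journalctl -u uwsgi-netbox.service -n 50 --no-pager     # how the unit actually exited
  sudo systemctl start uwsgi-netbox.service               # the unit was killed, so start rather than restart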
19:18 <dduvall@deploy1003> Started scap sync-world: retrying sync-world following spurious helmfile apply error (mw-jobrunner codfw) [production]
19:16 <jhathaway@dns1004> END - running authdns-update [production]
19:14 <jhathaway@dns1004> START - running authdns-update [production]
19:09 <ryankemper> T376151 [wdqs-internal lvs teardown] running puppet across `A:wdqs-internal` now that pybal has been restarted [production]
19:09 <dduvall> deployment of mw-jobrunner-main for codfw failed during scap train (group2) (T386222) [production]
19:09 <ryankemper> T376151 [wdqs-internal lvs teardown -> pybal rolling restart] all IPVS diff check alerts have recovered, rolling restart complete [production]
19:06 <dduvall> helm error during group2 deployment "Get "https://kubemaster.svc.codfw.wmnet:6443/api/v1/namespaces/mw-jobrunner/services/mediawiki-main-tls-service": dial tcp 10.2.1.8:6443: connect: no route to host - error from a previous attempt: read tcp 10.64.16.93:41894->10.2.1.8:6443: read: connection reset by peer" [production]
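A couple of quick reachability checks one might run from the deploy host against the apiserver VIP in that error (addresses copied from the message above):
  nc -vz -w 3 10.2.1.8 6443                                              # plain TCP reachability to the kubemaster VIP
  curl -sk --max-time 5 https://kubemaster.svc.codfw.wmnet:6443/healthz  # apiserver health endpoint, certificate checks skipped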
19:04 <ryankemper> T376151 [wdqs-internal lvs teardown -> pybal rolling restart] `ipvsadm --delete-service --tcp-service 10.2.2.41:80` on `lvs1019` and `lvs1020` [production]
19:03 <ryankemper> T376151 [wdqs-internal lvs teardown -> pybal rolling restart] `ipvsadm --delete-service --tcp-service 10.2.1.41:80` on `A:lvs-secondary-codfw OR A:lvs-low-traffic-codfw` (lvs2013, lvs2014) [production]
18:59 <ryankemper> T376151 [wdqs-internal lvs teardown -> pybal rolling restart] Restarted pybal on `A:lvs-low-traffic-codfw` (lvs2013) [production]
18:58 <ryankemper> T376151 [wdqs-internal lvs teardown -> pybal rolling restart] Restarted pybal on `A:lvs-secondary-codfw` (lvs2014), waiting 2 mins before proceeding [production]
18:55 <ryankemper> T376151 [wdqs-internal lvs teardown -> pybal rolling restart] Restarted pybal on `A:lvs-low-traffic-eqiad` (lvs1019), waiting a few mins before proceeding [production]
18:48 <ryankemper> T376151 [wdqs-internal lvs teardown -> pybal rolling restart] Restarted pybal on `A:lvs-secondary-eqiad`; it only restarted on `lvs1020` because, for some reason, `lvs1013` doesn't have a pybal service running [production]
18:44 <ryankemper> T376151 [wdqs-internal lvs teardown -> pybal rolling restart] ran puppet on `O:Lvs::balancer` after merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1136747 [production]
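Taken together, the rolling restart and IPVS cleanup above amount to roughly the following; the cumin batching flags and the exact pause are a sketch rather than the commands actually used:
  # restart pybal one low-traffic load balancer at a time, pausing between hosts
  sudo cumin -b 1 -s 120 'A:lvs-secondary-eqiad OR A:lvs-low-traffic-eqiad' 'systemctl restart pybal.service'
  # drop the now-unused wdqs-internal IPVS service and confirm it is gone
  sudo ipvsadm --delete-service --tcp-service 10.2.2.41:80
  sudo ipvsadm -L -n | grep 10.2.2.41 || echo "service removed"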
18:32 <eevans@deploy1003> helmfile [eqiad] DONE helmfile.d/services/echostore: apply [production]
18:31 <eevans@deploy1003> helmfile [eqiad] START helmfile.d/services/echostore: apply [production]
18:30 <eevans@deploy1003> helmfile [codfw] DONE helmfile.d/services/echostore: apply [production]
18:29 <eevans@deploy1003> helmfile [codfw] START helmfile.d/services/echostore: apply [production]
18:28 <eevans@deploy1003> helmfile [staging] DONE helmfile.d/services/echostore: apply [production]
18:27 <eevans@deploy1003> helmfile [staging] START helmfile.d/services/echostore: apply [production]
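A sketch of the staging -> codfw -> eqiad sequence above as run from the deploy host; the chart directory path is an assumption:
  cd /srv/deployment-charts/helmfile.d/services/echostore
  for env in staging codfw eqiad; do
      helmfile -e "$env" diff    # review the pending change first
      helmfile -e "$env" apply   # then apply, one environment at a time
  done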
18:26 <ryankemper> T376151 (wdqs-internal lvs teardown) Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1136744 to flip `wdqs-internal` service state to `lvs_setup` and running puppet across `A:dnsbox` [production]
18:24 <dduvall@deploy1003> rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.27 refs T386222 [production]
18:23 <ryankemper@dns1004> END - running authdns-update [production]
18:21 <ryankemper@dns1004> START - running authdns-update [production]
17:31 <jhathaway> testing SASL email relaying on mx-in{1001,2001} [production]
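One way to exercise SASL-authenticated relaying by hand against those relays, assuming the usual FQDN and placeholder credentials:
  # open an SMTP session with STARTTLS; after EHLO the server should advertise AUTH
  openssl s_client -quiet -starttls smtp -connect mx-in1001.wikimedia.org:25
  # then interactively: EHLO test.example / AUTH PLAIN <base64 of \0user\0pass> / MAIL FROM / RCPT TO / DATA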
16:40 <btullis@deploy1003> helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply [production]
16:40 <btullis@deploy1003> helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply [production]
16:39 <btullis@deploy1003> helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply [production]
16:38 <btullis@deploy1003> helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply [production]
16:04 <jhancock@cumin2002> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
16:02 <jhancock@cumin2002> START - Cookbook sre.dns.netbox [production]
16:01 <jhancock@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2045.codfw.wmnet with OS bookworm [production]
16:01 <jhancock@cumin2002> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" [production]
15:58 <jhancock@cumin2002> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" [production]
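For reference, a reimage like the one above is typically started as follows; aside from the --os value shown in the log, the short-hostname positional argument and any other flags are assumptions:
  # on a cumin host such as cumin2002
  sudo cookbook sre.hosts.reimage --os bookworm ganeti2045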