2024-05-07
16:48 <elukey@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet [production]
16:39 <elukey@cumin1002> START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet [production]
16:34 <zabe@deploy1002> Finished scap: T363825 (duration: 07m 42s) [production]
16:26 <zabe@deploy1002> Started scap: T363825 [production]
16:08 <zabe@deploy1002> sync-world aborted: (no justification provided) (duration: 00m 00s) [production]
16:08 <zabe@deploy1002> Started scap: (no justification provided) [production]
16:05 <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:1028778|Stop writing to old columns of pagelinks in most wikis (T352010 T299947)]] (duration: 32m 29s) [production]
15:58 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Depooling db2179 (T352010)', diff saved to https://phabricator.wikimedia.org/P61983 and previous config saved to /var/cache/conftool/dbconfig/20240507-155822-ladsgroup.json [production]
15:58 <ladsgroup@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance [production]
15:58 <ladsgroup@cumin1002> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance [production]
15:52 <ladsgroup@deploy1002> ladsgroup: Continuing with sync [production]
15:38 <ladsgroup@deploy1002> ladsgroup: Backport for [[gerrit:1028778|Stop writing to old columns of pagelinks in most wikis (T352010 T299947)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
15:34 <ejegg> switched Adyen IPN format to JSON in merchant console [production]
15:32 <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:1028778|Stop writing to old columns of pagelinks in most wikis (T352010 T299947)]] [production]
15:31 <ejegg> SmashPig (standalone IPN listener) upgraded from 71b9be53 to 67db9d96 [production]
15:29 <hnowlan> depooling 5 eqiad api appservers in advance of reimaging to k8s workers [production]
15:19 <moritzm> imported nodejs 20.5.1-deb-1nodesource1 to thirdparty/node20 T362681 [production]
15:14 <jmm@cumin2002> END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2122.codfw.wmnet [production]
15:13 <godog> remove accidentally set site!=magru silence, add site=magru silence instead - T364016 [production]
15:12 <elukey> repool ms-fe1009's envoy with PKI TLS cert [production]
15:12 <elukey@puppetmaster1001> conftool action : set/pooled=yes; selector: name=ms-fe1009.eqiad.wmnet [production]
14:55 <elukey> depool ms-fe1009's nginx (swift proxy) to safely apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1026927 [production]
14:54 <elukey@puppetmaster1001> conftool action : set/pooled=no; selector: name=ms-fe1009.eqiad.wmnet [production]
14:53 <sukhe> A:cp and A:magru: running haproxy-restart [production]
14:53 <jmm@cumin2002> START - Cookbook sre.puppet.migrate-host for host db2122.codfw.wmnet [production]
14:53 <hnowlan@cumin1002> conftool action : set/weight=10:pooled=yes; selector: name=(mw2305.codfw.wmnet|mw2325.codfw.wmnet|mw2338.codfw.wmnet|mw2359.codfw.wmnet|mw2390.codfw.wmnet|mw2407.codfw.wmnet),cluster=kubernetes,service=kubesvc [production]
14:52 <moritzm> installing mariadb-10.5 security updates (as packaged in Debian, not the wmf-mariadb packages) [production]
14:51 <jmm@cumin2002> END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2121.codfw.wmnet [production]
14:50 <godog> silence site=magru alerts during prometheus7001 - T364016 [production]
14:44 <jmm@cumin2002> START - Cookbook sre.puppet.migrate-host for host db2121.codfw.wmnet [production]
14:41 <hnowlan> running homer 'cr*codfw*' commit to configure BGP for new k8s codfw workers [production]
14:39 <hnowlan@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2338.codfw.wmnet with OS bullseye [production]
14:33 <hnowlan@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2325.codfw.wmnet with OS bullseye [production]
14:31 <filippo@cumin1002> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host prometheus7001.magru.wmnet [production]
14:31 <filippo@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus7001.magru.wmnet with OS bullseye [production]
14:30 <hnowlan@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2305.codfw.wmnet with OS bullseye [production]
14:28 <hnowlan@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2359.codfw.wmnet with OS bullseye [production]
14:23 <hnowlan@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2407.codfw.wmnet with OS bullseye [production]
14:22 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . [production]
14:20 <hnowlan@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2390.codfw.wmnet with OS bullseye [production]
14:19 <hnowlan@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2338.codfw.wmnet with reason: host reimage [production]
14:16 <filippo@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus7001.magru.wmnet with reason: host reimage [production]
14:13 <hnowlan@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2325.codfw.wmnet with reason: host reimage [production]
14:13 <filippo@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus7001.magru.wmnet with reason: host reimage [production]
14:12 <mfossati@deploy1002> Finished deploy [airflow-dags/platform_eng@b543b85]: (no justification provided) (duration: 00m 24s) [production]
14:11 <mfossati@deploy1002> Started deploy [airflow-dags/platform_eng@b543b85]: (no justification provided) [production]
14:10 <hnowlan@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2305.codfw.wmnet with reason: host reimage [production]
14:08 <hnowlan@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2359.codfw.wmnet with reason: host reimage [production]
14:04 <hnowlan@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2407.codfw.wmnet with reason: host reimage [production]
14:03 <btullis@deploy1002> Finished deploy [airflow-dags/analytics@6be7efd]: (no justification provided) (duration: 00m 27s) [production]