2024-07-23
18:42 <mutante> puppetmaster1001/puppetmaster2001 - rm /var/run/confd-template/_srv_config-master_pybal_codfw_api-https.err to clear pybal icinga alerts after T367949 [production]
18:40 <pt1979@cumin1002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye [production]
18:14 <dduvall@deploy1002> rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.15 refs T366960 [production]
18:13 <swfrench-wmf> sudo cumin 'A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad' 'ipvsadm --delete-service --tcp-service 10.2.2.1:443' (appservers-https eqiad) - T367949 [production]
18:12 <aokoth@cumin1002> END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1001.eqiad.wmnet [production]
18:11 <swfrench-wmf> sudo cumin 'A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad' 'ipvsadm --delete-service --tcp-service 10.2.2.22:443' (api-https eqiad) - T367949 [production]
18:11 <swfrench-wmf> sudo cumin 'A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad' 'ipvsa [production]
18:10 <aokoth@cumin1002> START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet [production]
18:10 <swfrench-wmf> sudo cumin 'A:lvs-secondary-codfw or A:lvs-low-traffic-codfw' 'ipvsa [production]
18:08 <swfrench-wmf> sudo cumin 'A:lvs-secondary-codfw or A:lvs-low-traffic-codfw' 'ipvsa [production]
18:01 <aokoth@cumin1002> END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=99) on VRTS host vrts1001.eqiad.wmnet [production]
18:01 <aokoth@cumin1002> START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet [production]
17:58 <swfrench-wmf> sudo cumin 'A:lvs-low-traffic-eqiad' 'systemctl restart pybal.service' - T367949 [production]
17:51 <swfrench-wmf> sudo cumin 'A:lvs-secondary-eqiad' 'systemctl restart pybal.service' - T367949 [production]
17:46 <logmsgbot> nshahquinn-wmf@deploy1002 Finished deploy [airflow-dags/analytics_product@ebd9e13]: (no justification provided) (duration: 00m 07s) [production]
17:46 <logmsgbot> nshahquinn-wmf@deploy1002 Started deploy [airflow-dags/analytics_product@ebd9e13]: (no justification provided) [production]
17:44 <swfrench-wmf> sudo cumin 'A:lvs-low-traffic-codfw' 'systemctl restart pybal.service' - T367949 [production]
17:41 <sukhe@cumin1002> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2014.codfw.wmnet [production]
17:41 <sukhe@cumin1002> START - Cookbook sre.hosts.remove-downtime for lvs2014.codfw.wmnet [production]
17:40 <swfrench@cumin2002> END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw (T367949) [production]
17:37 <pt1979@cumin1002> START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye [production]
17:33 <swfrench@cumin2002> START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw (T367949) [production]
17:28 <swfrench-wmf> run-puppet-agent on O:lvs::balancer to pick up switch to service_setup, removal of profile::lvs::realserver::pools - T367949 [production]
17:17 <swfrench-wmf> run-puppet-agent on A:dnsbox to pick up switch to lvs_setup - T367949 [production]
17:06 <swfrench-wmf> ran authdns-update on dns1004 to pick up removal of appservers / api records - T367949 [production]
17:04 <dancy@deploy1002> sync-world aborted: testing (duration: 00m 51s) [production]
17:03 <dancy@deploy1002> Started scap sync-world: testing [production]
17:02 <pt1979@cumin1002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye [production]
16:59 <jhathaway> applying varnish change on cp4037, 1030591 [production]
16:58 <hnowlan@deploy1002> helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply [production]
16:57 <hnowlan@deploy1002> helmfile [eqiad] START helmfile.d/services/shellbox-video: apply [production]
16:16 <pt1979@cumin1002> START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye [production]
16:14 <pt1979@cumin1002> END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudcephmon1004.eqiad.wmnet [production]
16:07 <brouberol@deploy1002> helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply [production]
16:07 <brouberol@deploy1002> helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply [production]
15:52 <pt1979@cumin1002> START - Cookbook sre.hosts.dhcp for host cloudcephmon1004.eqiad.wmnet [production]
15:48 <brouberol@deploy1002> helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply [production]
15:47 <brouberol@deploy1002> helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. [production]
15:47 <brouberol@deploy1002> helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. [production]
15:24 <cgoubert@cumin1002> conftool action : set/pooled=yes; selector: name=(kubernetes1025|kubernetes1026|kubernetes1052|kubernetes1053|kubernetes1054|kubernetes1055|kubernetes1056|mw1496).eqiad.wmnet,cluster=kubernetes,service=kubesvc [reason: Uncordoning following T365998] [production]
15:24 <Emperor> moss-be1003 out of maintenance mode after network downtime T365998 [production]
15:22 <cgoubert@cumin1002> conftool action : set/pooled=yes; selector: name=dse-k8s-worker1008.eqiad.wmnet,cluster=dse-k8s,service=kubesvc [production]
15:22 <claime> Uncordoning dse-k8s-worker1008.eqiad.wmnet after T365998 [production]
15:20 <andrewbogott> find /srv/mediawiki/images/wikitech/archive -type f | xargs delete on wikitech-static, drive is full of nonsense [production]
15:07 <brennen@deploy1002> Finished deploy [phabricator/deployment@3902e30]: deploy phab1004 for T370776 (duration: 00m 33s) [production]
15:06 <brennen@deploy1002> Started deploy [phabricator/deployment@3902e30]: deploy phab1004 for T370776 [production]
15:06 <brennen@deploy1002> Finished deploy [phabricator/deployment@3902e30]: deploy phab2002 for T370776 (redux, first deploy a mistaken no-op) (duration: 00m 34s) [production]
15:05 <brennen@deploy1002> Started deploy [phabricator/deployment@3902e30]: deploy phab2002 for T370776 (redux, first deploy a mistaken no-op) [production]
15:05 <brennen@deploy1002> Finished deploy [phabricator/deployment@7335128]: deploy phab2002 for T370776 (duration: 01m 17s) [production]
15:03 <brennen@deploy1002> Started deploy [phabricator/deployment@7335128]: deploy phab2002 for T370776 [production]