2024-07-23
17:58 <swfrench-wmf> sudo cumin 'A:lvs-low-traffic-eqiad' 'systemctl restart pybal.service' - T367949 [production]
17:51 <swfrench-wmf> sudo cumin 'A:lvs-secondary-eqiad' 'systemctl restart pybal.service' - T367949 [production]
17:46 <logmsgbot> nshahquinn-wmf@deploy1002 Finished deploy [airflow-dags/analytics_product@ebd9e13]: (no justification provided) (duration: 00m 07s) [production]
17:46 <logmsgbot> nshahquinn-wmf@deploy1002 Started deploy [airflow-dags/analytics_product@ebd9e13]: (no justification provided) [production]
17:44 <swfrench-wmf> sudo cumin 'A:lvs-low-traffic-codfw' 'systemctl restart pybal.service' - T367949 [production]
17:41 <sukhe@cumin1002> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2014.codfw.wmnet [production]
17:41 <sukhe@cumin1002> START - Cookbook sre.hosts.remove-downtime for lvs2014.codfw.wmnet [production]
17:40 <swfrench@cumin2002> END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw (T367949) [production]
17:37 <pt1979@cumin1002> START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye [production]
17:33 <swfrench@cumin2002> START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw (T367949) [production]
17:28 <swfrench-wmf> run-puppet-agent on O:lvs::balancer to pick up switch to service_setup, removal of profile::lvs::realserver::pools - T367949 [production]
17:17 <swfrench-wmf> run-puppet-agent on A:dnsbox to pick up switch to lvs_setup - T367949 [production]
17:06 <swfrench-wmf> ran authdns-update on dns1004 to pick up removal of appservers / api records - T367949 [production]
17:04 <dancy@deploy1002> sync-world aborted: testing (duration: 00m 51s) [production]
17:03 <dancy@deploy1002> Started scap sync-world: testing [production]
17:02 <pt1979@cumin1002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye [production]
16:59 <jhathaway> applying varnish change on cp4037, 1030591 [production]
16:58 <hnowlan@deploy1002> helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply [production]
16:57 <hnowlan@deploy1002> helmfile [eqiad] START helmfile.d/services/shellbox-video: apply [production]
16:16 <pt1979@cumin1002> START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye [production]
16:14 <pt1979@cumin1002> END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudcephmon1004.eqiad.wmnet [production]
16:07 <brouberol@deploy1002> helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply [production]
16:07 <brouberol@deploy1002> helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply [production]
15:52 <pt1979@cumin1002> START - Cookbook sre.hosts.dhcp for host cloudcephmon1004.eqiad.wmnet [production]
15:48 <brouberol@deploy1002> helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply [production]
15:47 <brouberol@deploy1002> helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. [production]
15:47 <brouberol@deploy1002> helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. [production]
15:24 <cgoubert@cumin1002> conftool action : set/pooled=yes; selector: name=(kubernetes1025|kubernetes1026|kubernetes1052|kubernetes1053|kubernetes1054|kubernetes1055|kubernetes1056|mw1496).eqiad.wmnet,cluster=kubernetes,service=kubesvc [reason: Uncordoning following T365998] [production]
15:24 <Emperor> moss-be1003 out of maintenance mode after network downtime T365998 [production]
15:22 <cgoubert@cumin1002> conftool action : set/pooled=yes; selector: name=dse-k8s-worker1008.eqiad.wmnet,cluster=dse-k8s,service=kubesvc [production]
15:22 <claime> Uncordoning dse-k8s-worker1008.eqiad.wmnet after T365998 [production]
15:20 <andrewbogott> find /srv/mediawiki/images/wikitech/archive -type f | xargs delete on wikitech-static, drive is full of nonsense [production]
15:07 <brennen@deploy1002> Finished deploy [phabricator/deployment@3902e30]: deploy phab1004 for T370776 (duration: 00m 33s) [production]
15:06 <brennen@deploy1002> Started deploy [phabricator/deployment@3902e30]: deploy phab1004 for T370776 [production]
15:06 <brennen@deploy1002> Finished deploy [phabricator/deployment@3902e30]: deploy phab2002 for T370776 (redux, first deploy a mistaken no-op) (duration: 00m 34s) [production]
15:05 <brennen@deploy1002> Started deploy [phabricator/deployment@3902e30]: deploy phab2002 for T370776 (redux, first deploy a mistaken no-op) [production]
15:05 <brennen@deploy1002> Finished deploy [phabricator/deployment@7335128]: deploy phab2002 for T370776 (duration: 01m 17s) [production]
15:03 <brennen@deploy1002> Started deploy [phabricator/deployment@7335128]: deploy phab2002 for T370776 [production]
15:03 <jelto@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update [production]
15:03 <jelto@cumin1002> START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update [production]
15:03 <jelto@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update [production]
15:02 <jelto@cumin1002> START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update [production]
15:02 <jelto@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update [production]
15:02 <jelto@cumin1002> START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update [production]
15:01 <cmooney@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 25 hosts with reason: JunOS upgrade lsw1-f3-eqiad [production]
15:01 <cmooney@cumin1002> START - Cookbook sre.hosts.downtime for 0:30:00 on 25 hosts with reason: JunOS upgrade lsw1-f3-eqiad [production]
15:01 <cmooney@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-f3-eqiad,lsw1-f3-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f3-eqiad [production]
15:00 <cmooney@cumin1002> START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-f3-eqiad,lsw1-f3-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f3-eqiad [production]
15:00 <topranks> rebooting lsw1-f3-eqiad to complete JunOS upgrade (T365998) [production]
14:59 <XioNoX> deploy CR1055546 border-in: remove authdns filter [production]