2024-07-23
|
17:58 |
<swfrench-wmf> |
sudo cumin 'A:lvs-low-traffic-eqiad' 'systemctl restart pybal.service' - T367949 |
[production] |
17:51 |
<swfrench-wmf> |
sudo cumin 'A:lvs-secondary-eqiad' 'systemctl restart pybal.service' - T367949 |
[production] |
17:46 |
<logmsgbot> |
nshahquinn-wmf@deploy1002 Finished deploy [airflow-dags/analytics_product@ebd9e13]: (no justification provided) (duration: 00m 07s) |
[production] |
17:46 |
<logmsgbot> |
nshahquinn-wmf@deploy1002 Started deploy [airflow-dags/analytics_product@ebd9e13]: (no justification provided) |
[production] |
17:44 |
<swfrench-wmf> |
sudo cumin 'A:lvs-low-traffic-codfw' 'systemctl restart pybal.service' - T367949 |
[production] |
17:41 |
<sukhe@cumin1002> |
END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2014.codfw.wmnet |
[production] |
17:41 |
<sukhe@cumin1002> |
START - Cookbook sre.hosts.remove-downtime for lvs2014.codfw.wmnet |
[production] |
17:40 |
<swfrench@cumin2002> |
END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw (T367949) |
[production] |
17:37 |
<pt1979@cumin1002> |
START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye |
[production] |
17:33 |
<swfrench@cumin2002> |
START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw (T367949) |
[production] |
17:28 |
<swfrench-wmf> |
run-puppet-agent on O:lvs::balancer to pick up switch to service_setup, removal of profile::lvs::realserver::pools - T367949 |
[production] |
17:17 |
<swfrench-wmf> |
run-puppet-agent on A:dnsbox to pick up switch to lvs_setup - T367949 |
[production] |
17:06 |
<swfrench-wmf> |
ran authdns-update on dns1004 to pick up removal of appservers / api records - T367949 |
[production] |
17:04 |
<dancy@deploy1002> |
sync-world aborted: testing (duration: 00m 51s) |
[production] |
17:03 |
<dancy@deploy1002> |
Started scap sync-world: testing |
[production] |
17:02 |
<pt1979@cumin1002> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye |
[production] |
16:59 |
<jhathaway> |
applying varnish change on cp4037, 1030591 |
[production] |
16:58 |
<hnowlan@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply |
[production] |
16:57 |
<hnowlan@deploy1002> |
helmfile [eqiad] START helmfile.d/services/shellbox-video: apply |
[production] |
16:16 |
<pt1979@cumin1002> |
START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye |
[production] |
16:14 |
<pt1979@cumin1002> |
END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudcephmon1004.eqiad.wmnet |
[production] |
16:07 |
<brouberol@deploy1002> |
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply |
[production] |
16:07 |
<brouberol@deploy1002> |
helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply |
[production] |
15:52 |
<pt1979@cumin1002> |
START - Cookbook sre.hosts.dhcp for host cloudcephmon1004.eqiad.wmnet |
[production] |
15:48 |
<brouberol@deploy1002> |
helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply |
[production] |
15:47 |
<brouberol@deploy1002> |
helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. |
[production] |
15:47 |
<brouberol@deploy1002> |
helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. |
[production] |
15:24 |
<cgoubert@cumin1002> |
conftool action : set/pooled=yes; selector: name=(kubernetes1025|kubernetes1026|kubernetes1052|kubernetes1053|kubernetes1054|kubernetes1055|kubernetes1056|mw1496).eqiad.wmnet,cluster=kubernetes,service=kubesvc [reason: Uncordoning following T365998] |
[production] |
15:24 |
<Emperor> |
moss-be1003 out of maintenance mode after network downtime T365998 |
[production] |
15:22 |
<cgoubert@cumin1002> |
conftool action : set/pooled=yes; selector: name=dse-k8s-worker1008.eqiad.wmnet,cluster=dse-k8s,service=kubesvc |
[production] |
15:22 |
<claime> |
Uncordoning dse-k8s-worker1008.eqiad.wmnet after T365998 |
[production] |
15:20 |
<andrewbogott> |
find /srv/mediawiki/images/wikitech/archive -type f | xargs delete on wikitech-static, drive is full of nonsense |
[production] |
15:07 |
<brennen@deploy1002> |
Finished deploy [phabricator/deployment@3902e30]: deploy phab1004 for T370776 (duration: 00m 33s) |
[production] |
15:06 |
<brennen@deploy1002> |
Started deploy [phabricator/deployment@3902e30]: deploy phab1004 for T370776 |
[production] |
15:06 |
<brennen@deploy1002> |
Finished deploy [phabricator/deployment@3902e30]: deploy phab2002 for T370776 (redux, first deploy a mistaken no-op) (duration: 00m 34s) |
[production] |
15:05 |
<brennen@deploy1002> |
Started deploy [phabricator/deployment@3902e30]: deploy phab2002 for T370776 (redux, first deploy a mistaken no-op) |
[production] |
15:05 |
<brennen@deploy1002> |
Finished deploy [phabricator/deployment@7335128]: deploy phab2002 for T370776 (duration: 01m 17s) |
[production] |
15:03 |
<brennen@deploy1002> |
Started deploy [phabricator/deployment@7335128]: deploy phab2002 for T370776 |
[production] |
15:03 |
<jelto@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update |
[production] |
15:03 |
<jelto@cumin1002> |
START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update |
[production] |
15:03 |
<jelto@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update |
[production] |
15:02 |
<jelto@cumin1002> |
START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update |
[production] |
15:02 |
<jelto@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update |
[production] |
15:02 |
<jelto@cumin1002> |
START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update |
[production] |
15:01 |
<cmooney@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 25 hosts with reason: JunOS upgrade lsw1-f3-eqiad |
[production] |
15:01 |
<cmooney@cumin1002> |
START - Cookbook sre.hosts.downtime for 0:30:00 on 25 hosts with reason: JunOS upgrade lsw1-f3-eqiad |
[production] |
15:01 |
<cmooney@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-f3-eqiad,lsw1-f3-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f3-eqiad |
[production] |
15:00 |
<cmooney@cumin1002> |
START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-f3-eqiad,lsw1-f3-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f3-eqiad |
[production] |
15:00 |
<topranks> |
rebooting lsw1-f3-eqiad to complete JunOS upgrade (T365998) |
[production] |
14:59 |
<XioNoX> |
deploy CR1055546 border-in: remove authdns filter |
[production] |