2024-07-23
|
18:42 |
<mutante> |
puppetmaster1001/puppetmaster2001 - rm /var/run/confd-template/_srv_config-master_pybal_codfw_api-https.err to clear pybal icinga alerts after T367949 |
[production] |
18:40 |
<pt1979@cumin1002> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye |
[production] |
18:14 |
<dduvall@deploy1002> |
rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.15 refs T366960 |
[production] |
18:13 |
<swfrench-wmf> |
sudo cumin 'A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad' 'ipvsadm --delete-service --tcp-service 10.2.2.1:443' (appservers-https eqiad) - T367949 |
[production] |
18:12 |
<aokoth@cumin1002> |
END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1001.eqiad.wmnet |
[production] |
18:11 |
<swfrench-wmf> |
sudo cumin 'A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad' 'ipvsadm --delete-service --tcp-service 10.2.2.22:443' (api-https eqiad) - T367949 |
[production] |
18:11 |
<swfrench-wmf> |
sudo cumin 'A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad' 'ipvsa |
[production] |
18:10 |
<aokoth@cumin1002> |
START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet |
[production] |
18:10 |
<swfrench-wmf> |
sudo cumin 'A:lvs-secondary-codfw or A:lvs-low-traffic-codfw' 'ipvsa |
[production] |
18:08 |
<swfrench-wmf> |
sudo cumin 'A:lvs-secondary-codfw or A:lvs-low-traffic-codfw' 'ipvsa |
[production] |
18:01 |
<aokoth@cumin1002> |
END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=99) on VRTS host vrts1001.eqiad.wmnet |
[production] |
18:01 |
<aokoth@cumin1002> |
START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet |
[production] |
17:58 |
<swfrench-wmf> |
sudo cumin 'A:lvs-low-traffic-eqiad' 'systemctl restart pybal.service' - T367949 |
[production] |
17:51 |
<swfrench-wmf> |
sudo cumin 'A:lvs-secondary-eqiad' 'systemctl restart pybal.service' - T367949 |
[production] |
17:46 |
<logmsgbot> |
nshahquinn-wmf@deploy1002 Finished deploy [airflow-dags/analytics_product@ebd9e13]: (no justification provided) (duration: 00m 07s) |
[production] |
17:46 |
<logmsgbot> |
nshahquinn-wmf@deploy1002 Started deploy [airflow-dags/analytics_product@ebd9e13]: (no justification provided) |
[production] |
17:44 |
<swfrench-wmf> |
sudo cumin 'A:lvs-low-traffic-codfw' 'systemctl restart pybal.service' - T367949 |
[production] |
17:41 |
<sukhe@cumin1002> |
END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2014.codfw.wmnet |
[production] |
17:41 |
<sukhe@cumin1002> |
START - Cookbook sre.hosts.remove-downtime for lvs2014.codfw.wmnet |
[production] |
17:40 |
<swfrench@cumin2002> |
END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw (T367949) |
[production] |
17:37 |
<pt1979@cumin1002> |
START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye |
[production] |
17:33 |
<swfrench@cumin2002> |
START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw (T367949) |
[production] |
17:28 |
<swfrench-wmf> |
run-puppet-agent on O:lvs::balancer to pick up switch to service_setup, removal of profile::lvs::realserver::pools - T367949 |
[production] |
17:17 |
<swfrench-wmf> |
run-puppet-agent on A:dnsbox to pick up switch to lvs_setup - T367949 |
[production] |
17:06 |
<swfrench-wmf> |
ran authdns-update on dns1004 to pick up removal of appservers / api records - T367949 |
[production] |
17:04 |
<dancy@deploy1002> |
sync-world aborted: testing (duration: 00m 51s) |
[production] |
17:03 |
<dancy@deploy1002> |
Started scap sync-world: testing |
[production] |
17:02 |
<pt1979@cumin1002> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye |
[production] |
16:59 |
<jhathaway> |
applying varnish change on cp4037, 1030591 |
[production] |
16:58 |
<hnowlan@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply |
[production] |
16:57 |
<hnowlan@deploy1002> |
helmfile [eqiad] START helmfile.d/services/shellbox-video: apply |
[production] |
16:16 |
<pt1979@cumin1002> |
START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye |
[production] |
16:14 |
<pt1979@cumin1002> |
END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudcephmon1004.eqiad.wmnet |
[production] |
16:07 |
<brouberol@deploy1002> |
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply |
[production] |
16:07 |
<brouberol@deploy1002> |
helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply |
[production] |
15:52 |
<pt1979@cumin1002> |
START - Cookbook sre.hosts.dhcp for host cloudcephmon1004.eqiad.wmnet |
[production] |
15:48 |
<brouberol@deploy1002> |
helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply |
[production] |
15:47 |
<brouberol@deploy1002> |
helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. |
[production] |
15:47 |
<brouberol@deploy1002> |
helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. |
[production] |
15:24 |
<cgoubert@cumin1002> |
conftool action : set/pooled=yes; selector: name=(kubernetes1025|kubernetes1026|kubernetes1052|kubernetes1053|kubernetes1054|kubernetes1055|kubernetes1056|mw1496).eqiad.wmnet,cluster=kubernetes,service=kubesvc [reason: Uncordoning following T365998] |
[production] |
15:24 |
<Emperor> |
moss-be1003 out of maintenance mode after network downtime T365998 |
[production] |
15:22 |
<cgoubert@cumin1002> |
conftool action : set/pooled=yes; selector: name=dse-k8s-worker1008.eqiad.wmnet,cluster=dse-k8s,service=kubesvc |
[production] |
15:22 |
<claime> |
Uncordoning dse-k8s-worker1008.eqiad.wmnet after T365998 |
[production] |
15:20 |
<andrewbogott> |
find /srv/mediawiki/images/wikitech/archive -type f | xargs delete on wikitech-static, drive is full of nonsense |
[production] |
15:07 |
<brennen@deploy1002> |
Finished deploy [phabricator/deployment@3902e30]: deploy phab1004 for T370776 (duration: 00m 33s) |
[production] |
15:06 |
<brennen@deploy1002> |
Started deploy [phabricator/deployment@3902e30]: deploy phab1004 for T370776 |
[production] |
15:06 |
<brennen@deploy1002> |
Finished deploy [phabricator/deployment@3902e30]: deploy phab2002 for T370776 (redux, first deploy a mistaken no-op) (duration: 00m 34s) |
[production] |
15:05 |
<brennen@deploy1002> |
Started deploy [phabricator/deployment@3902e30]: deploy phab2002 for T370776 (redux, first deploy a mistaken no-op) |
[production] |
15:05 |
<brennen@deploy1002> |
Finished deploy [phabricator/deployment@7335128]: deploy phab2002 for T370776 (duration: 01m 17s) |
[production] |
15:03 |
<brennen@deploy1002> |
Started deploy [phabricator/deployment@7335128]: deploy phab2002 for T370776 |
[production] |