2023-06-27
08:52 | <kartik@deploy1002> | helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply | [production]
08:47 | <kartik@deploy1002> | helmfile [codfw] START helmfile.d/services/machinetranslation: apply | [production]
08:45 | <kartik@deploy1002> | helmfile [staging] DONE helmfile.d/services/machinetranslation: apply | [production]
08:42 | <kartik@deploy1002> | helmfile [staging] START helmfile.d/services/machinetranslation: apply | [production]
08:42 | <fabfur@cumin1001> | START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo | [production]
08:41 | <fabfur@cumin1001> | START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo | [production]
08:41 | <kart_> | Updated cxserver to 2023-06-27-053435-production (T339105) | [production]
08:38 | <elukey> | revoked puppet cert for 'varnishkafka' and cleaned up its cergen's files in puppet private - T337825 | [production]
08:33 | <root@cumin2002> | END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Neil P. Quinn-WMF out of all services on: 19 hosts | [production]
08:33 | <root@cumin2002> | START - Cookbook sre.idm.logout Logging Neil P. Quinn-WMF out of all services on: 19 hosts | [production]
08:32 | <kartik@deploy1002> | helmfile [eqiad] DONE helmfile.d/services/cxserver: apply | [production]
08:32 | <root@cumin2002> | END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Neil P. Quinn-WMF out of all services on: 767 hosts | [production]
08:32 | <kartik@deploy1002> | helmfile [eqiad] START helmfile.d/services/cxserver: apply | [production]
08:32 | <root@cumin2002> | START - Cookbook sre.idm.logout Logging Neil P. Quinn-WMF out of all services on: 767 hosts | [production]
08:31 | <root@cumin2002> | END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Neil P. Quinn-WMF out of all services on: 1265 hosts | [production]
08:30 | <root@cumin2002> | START - Cookbook sre.idm.logout Logging Neil P. Quinn-WMF out of all services on: 1265 hosts | [production]
08:29 | <marostegui> | Failover m2-master to dbproxy1022 T337812 | [production]
08:28 | <kartik@deploy1002> | helmfile [codfw] DONE helmfile.d/services/cxserver: apply | [production]
08:28 | <kartik@deploy1002> | helmfile [codfw] START helmfile.d/services/cxserver: apply | [production]
08:25 | <kartik@deploy1002> | helmfile [staging] DONE helmfile.d/services/cxserver: apply | [production]
08:24 | <kartik@deploy1002> | helmfile [staging] START helmfile.d/services/cxserver: apply | [production]
08:14 | <kartik@deploy1002> | Finished scap: Backport for [[gerrit:933125|Enable Content and Section Translation for 4 Wikipedias (T338123)]] (duration: 16m 17s) | [production]
08:03 | <moritzm> | installing openjdk-8 security updates for bullseye | [production]
08:02 | <kartik@deploy1002> | kartik: Backport for [[gerrit:933125|Enable Content and Section Translation for 4 Wikipedias (T338123)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet | [production]
07:58 | <kartik@deploy1002> | Started scap: Backport for [[gerrit:933125|Enable Content and Section Translation for 4 Wikipedias (T338123)]] | [production]
07:54 | <moritzm> | uploaded openjdk-8 8u372-ga-1~deb11u1 to component/jdk8 for bullseye (forward port of Java 8 for Buster) | [production]
07:48 | <hashar> | Restart Zuul due to stuck connection | T340518 | T309376 | [production]
07:15 | <elukey> | `sudo kill `pgrep -u paramd`` on stat1005 to unblock puppet | [production]
06:22 | <marostegui> | Failover m1-master to dbproxy1022 T337812 | [production]
2023-06-26
23:21 | <btullis@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-worker1092.eqiad.wmnet with reason: Replacing RAID controller battery | [production]
23:21 | <btullis@cumin1001> | START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-worker1092.eqiad.wmnet with reason: Replacing RAID controller battery | [production]
23:07 | <btullis@deploy1002> | helmfile [staging] DONE helmfile.d/services/datahub: sync on main | [production]
23:02 | <sbassett> | Deployed updated mitigation for T336027 | [production]
23:01 | <ryankemper@cumin1001> | END (PASS) - Cookbook sre.wdqs.restart (exit_code=0) | [production]
22:55 | <btullis@deploy1002> | helmfile [staging] START helmfile.d/services/datahub: apply on main | [production]
22:51 | <ryankemper@cumin1001> | END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) | [production]
22:46 | <btullis@deploy1002> | helmfile [staging] DONE helmfile.d/services/datahub: sync on main | [production]
22:33 | <btullis@deploy1002> | helmfile [staging] START helmfile.d/services/datahub: apply on main | [production]
22:31 | <btullis@deploy1002> | helmfile [staging] START helmfile.d/services/datahub: apply on main | [production]
22:24 | <btullis@deploy1002> | helmfile [staging] START helmfile.d/services/datahub: apply on main | [production]
22:18 | <ryankemper@cumin1001> | START - Cookbook sre.wdqs.restart | [production]
22:17 | <ryankemper@cumin1001> | END (ERROR) - Cookbook sre.wdqs.restart (exit_code=97) | [production]
22:17 | <ryankemper@cumin1001> | START - Cookbook sre.wdqs.restart | [production]
22:17 | <btullis@deploy1002> | helmfile [staging] DONE helmfile.d/services/datahub: sync on main | [production]
22:16 | <ryankemper@cumin1001> | END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99) | [production]
22:05 | <btullis@deploy1002> | helmfile [staging] START helmfile.d/services/datahub: apply on main | [production]
21:58 | <eevans@cumin2002> | END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool sessionstore in codfw: maintenance | [production]
21:57 | <btullis@deploy1002> | helmfile [staging] DONE helmfile.d/services/datahub: sync on main | [production]
21:55 | <ryankemper@cumin1001> | START - Cookbook sre.wdqs.restart | [production]
21:54 | <ryankemper@cumin1001> | END (PASS) - Cookbook sre.wdqs.restart (exit_code=0) | [production]