2021-08-10
§
|
08:58 |
<ariel@deploy1002> |
Started deploy [dumps/dumps@170e394]: more resilience when reading bad run cache settings files |
[production] |
08:49 |
<oblivian@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
08:20 |
<elukey@deploy1002> |
helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. |
[production] |
08:20 |
<elukey@deploy1002> |
helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. |
[production] |
08:19 |
<jayme@deploy1002> |
helmfile [codfw] DONE helmfile.d/admin 'apply'. |
[production] |
08:18 |
<jayme@deploy1002> |
helmfile [codfw] START helmfile.d/admin 'apply'. |
[production] |
08:16 |
<jayme@deploy1002> |
helmfile [eqiad] DONE helmfile.d/admin 'apply'. |
[production] |
08:16 |
<jayme@deploy1002> |
helmfile [eqiad] START helmfile.d/admin 'apply'. |
[production] |
08:15 |
<jayme@deploy1002> |
helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. |
[production] |
08:15 |
<jayme@deploy1002> |
helmfile [staging-eqiad] START helmfile.d/admin 'apply'. |
[production] |
08:15 |
<jayme@deploy1002> |
helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. |
[production] |
08:14 |
<jayme@deploy1002> |
helmfile [staging-codfw] START helmfile.d/admin 'apply'. |
[production] |
08:06 |
<godog> |
upload thanos 0.21.1-1 and upgrade prometheus1004 / thanos-fe2001 to it - T288326 |
[production] |
08:03 |
<moritzm> |
installing openjdk-8 security updates on stretch |
[production] |
07:33 |
<moritzm> |
installing lynx security updates |
[production] |
05:56 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: repool after failed switchover', diff saved to https://phabricator.wikimedia.org/P16987 and previous config saved to /var/cache/conftool/dbconfig/20210810-055642-root.json |
[production] |
05:41 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: repool after failed switchover', diff saved to https://phabricator.wikimedia.org/P16986 and previous config saved to /var/cache/conftool/dbconfig/20210810-054139-root.json |
[production] |
05:26 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: repool after failed switchover', diff saved to https://phabricator.wikimedia.org/P16985 and previous config saved to /var/cache/conftool/dbconfig/20210810-052635-root.json |
[production] |
05:11 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: repool after failed switchover', diff saved to https://phabricator.wikimedia.org/P16984 and previous config saved to /var/cache/conftool/dbconfig/20210810-051131-root.json |
[production] |
05:06 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Set s2 as read-write again - master has not been swapped T287454', diff saved to https://phabricator.wikimedia.org/P16983 and previous config saved to /var/cache/conftool/dbconfig/20210810-050604-root.json |
[production] |
05:00 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Set s2 codfw as read-only for maintenance - T287454', diff saved to https://phabricator.wikimedia.org/P16982 and previous config saved to /var/cache/conftool/dbconfig/20210810-050051-root.json |
[production] |
05:00 |
<marostegui> |
Starting s2 codfw failover from db2107 to db2104 - T287454 |
[production] |
04:23 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Master switchover s2 T287454 |
[production] |
04:23 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Master switchover s2 T287454 |
[production] |
04:16 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Set db2104 with weight 0 T287454', diff saved to https://phabricator.wikimedia.org/P16981 and previous config saved to /var/cache/conftool/dbconfig/20210810-041627-root.json |
[production] |
02:35 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
02:33 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
02:07 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
02:06 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
2021-08-09
§
|
16:12 |
<legoktm@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' . |
[production] |
16:10 |
<jayme@deploy1002> |
helmfile [codfw] DONE helmfile.d/admin 'apply'. |
[production] |
16:09 |
<jayme@deploy1002> |
helmfile [codfw] START helmfile.d/admin 'apply'. |
[production] |
16:07 |
<legoktm@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' . |
[production] |
16:07 |
<jayme@deploy1002> |
helmfile [eqiad] DONE helmfile.d/admin 'apply'. |
[production] |
16:07 |
<jayme@deploy1002> |
helmfile [eqiad] START helmfile.d/admin 'apply'. |
[production] |
16:04 |
<legoktm@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' . |
[production] |
16:03 |
<jayme@deploy1002> |
helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. |
[production] |
16:03 |
<jayme@deploy1002> |
helmfile [staging-eqiad] START helmfile.d/admin 'apply'. |
[production] |
16:03 |
<jayme@deploy1002> |
helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. |
[production] |
16:02 |
<legoktm@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' . |
[production] |
16:02 |
<jayme@deploy1002> |
helmfile [staging-codfw] START helmfile.d/admin 'apply'. |
[production] |
16:00 |
<legoktm@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' . |
[production] |
16:00 |
<jayme@deploy1002> |
helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. |
[production] |
16:00 |
<jayme@deploy1002> |
helmfile [staging-codfw] START helmfile.d/admin 'apply'. |
[production] |
15:57 |
<legoktm@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' . |
[production] |
15:34 |
<filippo@cumin1001> |
END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be2065.codfw.wmnet |
[production] |
15:33 |
<filippo@cumin1001> |
END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be2064.codfw.wmnet |
[production] |
15:33 |
<filippo@cumin1001> |
END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be2062.codfw.wmnet |
[production] |
14:17 |
<sukhe> |
ran homer for Gerrit 710358: Set up BGP peering to doh5002 in eqsin |
[production] |
14:10 |
<filippo@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2063.codfw.wmnet |
[production] |