2021-08-10
|
12:23 |
<kormat> |
non-destructive (🤞) testing of db-switchover against s2/eqiad T288500 |
[production] |
12:17 |
<ppchelko@deploy1002> |
Started deploy [restbase/deploy@5791a7a]: Add count parameter to recommendations API T287227 |
[production] |
11:27 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue |
[production] |
11:27 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue |
[production] |
10:56 |
<marostegui> |
Install 10.4.21 on db1169 (s1) |
[production] |
10:54 |
<jayme@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
10:53 |
<mutante> |
etherpad deleting 2 pads as requested in T288328 |
[production] |
10:52 |
<marostegui> |
Install 10.4.21 on db1096 (s5 and s6) |
[production] |
10:34 |
<elukey@deploy1002> |
helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. |
[production] |
10:34 |
<elukey@deploy1002> |
helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. |
[production] |
10:33 |
<elukey@deploy1002> |
helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. |
[production] |
10:33 |
<elukey@deploy1002> |
helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. |
[production] |
10:28 |
<oblivian@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
10:27 |
<oblivian@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
10:24 |
<oblivian@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
09:55 |
<lucaswerkmeister-wmde@deploy1002> |
Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:708309|Remove $wmgWikibaseClientRepoDatabase (T257260)]] (2/2, beta) (duration: 00m 57s) |
[production] |
09:54 |
<lucaswerkmeister-wmde@deploy1002> |
Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708309|Remove $wmgWikibaseClientRepoDatabase (T257260)]] (1/2, prod) (duration: 00m 57s) |
[production] |
09:50 |
<lucaswerkmeister-wmde@deploy1002> |
Synchronized wmf-config/Wikibase.php: Config: [[gerrit:708308|Stop setting $wgWBClientSettings['repoDatabase'] (T257260)]] (duration: 00m 58s) |
[production] |
09:47 |
<jayme@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
09:23 |
<ariel@deploy1002> |
Finished deploy [dumps/dumps@72ff209]: refuse to use info from corrupt run settings file (duration: 00m 03s) |
[production] |
09:22 |
<ariel@deploy1002> |
Started deploy [dumps/dumps@72ff209]: refuse to use info from corrupt run settings file |
[production] |
09:17 |
<kormat> |
running non-destructive test against s7/codfw (db2107/db2014) T288500 |
[production] |
09:05 |
<jayme@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
09:04 |
<moritzm> |
removing stale Java 8 packages from logstash1024/1025/2023/2024/2025 (ELK7 Logstash cluster has been on Java 11 for a while now) |
[production] |
09:00 |
<oblivian@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
08:58 |
<ariel@deploy1002> |
Finished deploy [dumps/dumps@170e394]: more resilience when reading bad run cache settings files (duration: 00m 03s) |
[production] |
08:58 |
<ariel@deploy1002> |
Started deploy [dumps/dumps@170e394]: more resilience when reading bad run cache settings files |
[production] |
08:49 |
<oblivian@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
08:20 |
<elukey@deploy1002> |
helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. |
[production] |
08:20 |
<elukey@deploy1002> |
helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. |
[production] |
08:19 |
<jayme@deploy1002> |
helmfile [codfw] DONE helmfile.d/admin 'apply'. |
[production] |
08:18 |
<jayme@deploy1002> |
helmfile [codfw] START helmfile.d/admin 'apply'. |
[production] |
08:16 |
<jayme@deploy1002> |
helmfile [eqiad] DONE helmfile.d/admin 'apply'. |
[production] |
08:16 |
<jayme@deploy1002> |
helmfile [eqiad] START helmfile.d/admin 'apply'. |
[production] |
08:15 |
<jayme@deploy1002> |
helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. |
[production] |
08:15 |
<jayme@deploy1002> |
helmfile [staging-eqiad] START helmfile.d/admin 'apply'. |
[production] |
08:15 |
<jayme@deploy1002> |
helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. |
[production] |
08:14 |
<jayme@deploy1002> |
helmfile [staging-codfw] START helmfile.d/admin 'apply'. |
[production] |
08:06 |
<godog> |
upload thanos 0.21.1-1 and upgrade prometheus1004 / thanos-fe2001 to it - T288326 |
[production] |
08:03 |
<moritzm> |
installing openjdk-8 security updates on stretch |
[production] |
07:33 |
<moritzm> |
installing lynx security updates |
[production] |
05:56 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: repool after failed switchover', diff saved to https://phabricator.wikimedia.org/P16987 and previous config saved to /var/cache/conftool/dbconfig/20210810-055642-root.json |
[production] |
05:41 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: repool after failed switchover', diff saved to https://phabricator.wikimedia.org/P16986 and previous config saved to /var/cache/conftool/dbconfig/20210810-054139-root.json |
[production] |
05:26 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: repool after failed switchover', diff saved to https://phabricator.wikimedia.org/P16985 and previous config saved to /var/cache/conftool/dbconfig/20210810-052635-root.json |
[production] |
05:11 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: repool after failed switchover', diff saved to https://phabricator.wikimedia.org/P16984 and previous config saved to /var/cache/conftool/dbconfig/20210810-051131-root.json |
[production] |
05:06 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Set s2 as read-write again - master has not been swapped T287454', diff saved to https://phabricator.wikimedia.org/P16983 and previous config saved to /var/cache/conftool/dbconfig/20210810-050604-root.json |
[production] |
05:00 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Set s2 codfw as read-only for maintenance - T287454', diff saved to https://phabricator.wikimedia.org/P16982 and previous config saved to /var/cache/conftool/dbconfig/20210810-050051-root.json |
[production] |
05:00 |
<marostegui> |
Starting s2 codfw failover from db2107 to db2104 - T287454 |
[production] |
04:23 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Master switchover s2 T287454 |
[production] |
04:23 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Master switchover s2 T287454 |
[production] |