2021-08-12
ยง
|
08:37 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet [production]
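(Editor's note: the START / END (PASS) pairs throughout this log are emitted automatically by Spicerack cookbook runs on the cumin hosts. A minimal sketch of how such a run is typically kicked off; the exact arguments accepted by sre.hosts.reboot-single are an assumption, not taken from this log:

    # Hypothetical invocation from a cumin host; check
    # `cookbook sre.hosts.reboot-single --help` for the real arguments.
    sudo -i cookbook sre.hosts.reboot-single people1003.eqiad.wmnet
)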
08:29 <jmm@cumin1001> START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet [production]
08:28 <marostegui@cumin1001> dbctl commit (dc=all): 'db2107 (re)pooling @ 40%: After reimage', diff saved to https://phabricator.wikimedia.org/P17011 and previous config saved to /var/cache/conftool/dbconfig/20210812-082855-root.json [production]
08:21 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2002.codfw.wmnet [production]
08:18 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host people2002.codfw.wmnet [production]
08:13 <marostegui@cumin1001> dbctl commit (dc=all): 'db2107 (re)pooling @ 30%: After reimage', diff saved to https://phabricator.wikimedia.org/P17010 and previous config saved to /var/cache/conftool/dbconfig/20210812-081351-root.json [production]
07:58 <marostegui@cumin1001> dbctl commit (dc=all): 'db2107 (re)pooling @ 20%: After reimage', diff saved to https://phabricator.wikimedia.org/P17009 and previous config saved to /var/cache/conftool/dbconfig/20210812-075848-root.json [production]
07:58 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet [production]
07:53 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet [production]
07:52 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet [production]
07:46 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet [production]
07:43 <marostegui@cumin1001> dbctl commit (dc=all): 'db2107 (re)pooling @ 15%: After reimage', diff saved to https://phabricator.wikimedia.org/P17008 and previous config saved to /var/cache/conftool/dbconfig/20210812-074344-root.json [production]
07:40 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2006.wikimedia.org [production]
07:38 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host ldap-replica2006.wikimedia.org [production]
07:36 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2005.wikimedia.org [production]
07:34 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host ldap-replica2005.wikimedia.org [production]
07:32 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1004.wikimedia.org [production]
07:30 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host ldap-replica1004.wikimedia.org [production]
07:28 <marostegui@cumin1001> dbctl commit (dc=all): 'db2107 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P17007 and previous config saved to /var/cache/conftool/dbconfig/20210812-072841-root.json [production]
07:26 <godog> temp upgrade thanos to 0.22.0 on thanos-fe2001 to help debug a potential upstream issue [production]
07:25 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1003.wikimedia.org [production]
07:23 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host ldap-replica1003.wikimedia.org [production]
07:21 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet [production]
07:17 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet [production]
07:16 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet [production]
07:13 <marostegui@cumin1001> dbctl commit (dc=all): 'db2107 (re)pooling @ 5%: After reimage', diff saved to https://phabricator.wikimedia.org/P17006 and previous config saved to /var/cache/conftool/dbconfig/20210812-071337-root.json [production]
07:13 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet [production]
06:58 <marostegui@cumin1001> dbctl commit (dc=all): 'db2107 (re)pooling @ 1%: After reimage', diff saved to https://phabricator.wikimedia.org/P17005 and previous config saved to /var/cache/conftool/dbconfig/20210812-065833-root.json [production]
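(Editor's note: the dbctl entries above step db2107 from 1% up to 40% in roughly 15-minute increments after its reimage. A sketch of how such a staged repool could be driven by hand, assuming dbctl's `instance ... pool -p` and `config commit -m` subcommands; the exact flags, and whatever wrapper script was actually used here, are assumptions:

    # Hypothetical gradual-repool loop; verify the dbctl flags before use.
    for pct in 1 5 10 15 20 30 40; do
        dbctl instance db2107 pool -p "$pct"
        dbctl config commit -m "db2107 (re)pooling @ ${pct}%: After reimage"
        sleep 900   # ~15 minutes between steps, matching the cadence in the log
    done
)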
06:49 <tstarling@deploy1002> Synchronized php-1.37.0-wmf.18/extensions/SecurePoll/includes/Crypt/GpgCrypt.php: fix for T288711 failure of election creation (duration: 01m 09s) [production]
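(Editor's note: the "Synchronized ..." line above is the message scap logs after a single-file sync from the deployment host. A sketch of the corresponding invocation, assuming scap's sync-file subcommand; the exact command run here is not recorded in the log:

    # Hypothetical; typically run from the MediaWiki staging directory on deploy1002.
    scap sync-file php-1.37.0-wmf.18/extensions/SecurePoll/includes/Crypt/GpgCrypt.php \
        'fix for T288711 failure of election creation'
)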
06:47 <moritzm> updating bullseye installations to the latest state of testing [production]
06:46 <ryankemper@cumin1001> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [production]
06:36 <moritzm> installing c-ares security updates on Bullseye [production]
06:32 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn'. [production]
06:31 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn'. [production]
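(Editor's note: the two helmfile entries above are auto-logged when the mwdebug service is synced in each data centre. A sketch of what the underlying invocation might look like, assuming the standard deployment-charts layout on the deploy host; the path and environment names are assumptions:

    # Hypothetical; confirm the charts path and environments on deploy1002.
    cd /srv/deployment-charts/helmfile.d/services/mwdebug
    helmfile -e eqiad sync
    helmfile -e codfw sync
)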
06:00 <marostegui> Failover m3 from db1132 to db1107 - T288197 [production]
05:15 <ryankemper> [WDQS] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2005.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh wikidata journal after nuking wdqs2004's" --blazegraph_instance blazegraph` [production]
05:15 <ryankemper@cumin1001> START - Cookbook sre.wdqs.data-transfer [production]
05:14 <ryankemper> [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, no relevant criticals in Icinga, and Grafana looks good [production]
04:45 <eileen> tools revision changed from c26a8c0cb6 to 15bfaa7117 [production]
04:44 <ryankemper> [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'` [production]
04:44 <ryankemper> [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'` [production]
04:44 <ryankemper> [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'` [production]
04:43 <ryankemper@deploy1002> Finished deploy [wdqs/wdqs@9d03aaa]: 0.3.81 (duration: 02m 07s) [production]
04:41 <ryankemper@deploy1002> Started deploy [wdqs/wdqs@9d03aaa]: 0.3.81 [production]
04:41 <ryankemper> [WDQS Deploy] Re-rolling deploy so that `wdqs2004` gets deployed to [production]
04:41 <ryankemper> [WDQS] `wdqs2004`'s disk is full due to overinflated `wikidata.jnl`, nuking and depooling: `sudo rm -fv /srv/wdqs/wikidata.jnl && sudo depool` [production]
04:40 <ryankemper@deploy1002> Finished deploy [wdqs/wdqs@9d03aaa]: 0.3.81 (duration: 17m 03s) [production]
04:26 <ryankemper> [WDQS Deploy] Tests passing following deploy of `0.3.81` on canary `wdqs1003`; proceeding to rest of fleet [production]
04:23 <ryankemper@deploy1002> Started deploy [wdqs/wdqs@9d03aaa]: 0.3.81 [production]
04:21 <ryankemper> [WDQS Deploy] Gearing up for deploy of wdqs `0.3.81`. Pre-deploy tests passing on canary `wdqs1003` [production]