2022-03-17
§
|
10:22 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22759 and previous config saved to /var/cache/conftool/dbconfig/20220317-102214-marostegui.json |
[production] |
10:10 |
<marostegui> |
dbmaint on s7@eqiad T298556 |
[production] |
10:07 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22758 and previous config saved to /var/cache/conftool/dbconfig/20220317-100709-marostegui.json |
[production] |
09:52 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298556)', diff saved to https://phabricator.wikimedia.org/P22757 and previous config saved to /var/cache/conftool/dbconfig/20220317-095204-marostegui.json |
[production] |
09:50 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db1105:3312 (T298556)', diff saved to https://phabricator.wikimedia.org/P22756 and previous config saved to /var/cache/conftool/dbconfig/20220317-095044-marostegui.json |
[production] |
09:50 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance |
[production] |
09:50 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance |
[production] |
09:40 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db1119 (T298557)', diff saved to https://phabricator.wikimedia.org/P22755 and previous config saved to /var/cache/conftool/dbconfig/20220317-094025-marostegui.json |
[production] |
09:40 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance |
[production] |
09:40 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance |
[production] |
09:40 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298557)', diff saved to https://phabricator.wikimedia.org/P22754 and previous config saved to /var/cache/conftool/dbconfig/20220317-094017-marostegui.json |
[production] |
09:25 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P22752 and previous config saved to /var/cache/conftool/dbconfig/20220317-092512-marostegui.json |
[production] |
09:19 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db1096:3316 (T297189)', diff saved to https://phabricator.wikimedia.org/P22751 and previous config saved to /var/cache/conftool/dbconfig/20220317-091911-marostegui.json |
[production] |
09:19 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance |
[production] |
09:19 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance |
[production] |
09:10 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P22750 and previous config saved to /var/cache/conftool/dbconfig/20220317-091007-marostegui.json |
[production] |
08:55 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298557)', diff saved to https://phabricator.wikimedia.org/P22749 and previous config saved to /var/cache/conftool/dbconfig/20220317-085502-marostegui.json |
[production] |
08:51 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clarakosi out of all services on: 1881 hosts |
[production] |
08:51 |
<jmm@cumin2002> |
START - Cookbook sre.idm.logout Logging Clarakosi out of all services on: 1881 hosts |
[production] |
08:24 |
<urbanecm@deploy1002> |
Synchronized wmf-config/throttle.php: 0da40c22844746120de9b33e772598d38aa74326: throttle: Remove expired rules (duration: 00m 50s) |
[production] |
08:23 |
<urbanecm@deploy1002> |
Synchronized wmf-config/throttle.php: 980ea35d454563e538d08b9d6462064455b4d28c: Throttle: Increase limit for English Wikipedia (T304016) (duration: 00m 51s) |
[production] |
08:12 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ppchelko out of all services on: 1881 hosts |
[production] |
08:12 |
<jmm@cumin2002> |
START - Cookbook sre.idm.logout Logging Ppchelko out of all services on: 1881 hosts |
[production] |
08:09 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Accraze out of all services on: 1881 hosts |
[production] |
08:08 |
<jmm@cumin2002> |
START - Cookbook sre.idm.logout Logging Accraze out of all services on: 1881 hosts |
[production] |
08:07 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22748 and previous config saved to /var/cache/conftool/dbconfig/20220317-080705-root.json |
[production] |
07:53 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db1106 (T298557)', diff saved to https://phabricator.wikimedia.org/P22747 and previous config saved to /var/cache/conftool/dbconfig/20220317-075350-marostegui.json |
[production] |
07:53 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance |
[production] |
07:53 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance |
[production] |
07:53 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance |
[production] |
07:53 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance |
[production] |
07:52 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22746 and previous config saved to /var/cache/conftool/dbconfig/20220317-075201-root.json |
[production] |
07:36 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1163 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22745 and previous config saved to /var/cache/conftool/dbconfig/20220317-073658-root.json |
[production] |
07:31 |
<marostegui> |
dbmaint on s5@eqiad T297189 |
[production] |
07:21 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22744 and previous config saved to /var/cache/conftool/dbconfig/20220317-072154-root.json |
[production] |
07:12 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22743 and previous config saved to /var/cache/conftool/dbconfig/20220317-071200-root.json |
[production] |
07:11 |
<ryankemper> |
[WDQS] Depooled `wdqs2003` (8 hours of lag to catch up on) |
[production] |
07:06 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P22742 and previous config saved to /var/cache/conftool/dbconfig/20220317-070650-root.json |
[production] |
07:04 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance |
[production] |
07:04 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance |
[production] |
07:04 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance |
[production] |
07:04 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance |
[production] |
06:57 |
<ryankemper> |
[WDQS] Also of note is the spiking thread counts on the affected hosts: https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1647457172391&to=1647500081971&viewPanel=22 |
[production] |
06:57 |
<ryankemper> |
[WDQS] Note that per https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1647457172391&to=1647500081971&viewPanel=7 `wdqs2003` has been offline for ~6 hours, `wdqs2001` for 1.5 hours and `wdqs2004` just recently. |
[production] |
06:56 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22741 and previous config saved to /var/cache/conftool/dbconfig/20220317-065656-root.json |
[production] |
06:54 |
<ryankemper> |
[WDQS] `ryankemper@wdqs2003:~$ sudo systemctl restart wdqs-blazegraph.service` |
[production] |
06:53 |
<ryankemper> |
[WDQS] `ryankemper@wdqs2001:~$ sudo systemctl restart wdqs-blazegraph.service` |
[production] |
06:50 |
<elukey> |
restart blazegraph on wdqs2004 |
[production] |
06:46 |
<elukey> |
kill remaining hanging processes for ppche*lko and accra*ze on an-test-client1001 to allow users offboard (puppet broken) |
[production] |
06:41 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22740 and previous config saved to /var/cache/conftool/dbconfig/20220317-064152-root.json |
[production] |