2020-03-02
09:51 |
<addshore> |
START warm cache for db1111 & db1126 for Q6-8 million T219123 (pass 1) |
[production] |
09:50 |
<elukey> |
powercycle an-worker1083 (no ssh, mgmt console available but tty not really usable, CPU soft lockups reported) |
[production] |
09:46 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Slowly repool db1119 after upgrade T239791', diff saved to https://phabricator.wikimedia.org/P10573 and previous config saved to /var/cache/conftool/dbconfig/20200302-094633-marostegui.json |
[production] |
09:38 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Slowly repool db1119 after upgrade T239791', diff saved to https://phabricator.wikimedia.org/P10572 and previous config saved to /var/cache/conftool/dbconfig/20200302-093848-marostegui.json |
[production] |
09:38 |
<moritzm> |
installing openssh updates for jessie |
[production] |
09:34 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Increase weight from 80 to 100 on db1111 T246447', diff saved to https://phabricator.wikimedia.org/P10571 and previous config saved to /var/cache/conftool/dbconfig/20200302-093449-marostegui.json |
[production] |
09:27 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Slowly repool db1119 after upgrade T239791', diff saved to https://phabricator.wikimedia.org/P10570 and previous config saved to /var/cache/conftool/dbconfig/20200302-092743-marostegui.json |
[production] |
09:19 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1119 T239791', diff saved to https://phabricator.wikimedia.org/P10569 and previous config saved to /var/cache/conftool/dbconfig/20200302-091947-marostegui.json |
[production] |
09:12 |
<addshore> |
warm cache for db1111 for Q0-6 million T219123 T246447 (pass 2) |
[production] |
08:54 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Increase weight from 50 to 80 on db1111 T246447', diff saved to https://phabricator.wikimedia.org/P10568 and previous config saved to /var/cache/conftool/dbconfig/20200302-085420-marostegui.json |
[production] |
08:44 |
<moritzm> |
installing openssh updates for stretch |
[production] |
08:33 |
<addshore> |
warm cache for db1111 for Q0-6 million T219123 T246447 |
[production] |
08:14 |
<addshore> |
resume item term table rebuild script (from Q54 million) T219123 |
[production] |
08:07 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Increase weight from 30 to 50 on db1111 T246447', diff saved to https://phabricator.wikimedia.org/P10567 and previous config saved to /var/cache/conftool/dbconfig/20200302-080721-marostegui.json |
[production] |
07:22 |
<vgutierrez> |
upgrading NICs FW on lvs2008 - T196560 T203194 |
[production] |
07:21 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Increase weight from 10 to 30 on db1111 T246447', diff saved to https://phabricator.wikimedia.org/P10566 and previous config saved to /var/cache/conftool/dbconfig/20200302-072118-marostegui.json |
[production] |
07:10 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
07:08 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
06:45 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Increase weight from 1 to 10 on db1111 T246447', diff saved to https://phabricator.wikimedia.org/P10565 and previous config saved to /var/cache/conftool/dbconfig/20200302-064522-marostegui.json |
[production] |
06:42 |
<marostegui> |
Enable events on db1111 T246447 |
[production] |
06:24 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Add db1111 to s8 with minimal weight to check grants and any other issues T246447', diff saved to https://phabricator.wikimedia.org/P10564 and previous config saved to /var/cache/conftool/dbconfig/20200302-062435-marostegui.json |
[production] |
06:04 |
<marostegui> |
Re-add db1111 to s8 in tendril and zarcillo - T246447 |
[production] |
2020-03-01
20:58 |
<bstorm_> |
set namespace resourcequota for cpu to 2.5 T246553 |
[tools.teg] |
20:49 |
<bstorm_> |
starting php7.2 webservice T246553 |
[tools.teg] |
20:48 |
<bstorm_> |
running kubectl apply -f backend.yml T246553 |
[tools.teg] |
20:45 |
<bstorm_> |
increased services quota to 2 for k8s T246553 |
[tools.teg] |
17:54 |
<marostegui> |
Start replication on db1111 new host on s8 - T246447 |
[production] |
17:45 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Reduce main traffic weight for db1087 as dumps are running ', diff saved to https://phabricator.wikimedia.org/P10563 and previous config saved to /var/cache/conftool/dbconfig/20200301-174536-marostegui.json |
[production] |
16:08 |
<reedy@deploy1001> |
scap failed: average error rate on 5/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details) |
[production] |
15:55 |
<wikibugs> |
Updated channels.yaml to: 7fc88d6228d00ad5e773c8c4b0065e2471240a03 Add User-revi to the channels list |
[tools.wikibugs] |
08:10 |
<wm-bot> |
<root> Edited /data/project/wikicite-dashboard/.lighttpd.conf to remove fast-cgi config section that was preventing lighttpd from starting. (T246559) |
[tools.wikicite-dashboard] |
08:03 |
<wm-bot> |
<root> Deleted webarchivebot-backend deployment because code is not running properly on the 2020 Kubernetes cluster. (T246559, T246563) |
[tools.webarchivebot] |
07:51 |
<wm-bot> |
<root> Stopped tool because it is missing the required www/python/src/app.py entry point. This is causing uwsgi to crash. (T246559, T246562) |
[tools.wdumps] |
07:39 |
<wm-bot> |
<root> Restarted with python3.7 runtime to match existing www/python/venv version (T246559) |
[tools.shextranslator] |
07:35 |
<wm-bot> |
<root> Edited /data/project/quick-intersection/.lighttpd.conf to remove fast-cgi config section that was preventing lighttpd from starting. (T246559) |
[tools.quick-intersection] |
07:32 |
<wm-bot> |
<root> Hard restart of webservice to fix startup crash caused by a webservice v0.63 bug (T246559) |
[tools.mw2sparql] |
06:28 |
<wm-bot> |
<root> Stopped python webservice because there are no python application files in www/python/src. (T246559) |
[tools.matthobot] |
06:26 |
<wm-bot> |
<root> Restarted with python3.5 runtime to match existing www/python/venv version (T246559) |
[tools.machtsinn] |
06:23 |
<wm-bot> |
<root> Stopped trivial 'hello world' tool after fixing its venv. |
[tools.hroest2] |
06:23 |
<wm-bot> |
<root> Tool stuck in CrashLoopBackOff because python 2.7 venv did not contain the flask library. (T246559) |
[tools.hroest2] |
06:18 |
<wm-bot> |
<root> Restarted webservice as 'python' type (py3.4) rather than 'python3.5' to match existing www/python/venv (T246559) |
[tools.farhangestan] |
06:15 |
<wm-bot> |
<root> Hard restart of webservice to fix startup crash caused by a webservice v0.63 bug (T246559) |
[tools.dtz] |
06:02 |
<ariel@deploy1001> |
Finished deploy [dumps/dumps@8376c62]: refactor page content jobs, prefetch, and output file listings: see T246465 (duration: 00m 04s) |
[production] |
06:02 |
<ariel@deploy1001> |
Started deploy [dumps/dumps@8376c62]: refactor page content jobs, prefetch, and output file listings: see T246465 |
[production] |
05:46 |
<wm-bot> |
<root> Hard restart of webservice to fix startup crash caused by a webservice v0.63 bug (T246559) |
[tools.corhist] |
05:43 |
<wm-bot> |
<root> Restarted as a python2 webservice to match the existing www/python/venv. (T246559) |
[tools.cobot] |
05:37 |
<wm-bot> |
<root> Edited /data/project/catscan2/.lighttpd.conf to remove fast-cgi config section that was preventing lighttpd from starting. (T246559) |
[tools.catscan2] |
05:14 |
<wm-bot> |
<root> Hard restart of webservice to fix startup crash caused by a webservice v0.63 bug (T246559) |
[tools.blubber] |
01:48 |
<bstorm_> |
old version of kubectl removed. Anyone who needs it can download it with `curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.4.12/bin/linux/amd64/kubectl` |
[tools] |
01:42 |
<bstorm_> |
set the current context to 'default' to make migration easier for now using ~/bin/kubectl |
[tools.paws-public] |