2020-04-12
|
11:11 |
<vgutierrez> |
restart ats-tls on cp5008.eqsin.wmnet - T249335 |
[production] |
10:18 |
<elukey> |
restart wdqs-updater on wdqs1004 (logs show no reports from the past hours; the last ones were stack traces related to a JSON decode failure) |
[production] |
06:59 |
<dcausse> |
restarting blazegraph on wdqs1004 (T242453) |
[production] |
06:35 |
<elukey@puppetmaster1001> |
conftool action : set/pooled=no; selector: name=restbase1025.eqiad.wmnet |
[production] |
06:32 |
<elukey> |
powerdown restbase1025 - T250027 |
[production] |
06:20 |
<elukey> |
powercycle restbase1025 (not reachable, serial console shows blank, racadm getsel reports errors with DIMM_B2) |
[production] |
05:53 |
<bblack> |
pushing https://gerrit.wikimedia.org/r/588134 to cache_text |
[production] |
05:50 |
<vgutierrez> |
restart ats-tls on cp[1077,1081,1083,1085].eqiad.wmnet - T249335 |
[production] |
05:31 |
<bblack> |
pushing https://gerrit.wikimedia.org/r/588133 to cache_text |
[production] |
04:11 |
<bd808> |
Hopefully fixed T243843, T243843, T243843, and T243843 (deliberate duplication there folks) |
[tools.stashbot] |
04:09 |
<bd808> |
--canonical for webservice |
[tools.stashbot] |
03:59 |
<bd808> |
test |
[tools.stashbot] |
02:58 |
<bd808> |
Set --canonical to force redirect to sal.toolforge.org and added service.template to make this all easier in the future |
[tools.sal] |
02:52 |
<wm-bot> |
<bd808> Updated to 49015bb: Manually setup Elasticsearch creds (T247715) |
[tools.sal] |
00:11 |
<bd808> |
Everything broken at the moment because of elasticsearch7 migration not going as hoped. |
[tools.sal] |
2020-04-11
|
23:17 |
<wm-bot> |
<bd808> Updated config to point to es7 cluster (T247715) |
[tools.sal] |
23:11 |
<wm-bot> |
<bd808> Updated to f2ca4e4 925b463 Update !log handling for es7 (T247715) |
[tools.stashbot] |
22:59 |
<wm-bot> |
<bd808> Updated to f2ca4e4 Update !bash handling for es7 (T247715) |
[tools.stashbot] |
19:52 |
<cdanis@cumin1001> |
dbctl commit (dc=all): 'slight deweight to db1111', diff saved to https://phabricator.wikimedia.org/P10960 and previous config saved to /var/cache/conftool/dbconfig/20200411-195235-cdanis.json |
[production] |
17:35 |
<cdanis@cumin1001> |
dbctl commit (dc=all): 's8: +weight db1111, -weight db1126', diff saved to https://phabricator.wikimedia.org/P10959 and previous config saved to /var/cache/conftool/dbconfig/20200411-173517-cdanis.json |
[production] |
15:39 |
<vgutierrez> |
restart ats-tls on cp[1077,1081,1083,1085].eqiad.wmnet - T249335 |
[production] |
15:07 |
<Krenair> |
Migrated from deployment-cache-text05 (stretch) to deployment-cache-text06 (buster) - class stopped working on stretch with https://gerrit.wikimedia.org/r/c/operations/puppet/+/584553 - shut down old instance - T250006 |
[releng] |
14:52 |
<Krenair> |
Migrated from deployment-cache-upload05 (stretch) to deployment-cache-upload06 (buster) - class stopped working on stretch with https://gerrit.wikimedia.org/r/c/operations/puppet/+/584553 - shut down old instance which coincidentally would turn one year old tomorrow |
[releng] |
09:30 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) |
[production] |
09:20 |
<elukey@cumin1001> |
START - Cookbook sre.presto.roll-restart-workers |
[production] |
09:19 |
<elukey> |
set hive-security: read-only for the Presto hive connector and roll restart the cluster |
[analytics] |
07:01 |
<vgutierrez> |
restart ats-tls on cp[1079,1081,1083,1085].eqiad.wmnet - T249335 |
[production] |
2020-04-10
|
23:01 |
<hashar> |
Reloading Zuul for https://gerrit.wikimedia.org/r/#/c/integration/config/+/588054/ |
[releng] |
21:33 |
<bd808> |
Rebuilding all Docker images for the Kubernetes cluster (T249843) |
[tools] |
21:27 |
<bd808> |
Rebuilding all Docker images for the Kubernetes cluster (T249843) |
[tools] |
21:12 |
<cdanis@cumin1001> |
dbctl commit (dc=all): 'db1111 seems overloaded', diff saved to https://phabricator.wikimedia.org/P10954 and previous config saved to /var/cache/conftool/dbconfig/20200410-211202-cdanis.json |
[production] |
21:02 |
<andrewbogott> |
deleted a bunch of canary VMs and other miscellaneous testing VMs which (I hope) were mine |
[testlabs] |
19:37 |
<cdanis> |
cdanis@re0.cr1-codfw> clear bfd session address 208.80.153.220 |
[production] |
19:36 |
<bstorm_> |
after testing, deploying toollabs-webservice 0.67 to tools repos T249843 |
[tools] |
19:32 |
<bstorm_> |
deployed webservice 0.67 T249843 |
[toolsbeta] |
19:28 |
<longma> |
Updating docker-pkg files on contint2001 for https://gerrit.wikimedia.org/r/c/integration/config/+/588029 |
[releng] |
18:59 |
<bstorm_> |
delete toolsbeta-gitlab-01 and build toolsbeta-workflow-test T249946 |
[toolsbeta] |
18:40 |
<James_F> |
Docker: Publishing composer-php70:0.2.0 and cascade. |
[releng] |
17:12 |
<twentyafterfour> |
deploying docker image on contint2001 with `tox -e fabric -- deploy_docker` |
[releng] |
17:08 |
<James_F> |
Docker: Publishing release-notes:0.0.5 on contint2001 |
[releng] |
16:46 |
<hashar> |
contint1001: deleting docker-pkg maintained images. They are in the registry anyway. # T224591 |
[releng] |
16:37 |
<wm-bot> |
<peterbowman> Deploy 51e4941 (update wiki-java, MissingRefs webapp) |
[tools.pbbot] |
16:31 |
<elukey> |
enable TLS from kafkatee to Kafka on analytics1030 (test instance) |
[analytics] |
15:45 |
<elukey> |
migrate data_purge timers from an-coord1001 to an-launcher1001 |
[analytics] |
15:03 |
<vgutierrez> |
restart ats-tls on cp1083 and cp1085 - T249335 |
[production] |
14:53 |
<arturo> |
live-hacking tools-puppetmaster-02 with https://gerrit.wikimedia.org/r/c/operations/puppet/+/587991 for T249837 |
[tools] |
13:14 |
<hashar@deploy1001> |
Finished deploy [zuul/deploy@4a69913]: (no justification provided) (duration: 00m 40s) |
[production] |
13:14 |
<hashar@deploy1001> |
Started deploy [zuul/deploy@4a69913]: (no justification provided) |
[production] |