2020-11-21
|
21:25 |
<wm-bot> |
<lucaswerkmeister> deployed 1608cc4dd9 (gender-dependent messages) |
[tools.lexeme-forms] |
09:18 |
<joal> |
Drop historical logs of 'Wikidata Concepts Monitor ETL' on HDFS, keeping one example - freeing 60 TB |
[production] |
09:17 |
<joal> |
Drop historical logs of ' |
[production] |
08:28 |
<ariel@deploy1001> |
Finished deploy [dumps/dumps@1a76a9a]: revinfo updates (duration: 00m 05s) |
[production] |
08:28 |
<ariel@deploy1001> |
Started deploy [dumps/dumps@1a76a9a]: revinfo updates |
[production] |
08:10 |
<elukey> |
remove big stderr log file in /var/lib/hadoop/data/d/yarn/logs/application_1605880843685_1450 on an-worker1110 |
[analytics] |
08:10 |
<elukey> |
remove big stderr log file in /var/lib/hadoop/data/d/yarn/logs/application_1605880843685_1450 on an-worker1110 |
[production] |
08:05 |
<elukey> |
remove big stderr log file in /var/lib/hadoop/data/e/yarn/logs/application_1605880843685_1450 on an-worker1105 |
[analytics] |
08:05 |
<elukey> |
remove big stderr log file in /var/lib/hadoop/data/e/yarn/logs/application_1605880843685_1450 on an-worker1105 |
[production] |
2020-11-20
|
23:38 |
<mutante> |
synced puppet-compiler facts - new hosts should be usable in compiler |
[production] |
23:15 |
<mutante> |
syncing facts from production masters |
[puppet-diffs] |
22:30 |
<mutante> |
cumin1001 - sudo systemctl start cumin-check-aliases -> <+icinga-wm> RECOVERY - Check systemd state on cumin1001 is OK T268369 |
[production] |
21:30 |
<razzi@cumin1001> |
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) |
[production] |
21:09 |
<razzi> |
truncate /var/lib/hadoop/data/u/yarn/logs/application_1605880843685_0581/container_e27_1605880843685_0581_01_000171/stderr logfile on an-worker1098 |
[analytics] |
20:40 |
<mutante> |
added new member razzi |
[puppet-diffs] |
20:26 |
<razzi@cumin1001> |
START - Cookbook sre.ganeti.makevm |
[production] |
20:09 |
<razzi@cumin1001> |
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) |
[production] |
19:52 |
<mutante> |
releases2002 - systemctl disable wmf_auto_restart_rsync; rm /usr/lib/systemd/system/wmf_auto_restart_rsync.* ; systemctl daemon-reload ; systemctl reset-failed - clear up systemd unit that was not absented and fix Icinga alerts |
[production] |
19:45 |
<mutante> |
releases2002 systemctl reset-failed (wmf_auto_restart_rsync.service failed but hopefully fixed) |
[production] |
19:39 |
<mutante> |
Icinga: ACKing all the "unhandled CRIT" alerts on clouddb* and an-coord* that have disabled notifications, to remove monitoring noise; from 72 to 25 active alerts |
[production] |
19:17 |
<Jayprakash12345> |
Deploying app (T267488) |
[tools.book2scrollv2] |
19:14 |
<razzi@cumin1001> |
START - Cookbook sre.ganeti.makevm |
[production] |
18:47 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) |
[production] |
18:42 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.decommission |
[production] |
18:37 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) |
[production] |
18:36 |
<razzi@cumin1001> |
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) |
[production] |
18:31 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.decommission |
[production] |
18:31 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) |
[production] |
18:18 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.decommission |
[production] |
18:14 |
<dwisehaupt> |
shifting 100% of thank_you mail through frmxs ahead of tomorrow's banner test - T267259 |
[production] |
17:37 |
<pt1979@cumin2001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) |
[production] |
17:35 |
<pt1979@cumin2001> |
START - Cookbook sre.hosts.downtime |
[production] |
17:32 |
<razzi@cumin1001> |
START - Cookbook sre.ganeti.makevm |
[production] |
17:24 |
<razzi@cumin1001> |
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) |
[production] |
16:48 |
<volans@cumin1001> |
END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) |
[production] |
16:40 |
<volans@cumin1001> |
START - Cookbook sre.hosts.decommission |
[production] |
16:29 |
<razzi@cumin1001> |
START - Cookbook sre.ganeti.makevm |
[production] |
16:29 |
<razzi@cumin1001> |
END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) |
[production] |
16:28 |
<razzi> |
removed canceled IP address records for kafka-test1002 from Netbox |
[production] |
16:11 |
<pt1979@cumin2001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) |
[production] |
16:09 |
<pt1979@cumin2001> |
START - Cookbook sre.hosts.downtime |
[production] |
16:06 |
<James_F> |
Zuul: [labs/tools/book2scroll] Provide CI with tox-docker T267488 |
[releng] |
16:01 |
<razzi@cumin1001> |
START - Cookbook sre.ganeti.makevm |
[production] |
16:01 |
<razzi@cumin1001> |
END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) |
[production] |
15:42 |
<razzi@cumin1001> |
START - Cookbook sre.ganeti.makevm |
[production] |
15:09 |
<andrew@cumin1001> |
END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) |
[production] |
15:01 |
<andrew@cumin1001> |
START - Cookbook sre.hosts.decommission |
[production] |
14:59 |
<andrew@cumin1001> |
END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) |
[production] |