2021-09-13
|
11:24 |
<kharlan@deploy1002> |
Synchronized wmf-config: Config: [[gerrit:713553|WikimediaEvents: Remove UnderstandingFirstDay config]] (duration: 00m 59s) |
[production] |
10:51 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2002.codfw.wmnet |
[production] |
10:43 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.decommission for hosts testvm2002.codfw.wmnet |
[production] |
10:15 |
<volans@cumin1001> |
END (FAIL) - Cookbook sre.experimental.reimage (exit_code=93) for host mw1414.eqiad.wmnet |
[production] |
09:33 |
<hashar> |
Castor cache: nuked files that were last changed more than six months ago to free up disk space |
[releng] |
09:33 |
<volans> |
restarting tcpircbot-logmsgbot on alert1001, not relaying messages |
[production] |
09:18 |
<elukey> |
upgrade rsyslog* on ml-serve* nodes to 8.1901.0-1+wmf2 |
[production] |
09:16 |
<godog> |
swift eqiad-prod: add weight to ms-be10[64-67] - T290546 |
[production] |
09:11 |
<moritzm> |
reimaging sretest1002 |
[production] |
09:11 |
<elukey> |
upload rsyslog* 8.1901.0-1+wmf2 to buster-wikimedia component/rsyslog-k8s - T277739 |
[production] |
08:57 |
<arturo> |
cleared grid queues error states (T290844) |
[tools] |
08:55 |
<arturo> |
repooling sgeexec-0907 (T290798) |
[tools] |
08:16 |
<godog> |
bump +100G prometheus/ops codfw |
[production] |
08:14 |
<arturo> |
rebooting sgeexec-0907 (T290798) |
[tools] |
08:12 |
<arturo> |
depool sgeexec-0907 (T290798) |
[tools] |
2021-09-12
|
21:04 |
<wm-bot> |
<lucaswerkmeister> deployed 4da7f64c4b (updates without downtime) |
[tools.lexeme-forms] |
20:02 |
<wm-bot> |
<lucaswerkmeister> deployed f21554ab71 (refactoring, noop) |
[tools.lexeme-forms] |
18:33 |
<vgutierrez> |
restart varnish-fe on cp3061, cp3063 and cp3065 |
[production] |
18:29 |
<vgutierrez> |
restart varnish on cp3055 |
[production] |
18:26 |
<vgutierrez> |
restart varnish on cp3057 |
[production] |
17:28 |
<bstorm> |
truncated 38 GB files qaus.err and purawiki.err T278199 |
[tools.khanamalumat] |
17:28 |
<bstorm> |
truncated 38 GB files qaus.err and purawiki.err |
[tools.khanamalumat] |
17:07 |
<bstorm> |
truncated 77G Worker2.out T288276 |
[tools.iabot] |
17:06 |
<bstorm> |
truncating 45G reflinks.err file T288276 |
[tools.rubin16] |
17:04 |
<bstorm> |
truncated 58gb error.log file T288276 |
[tools.magnus-toolserver] |
15:05 |
<wm-bot> |
<lucaswerkmeister> deployed a4b05045d6 (Croatian nouns) |
[tools.lexeme-forms] |
04:53 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
04:52 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
2021-09-10
|
23:26 |
<bstorm> |
cleared error state for tools-sgeexec-0907.tools.eqiad.wmflabs |
[tools] |
21:52 |
<James_F> |
Created experimental integration-agent-docker-1021 for T252071 |
[releng] |
21:48 |
<James_F> |
Deleting CI agent integration-agent-docker-1001 for T252071 |
[releng] |
21:44 |
<James_F> |
Pulling oldest CI agent integration-agent-docker-1001 from rotation so it can be replaced by a bullseye one for T252071 |
[releng] |
21:28 |
<legoktm@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' . |
[production] |
21:27 |
<legoktm@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' . |
[production] |
21:23 |
<James_F> |
Zuul: [integration/config] Add shellcheck job for scripts defined in jjb as an experimental job |
[releng] |
21:21 |
<legoktm@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' . |
[production] |
20:46 |
<jhuneidi@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' . |
[production] |
20:44 |
<jhuneidi@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' . |
[production] |
20:42 |
<jhuneidi@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' . |
[production] |
18:34 |
<volans@cumin1001> |
END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet |
[production] |
18:08 |
<volans@cumin1001> |
START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet |
[production] |
17:41 |
<James_F> |
Zuul: [cloud/toolforge/jobs-framework-emailer] Add basic tox CI |
[releng] |
17:16 |
<pt1979@cumin2002> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on puppetmaster2005.codfw.wmnet with reason: REIMAGE |
[production] |
17:14 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster2005.codfw.wmnet with reason: REIMAGE |
[production] |