2021-07-28
15:05 | <jmm@cumin2002> | START - Cookbook sre.dns.netbox | [production]
14:58 | <dzahn@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE | [production]
14:56 | <dzahn@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE | [production]
14:39 | <elukey@deploy1002> | helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. | [production]
14:33 | <dzahn@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue | [production]
14:33 | <dzahn@cumin1001> | START - Cookbook sre.hosts.downtime for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue | [production]
14:19 | <elukey@deploy1002> | helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. | [production]
14:06 | <dzahn@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE | [production]
14:06 | <elukey@deploy1002> | helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. | [production]
14:06 | <elukey@deploy1002> | helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. | [production]
14:06 | <dcausse@deploy1002> | helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' . | [production]
14:04 | <dzahn@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE | [production]
14:03 | <dzahn@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE | [production]
14:01 | <dzahn@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE | [production]
13:32 | <dzahn@cumin1001> | conftool action : set/pooled=inactive; selector: name=mw143[4-6].eqiad.wmnet | [production]
13:29 | <moritzm> | installing python2.7 security updates on stretch | [production]
13:08 | <moritzm> | installing python3.5 security updates on stretch | [production]
12:27 | <dcausse@deploy1002> | helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' . | [production]
11:26 | <moritzm> | installing nginx security updates on thumbor* | [production]
11:18 | <moritzm> | installing nginx security updates on sodium (mirrors.wikimedia.org) | [production]
11:03 | <dzahn@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue | [production]
11:03 | <dzahn@cumin1001> | START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue | [production]
10:11 | <moritzm> | installing remaining nginx security updates on stretch | [production]
10:09 | <godog> | temp fix prometheus-icinga-am on alert1001 | [production]
09:40 | <dcausse@deploy1002> | helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' . | [production]
09:40 | <urbanecm> | Start server-side upload for 1 video file (T287482) | [production]
09:29 | <elukey@deploy1002> | helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. | [production]
09:29 | <elukey@deploy1002> | helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. | [production]
09:28 | <elukey@deploy1002> | helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. | [production]
09:24 | <elukey@deploy1002> | helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. | [production]
09:24 | <elukey@deploy1002> | helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. | [production]
08:33 | <marostegui@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE | [production]
08:31 | <marostegui@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE | [production]
08:27 | <Amir1> | running several long-running queries against pc1007 | [production]
08:13 | <oblivian@deploy1002> | helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production]
08:01 | <dcausse@deploy1002> | helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' . | [production]
07:53 | <moritzm> | installing aspell security updates on stretch | [production]
07:20 | <dcaro@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 29 hosts with reason: T287559 | [production]
07:20 | <dcaro@cumin1001> | START - Cookbook sre.hosts.downtime for 5:00:00 on 29 hosts with reason: T287559 | [production]
07:20 | <dcaro@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 40 hosts with reason: T287559 | [production]
07:20 | <dcaro@cumin1001> | START - Cookbook sre.hosts.downtime for 5:00:00 on 40 hosts with reason: T287559 | [production]
07:20 | <dcaro@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 6 hosts with reason: T287559 | [production]
07:20 | <dcaro@cumin1001> | START - Cookbook sre.hosts.downtime for 5:00:00 on 6 hosts with reason: T287559 | [production]
07:07 | <godog> | remove cloud*/syslog.log from centrallog2001 - T287559 | [production]
07:06 | <godog> | remove node_pinger.prom from node-pinger hosts | [production]
06:42 | <godog> | remove obsolete user.log.manual-rotation from centrallog1001 to free disk space | [production]
02:43 | <TimStarling> | on mwmaint2002 fixing T286273 broken files using eval.php | [production]