|
2021-08-10
|
| 21:37 |
<robh@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1048.eqiad.wmnet with reason: REIMAGE |
[production] |
| 21:36 |
<robh@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mc1049.eqiad.wmnet with reason: REIMAGE |
[production] |
| 21:35 |
<robh@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mc1048.eqiad.wmnet with reason: REIMAGE |
[production] |
| 21:35 |
<robh@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1047.eqiad.wmnet with reason: REIMAGE |
[production] |
| 21:33 |
<robh@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1046.eqiad.wmnet with reason: REIMAGE |
[production] |
| 21:32 |
<robh@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mc1047.eqiad.wmnet with reason: REIMAGE |
[production] |
| 21:30 |
<robh@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mc1046.eqiad.wmnet with reason: REIMAGE |
[production] |
| 21:08 |
<jhuneidi@deploy1002> |
rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.37.0-wmf.18" |
[production] |
| 21:02 |
<krinkle@deploy1002> |
Synchronized wmf-config/: I3b54d163b6 (duration: 01m 09s) |
[production] |
| 20:54 |
<krinkle@deploy1002> |
Synchronized wmf-config/CommonSettings.php: If7a8d6b6 (duration: 01m 22s) |
[production] |
| 20:43 |
<andrew@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: REIMAGE |
[production] |
| 20:42 |
<krinkle@deploy1002> |
Synchronized wmf-config/: Ic5ff34b (duration: 01m 08s) |
[production] |
| 20:40 |
<andrew@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: REIMAGE |
[production] |
| 20:37 |
<robh@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1045.eqiad.wmnet with reason: REIMAGE |
[production] |
| 20:35 |
<robh@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1044.eqiad.wmnet with reason: REIMAGE |
[production] |
| 20:34 |
<robh@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mc1045.eqiad.wmnet with reason: REIMAGE |
[production] |
| 20:33 |
<robh@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1043.eqiad.wmnet with reason: REIMAGE |
[production] |
| 20:32 |
<robh@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mc1044.eqiad.wmnet with reason: REIMAGE |
[production] |
| 20:31 |
<krinkle@deploy1002> |
Synchronized docroot/noc/: Ic013a93998f (duration: 01m 37s) |
[production] |
| 20:31 |
<robh@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1042.eqiad.wmnet with reason: REIMAGE |
[production] |
| 20:30 |
<robh@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mc1043.eqiad.wmnet with reason: REIMAGE |
[production] |
| 20:29 |
<robh@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1041.eqiad.wmnet with reason: REIMAGE |
[production] |
| 20:28 |
<robh@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mc1042.eqiad.wmnet with reason: REIMAGE |
[production] |
| 20:26 |
<robh@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mc1041.eqiad.wmnet with reason: REIMAGE |
[production] |
| 19:29 |
<robh@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1040.eqiad.wmnet with reason: REIMAGE |
[production] |
| 19:27 |
<robh@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1039.eqiad.wmnet with reason: REIMAGE |
[production] |
| 19:27 |
<robh@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mc1040.eqiad.wmnet with reason: REIMAGE |
[production] |
| 19:25 |
<robh@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mc1039.eqiad.wmnet with reason: REIMAGE |
[production] |
| 19:16 |
<cmjohnson@cumin1001> |
END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) |
[production] |
| 19:15 |
<cmjohnson@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
| 19:09 |
<cmjohnson@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on dumpsdata1005.eqiad.wmnet with reason: REIMAGE |
[production] |
| 19:09 |
<cmjohnson@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti1024.eqiad.wmnet with reason: REIMAGE |
[production] |
| 19:07 |
<cmjohnson@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dumpsdata1004.eqiad.wmnet with reason: REIMAGE |
[production] |
| 19:05 |
<cmjohnson@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti1023.eqiad.wmnet with reason: REIMAGE |
[production] |
| 19:04 |
<cmjohnson@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1005.eqiad.wmnet with reason: REIMAGE |
[production] |
| 19:04 |
<cmjohnson@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1024.eqiad.wmnet with reason: REIMAGE |
[production] |
| 19:04 |
<jhuneidi@deploy1002> |
rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.18 refs T281159 |
[production] |
| 19:04 |
<cmjohnson@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1004.eqiad.wmnet with reason: REIMAGE |
[production] |
| 19:03 |
<cmjohnson@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1023.eqiad.wmnet with reason: REIMAGE |
[production] |
| 18:49 |
<cmjohnson@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
| 18:47 |
<ryankemper> |
[WDQS] `ryankemper@wdqs2005:~$ sudo depool` (~1.26 hours of lag) |
[production] |
| 18:46 |
<cmjohnson@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
| 18:46 |
<ryankemper> |
T288501 (Misread grafana graph, `wdqs2003` only has 1.33 hours to catch up on) |
[production] |
| 18:45 |
<ryankemper> |
T288501 `data-transfer` of `wikidata.jnl` completed successfully. Host needs to catch up on ~22 hours of WDQS lag before being re-pooled |
[production] |
| 18:42 |
<ryankemper@cumin2001> |
END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) |
[production] |
| 17:23 |
<jhuneidi@deploy1002> |
Finished scap: testwikis wikis to 1.37.0-wmf.18 (duration: 36m 35s) |
[production] |
| 17:19 |
<ryankemper> |
T288501 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2005.codfw.wmnet --dest wdqs2003.codfw.wmnet --reason "transferring fresh wikidata journal to resolve disk issue" --blazegraph_instance blazegraph` on `cumin2001` tmux session `wdqs_data_xfer` |
[production] |
| 17:19 |
<ryankemper@cumin2001> |
START - Cookbook sre.wdqs.data-transfer |
[production] |
| 17:18 |
<mbsantos@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' . |
[production] |
| 17:13 |
<ryankemper> |
T288501 [WDQS] `ryankemper@wdqs2003:~$ sudo rm -fv /srv/wdqs/wikidata.jnl` |
[production] |