2021-05-06
ยง
|
17:47 <volans@cumin2001> END (FAIL) - Cookbook sre.hosts.remove-downtime (exit_code=99) for cumin1001.eqiad.wmnet [production]
17:47 <volans@cumin2001> START - Cookbook sre.hosts.remove-downtime for cumin1001.eqiad.wmnet [production]
17:35 <jgiannelos@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' . [production]
17:33 <jgiannelos@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' . [production]
17:27 <bblack@cumin1001> conftool action : set/pooled=no; selector: name=cp203[34].codfw.wmnet [production]
17:20 <jgiannelos@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' . [production]
17:15 <volans> upgrade spicerack on cumin* to 0.0.52 [production]
17:15 <ryankemper> [Elastic] Set `elastic2043` as the only banned node in Cirrussearch Elasticsearch clusters (`elastic2058-production-search-codfw`, `elastic2058-production-search-omega-codfw`, `elastic2058-production-search-psi-codfw`) [production]
17:13 <papaul> powerdown ms-be2057 for relocation [production]
17:13 <jgiannelos@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' . [production]
17:12 <volans> uploaded spicerack_0.0.52 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia [production]
17:00 <papaul> powerdown elastic2058 for relocation [production]
16:43 <vgutierrez> Enforce Puppet Internal CA validation on trafficserver@ulsfo - T281673 [production]
16:12 <papaul> powerdown mc-gp2002 for relocation [production]
16:09 <ryankemper> [Elastic] Set `elastic2058` as the only banned node in Cirrussearch Elasticsearch clusters (`elastic2058-production-search-codfw`, `elastic2058-production-search-omega-codfw`, `elastic2058-production-search-psi-codfw`) [production]
15:58 <Amir1> starting upgrade of public mailing lists in group d and e (T280322) [production]
15:50 <ryankemper@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1012.eqiad.wmnet with reason: REIMAGE [production]
15:47 <ryankemper@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1012.eqiad.wmnet with reason: REIMAGE [production]
15:42 <papaul> powerdown logstash2027 for relocation [production]
15:41 <mvolz@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' . [production]
15:40 <ryankemper@cumin1001> END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563 [production]
15:34 <XioNoX> push cloud-gw-transport-eqiad to asw2-b-eqiad and cloudsw [production]
15:33 <ryankemper@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563 [production]
15:32 <ryankemper> T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1012.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` [production]
15:32 <ryankemper> T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2003.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage` [production]
15:31 <mvolz@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' . [production]
15:29 <cdanis@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin1001.eqiad.wmnet with reason: quiz [production]
15:29 <cdanis@cumin1001> START - Cookbook sre.hosts.downtime for 0:05:00 on cumin1001.eqiad.wmnet with reason: quiz [production]
15:26 <ryankemper> T280382 [WDQS] Pooled `wdqs1007` and `wdqs2004` [production]
15:26 <ryankemper> T280382 `wdqs2004.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 998G 1.5T 40% /srv` [production]
15:26 <ryankemper> T280382 `wdqs1007.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 998G 1.5T 40% /srv` [production]
15:20 <mvolz@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' . [production]
15:16 <mvolz@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' . [production]
15:14 <papaul> powerdown ms-be2053 for relocation [production]
15:10 <moritzm> imported wmfbackups 0.5+deb11u1 for bullseye-wikimedia to apt.wikimedia.org [production]
15:07 <aborrero@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: T270704 [production]
15:06 <aborrero@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: T270704 [production]
15:06 <aborrero@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 105 hosts with reason: T270704 [production]
15:06 <aborrero@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on 105 hosts with reason: T270704 [production]
15:06 <mvolz@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' . [production]
15:05 <moritzm> imported wmfmariadbpy 0.6+deb11u1 for bullseye-wikimedia to apt.wikimedia.org [production]
14:55 <papaul> powerdown kafka-main2002 for relocation [production]
14:30 <marostegui@cumin1001> dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P15833 and previous config saved to /var/cache/conftool/dbconfig/20210506-143002-marostegui.json [production]
14:09 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1113:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15829 and previous config saved to /var/cache/conftool/dbconfig/20210506-140916-marostegui.json [production]
13:37 <marostegui@cumin1001> dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15828 and previous config saved to /var/cache/conftool/dbconfig/20210506-133738-root.json [production]
13:22 <marostegui@cumin1001> dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15827 and previous config saved to /var/cache/conftool/dbconfig/20210506-132234-root.json [production]
13:21 <XioNoX> push pfw policies - T281942 [production]
13:07 <marostegui@cumin1001> dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15826 and previous config saved to /var/cache/conftool/dbconfig/20210506-130730-root.json [production]
12:52 <marostegui@cumin1001> dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15825 and previous config saved to /var/cache/conftool/dbconfig/20210506-125226-root.json [production]
11:44 <hnowlan@cumin1001> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts eventlog1002.eqiad.wmnet [production]