2022-12-08
ยง
|
21:15 |
<ryankemper@cumin1001> |
END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776 |
[production] |
21:10 |
<jdrewniak@deploy1002> |
Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:866501| Bumping portals to master (T128546)]] (duration: 07m 07s) |
[production] |
20:50 |
<pt1979@cumin2002> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye |
[production] |
20:35 |
<ryankemper@cumin1001> |
START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776 |
[production] |
20:34 |
<ryankemper> |
[Cloudelastic] Cleaned up stale (not running but files not removed) elasticsearch 6 units which broke the previous rolling upgrade run on cloudelastic1005 |
[production] |
20:31 |
<ryankemper@cumin1001> |
END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776 |
[production] |
20:27 |
<bking@cumin2002> |
START - Cookbook sre.wdqs.data-reload |
[production] |
20:27 |
<bking@cumin2002> |
END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97) |
[production] |
20:22 |
<ryankemper@cumin1001> |
START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776 |
[production] |
20:21 |
<ryankemper@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 6 hosts with reason: Plugin upgrade for T322776 |
[production] |
20:21 |
<ryankemper@cumin1001> |
START - Cookbook sre.hosts.downtime for 3:00:00 on 6 hosts with reason: Plugin upgrade for T322776 |
[production] |
20:17 |
<ryankemper> |
T323064 Merged https://gerrit.wikimedia.org/r/c/operations/grafana-grizzly/+/862178 and deployed new dashboard, visible here: https://grafana.wikimedia.org/d/slo-wdqs-tmpl/wdqs-slos-grizzly-template?orgId=1 |
[production] |
20:12 |
<demon@deploy1002> |
rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.13 refs T320518 |
[production] |
20:09 |
<ryankemper@cumin1001> |
START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776 |
[production] |
19:59 |
<ryankemper@cumin1001> |
END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776 |
[production] |
19:59 |
<ryankemper@cumin1001> |
START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776 |
[production] |
19:53 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye |
[production] |
16:14 |
<eevans@cumin1001> |
END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cassandra-dev2001 |
[production] |
16:14 |
<eevans@cumin1001> |
START - Cookbook sre.network.configure-switch-interfaces for host cassandra-dev2001 |
[production] |
16:13 |
<eevans@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
16:13 |
<eevans@cumin1001> |
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename restbase-dev2001 to cassandra-dev2001 - eevans@cumin1001" |
[production] |
16:12 |
<eevans@cumin1001> |
START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename restbase-dev2001 to cassandra-dev2001 - eevans@cumin1001" |
[production] |
16:10 |
<eevans@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
16:08 |
<eevans@cumin1001> |
END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) |
[production] |
16:08 |
<eevans@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
16:02 |
<mvernon@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2002.codfw.wmnet with OS bullseye |
[production] |
15:48 |
<cgoubert@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 365 days, 0:00:00 on contint1001.wikimedia.org with reason: awaiting decom |
[production] |
15:48 |
<cgoubert@cumin1001> |
START - Cookbook sre.hosts.downtime for 365 days, 0:00:00 on contint1001.wikimedia.org with reason: awaiting decom |
[production] |
15:45 |
<mvernon@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2002.codfw.wmnet with reason: host reimage |
[production] |
15:42 |
<mvernon@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2002.codfw.wmnet with reason: host reimage |
[production] |
15:31 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance |
[production] |
15:31 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance |
[production] |
15:31 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1202 (T322618)', diff saved to https://phabricator.wikimedia.org/P42654 and previous config saved to /var/cache/conftool/dbconfig/20221208-153123-ladsgroup.json |
[production] |
15:27 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti5002.eqsin.wmnet |
[production] |
15:27 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
15:27 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" |
[production] |
15:27 |
<jiji@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply |
[production] |
15:26 |
<jiji@deploy1002> |
helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply |
[production] |
15:25 |
<mvernon@cumin2002> |
START - Cookbook sre.hosts.reimage for host thanos-be2002.codfw.wmnet with OS bullseye |
[production] |
15:24 |
<jmm@cumin2002> |
START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" |
[production] |
15:21 |
<jmm@cumin2002> |
START - Cookbook sre.dns.netbox |
[production] |
15:16 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P42653 and previous config saved to /var/cache/conftool/dbconfig/20221208-151616-ladsgroup.json |
[production] |
15:15 |
<eevans@cumin1001> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase-dev2001.codfw.wmnet |
[production] |
15:15 |
<eevans@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
15:15 |
<eevans@cumin1001> |
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001" |
[production] |
15:13 |
<eevans@cumin1001> |
START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001" |
[production] |
15:12 |
<hashar> |
Restarted Gerrit TWICE on gerrit1001.wikimedia.org to apply `-Dh2.maxCompactTime` and get it to trigger compaction # T323754 |
[production] |
15:12 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.decommission for hosts ganeti5002.eqsin.wmnet |
[production] |
15:10 |
<eevans@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
15:09 |
<jiji@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mw-web: apply |
[production] |