production SAL

2022-12-08
20:50	<pt1979@cumin2002>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye	[production]
20:35	<ryankemper@cumin1001>	START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776	[production]
20:34	<ryankemper>	[Cloudelastic] Cleaned up stale (not running but files not removed) elasticsearch 6 units which broke the previous rolling upgrade run on cloudelastic1005	[production]
20:31	<ryankemper@cumin1001>	END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776	[production]
20:27	<bking@cumin2002>	START - Cookbook sre.wdqs.data-reload	[production]
20:27	<bking@cumin2002>	END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)	[production]
20:22	<ryankemper@cumin1001>	START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776	[production]
20:21	<ryankemper@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 6 hosts with reason: Plugin upgrade for T322776	[production]
20:21	<ryankemper@cumin1001>	START - Cookbook sre.hosts.downtime for 3:00:00 on 6 hosts with reason: Plugin upgrade for T322776	[production]
20:17	<ryankemper>	T323064 Merged https://gerrit.wikimedia.org/r/c/operations/grafana-grizzly/+/862178 and deployed new dashboard, visible here: https://grafana.wikimedia.org/d/slo-wdqs-tmpl/wdqs-slos-grizzly-template?orgId=1	[production]
20:12	<demon@deploy1002>	rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.13 refs T320518	[production]
20:09	<ryankemper@cumin1001>	START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776	[production]
19:59	<ryankemper@cumin1001>	END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776	[production]
19:59	<ryankemper@cumin1001>	START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776	[production]
19:53	<pt1979@cumin2002>	START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye	[production]
16:14	<eevans@cumin1001>	END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cassandra-dev2001	[production]
16:14	<eevans@cumin1001>	START - Cookbook sre.network.configure-switch-interfaces for host cassandra-dev2001	[production]
16:13	<eevans@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
16:13	<eevans@cumin1001>	END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename restbase-dev2001 to cassandra-dev2001 - eevans@cumin1001"	[production]
16:12	<eevans@cumin1001>	START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename restbase-dev2001 to cassandra-dev2001 - eevans@cumin1001"	[production]
16:10	<eevans@cumin1001>	START - Cookbook sre.dns.netbox	[production]
16:08	<eevans@cumin1001>	END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)	[production]
16:08	<eevans@cumin1001>	START - Cookbook sre.dns.netbox	[production]
16:02	<mvernon@cumin2002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2002.codfw.wmnet with OS bullseye	[production]
15:48	<cgoubert@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 365 days, 0:00:00 on contint1001.wikimedia.org with reason: awaiting decom	[production]
15:48	<cgoubert@cumin1001>	START - Cookbook sre.hosts.downtime for 365 days, 0:00:00 on contint1001.wikimedia.org with reason: awaiting decom	[production]
15:45	<mvernon@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2002.codfw.wmnet with reason: host reimage	[production]
15:42	<mvernon@cumin2002>	START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2002.codfw.wmnet with reason: host reimage	[production]
15:31	<ladsgroup@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance	[production]
15:31	<ladsgroup@cumin1001>	START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance	[production]
15:31	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1202 (T322618)', diff saved to https://phabricator.wikimedia.org/P42654 and previous config saved to /var/cache/conftool/dbconfig/20221208-153123-ladsgroup.json	[production]
15:27	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti5002.eqsin.wmnet	[production]
15:27	<jmm@cumin2002>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
15:27	<jmm@cumin2002>	END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"	[production]
15:27	<jiji@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply	[production]
15:26	<jiji@deploy1002>	helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply	[production]
15:25	<mvernon@cumin2002>	START - Cookbook sre.hosts.reimage for host thanos-be2002.codfw.wmnet with OS bullseye	[production]
15:24	<jmm@cumin2002>	START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"	[production]
15:21	<jmm@cumin2002>	START - Cookbook sre.dns.netbox	[production]
15:16	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P42653 and previous config saved to /var/cache/conftool/dbconfig/20221208-151616-ladsgroup.json	[production]
15:15	<eevans@cumin1001>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase-dev2001.codfw.wmnet	[production]
15:15	<eevans@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
15:15	<eevans@cumin1001>	END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"	[production]
15:13	<eevans@cumin1001>	START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"	[production]
15:12	<hashar>	Restarted Gerrit TWICE on gerrit1001.wikimedia.org to apply `-Dh2.maxCompactTime` and get it to trigger compaction # T323754	[production]
15:12	<jmm@cumin2002>	START - Cookbook sre.hosts.decommission for hosts ganeti5002.eqsin.wmnet	[production]
15:10	<eevans@cumin1001>	START - Cookbook sre.dns.netbox	[production]
15:09	<jiji@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mw-web: apply	[production]
15:08	<jiji@deploy1002>	helmfile [eqiad] START helmfile.d/services/mw-web: apply	[production]
15:08	<jiji@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mw-web: apply	[production]