production SAL

2023-03-21
14:14	<jnuche@deploy2002>	Installing scap version "latest" for 587 hosts	[production]
14:11	<bking@deploy2002>	helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply	[production]
14:11	<bking@deploy2002>	helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply	[production]
14:10	<urbanecm@deploy2002>	Finished scap: Backport for [[gerrit:901588|Growth: Disable GEPersonalizedPraiseEnabled everywhere (T322443)]] (duration: 07m 53s)	[production]
14:10	<elukey@cumin1001>	START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet	[production]
14:08	<nfraison@deploy2002>	helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.	[production]
14:08	<nfraison@deploy2002>	helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.	[production]
14:05	<elukey@cumin1001>	END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main1005.eqiad.wmnet	[production]
14:02	<urbanecm@deploy2002>	Started scap: Backport for [[gerrit:901588|Growth: Disable GEPersonalizedPraiseEnabled everywhere (T322443)]]	[production]
14:00	<nfraison@deploy2002>	helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.	[production]
13:58	<nfraison@deploy2002>	helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.	[production]
13:42	<nfraison@deploy2002>	helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.	[production]
13:42	<nfraison@deploy2002>	helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.	[production]
13:42	<nfraison@deploy2002>	helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.	[production]
13:40	<nfraison@deploy2002>	helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.	[production]
13:38	<nfraison@deploy2002>	helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.	[production]
13:38	<nfraison@deploy2002>	helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.	[production]
13:33	<elukey@cumin1001>	START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet	[production]
13:29	<elukey@cumin1001>	END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1005.eqiad.wmnet	[production]
13:28	<nfraison@deploy2002>	helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.	[production]
13:25	<nfraison@deploy2002>	helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.	[production]
13:21	<nfraison@deploy2002>	helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.	[production]
13:16	<elukey@cumin1001>	START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet	[production]
13:11	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware	[production]
13:11	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware	[production]
13:05	<elukey>	move kafka mirror maker instances to PKI migration settings (new truststores) - T319372	[production]
11:20	<aikochou@deploy2002>	helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .	[production]
11:09	<joal>	Unpause mediacounts_load airflow job with start_date set to 2023-03-21T10:00	[production]
11:08	<joal>	Kill mediacounts_load oozie job	[production]
11:07	<joal>	Unpause mediawiki_history_denormalize airflow job	[production]
11:06	<joal>	Kill mediawiki_denormalize oozie job	[production]
11:04	<joal@deploy2002>	Finished deploy [airflow-dags/analytics@42e862b]: Regular analytics weekly train [airflow-dags/analytics@42e862b] (duration: 00m 11s)	[production]
11:04	<joal@deploy2002>	Started deploy [airflow-dags/analytics@42e862b]: Regular analytics weekly train [airflow-dags/analytics@42e862b]	[production]
10:43	<nfraison@deploy2002>	helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.	[production]
10:32	<nfraison@deploy2002>	helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.	[production]
10:24	<joal@deploy2002>	Finished deploy [analytics/refinery@0bb61e9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0bb61e9] (duration: 01m 30s)	[production]
10:22	<joal@deploy2002>	Started deploy [analytics/refinery@0bb61e9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0bb61e9]	[production]
10:22	<joal@deploy2002>	Finished deploy [analytics/refinery@0bb61e9] (thin): Regular analytics weekly train THIN [analytics/refinery@0bb61e9] (duration: 00m 09s)	[production]
10:22	<joal@deploy2002>	Started deploy [analytics/refinery@0bb61e9] (thin): Regular analytics weekly train THIN [analytics/refinery@0bb61e9]	[production]
10:22	<joal@deploy2002>	Finished deploy [analytics/refinery@0bb61e9]: Regular analytics weekly train [analytics/refinery@0bb61e9] (duration: 07m 48s)	[production]
10:14	<joal@deploy2002>	Started deploy [analytics/refinery@0bb61e9]: Regular analytics weekly train [analytics/refinery@0bb61e9]	[production]
09:43	<elukey@cumin1001>	START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye	[production]
09:39	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage	[production]
09:39	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage	[production]
09:25	<phedenskog@deploy2002>	Finished deploy [performance/navtiming@d2b97ad]: (no justification provided) (duration: 00m 06s)	[production]
09:25	<phedenskog@deploy2002>	Started deploy [performance/navtiming@d2b97ad]: (no justification provided)	[production]
09:06	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC	[production]
09:05	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC	[production]
08:31	<elukey>	move purged daemons on cp nodes to a new CA bundle (to allow accepting kafka clients using PKI tls certs) - T319372	[production]
06:50	<ayounsi@cumin1001>	END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13150	[production]