2023-03-21
14:14 <jnuche@deploy2002> Installing scap version "latest" for 587 hosts [production]
14:11 <bking@deploy2002> helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply [production]
14:11 <bking@deploy2002> helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply [production]
14:10 <urbanecm@deploy2002> Finished scap: Backport for [[gerrit:901588|Growth: Disable GEPersonalizedPraiseEnabled everywhere (T322443)]] (duration: 07m 53s) [production]
14:10 <elukey@cumin1001> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet [production]
14:08 <nfraison@deploy2002> helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. [production]
14:08 <nfraison@deploy2002> helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. [production]
14:05 <elukey@cumin1001> END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main1005.eqiad.wmnet [production]
14:02 <urbanecm@deploy2002> Started scap: Backport for [[gerrit:901588|Growth: Disable GEPersonalizedPraiseEnabled everywhere (T322443)]] [production]
14:00 <nfraison@deploy2002> helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. [production]
13:58 <nfraison@deploy2002> helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. [production]
13:42 <nfraison@deploy2002> helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. [production]
13:42 <nfraison@deploy2002> helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. [production]
13:42 <nfraison@deploy2002> helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. [production]
13:40 <nfraison@deploy2002> helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. [production]
13:38 <nfraison@deploy2002> helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. [production]
13:38 <nfraison@deploy2002> helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. [production]
13:33 <elukey@cumin1001> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet [production]
13:29 <elukey@cumin1001> END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1005.eqiad.wmnet [production]
13:28 <nfraison@deploy2002> helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. [production]
13:25 <nfraison@deploy2002> helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. [production]
13:21 <nfraison@deploy2002> helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. [production]
13:16 <elukey@cumin1001> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet [production]
13:11 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware [production]
13:11 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware [production]
13:05 <elukey> move kafka mirror maker instances to PKI migration settings (new truststores) - T319372 [production]
11:20 <aikochou@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . [production]
11:09 <joal> Unpause mediacounts_load airflow job with start_date set to 2023-03-21T10:00 [production]
11:08 <joal> Kill mediacounts_load oozie job [production]
11:07 <joal> Unpause mediawiki_history_denormalize airflow job [production]
11:06 <joal> Kill mediawiki_denormalize oozie job [production]
11:04 <joal@deploy2002> Finished deploy [airflow-dags/analytics@42e862b]: Regular analytics weekly train [airflow-dags/analytics@42e862b] (duration: 00m 11s) [production]
11:04 <joal@deploy2002> Started deploy [airflow-dags/analytics@42e862b]: Regular analytics weekly train [airflow-dags/analytics@42e862b] [production]
10:43 <nfraison@deploy2002> helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. [production]
10:32 <nfraison@deploy2002> helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. [production]
10:24 <joal@deploy2002> Finished deploy [analytics/refinery@0bb61e9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0bb61e9] (duration: 01m 30s) [production]
10:22 <joal@deploy2002> Started deploy [analytics/refinery@0bb61e9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0bb61e9] [production]
10:22 <joal@deploy2002> Finished deploy [analytics/refinery@0bb61e9] (thin): Regular analytics weekly train THIN [analytics/refinery@0bb61e9] (duration: 00m 09s) [production]
10:22 <joal@deploy2002> Started deploy [analytics/refinery@0bb61e9] (thin): Regular analytics weekly train THIN [analytics/refinery@0bb61e9] [production]
10:22 <joal@deploy2002> Finished deploy [analytics/refinery@0bb61e9]: Regular analytics weekly train [analytics/refinery@0bb61e9] (duration: 07m 48s) [production]
10:14 <joal@deploy2002> Started deploy [analytics/refinery@0bb61e9]: Regular analytics weekly train [analytics/refinery@0bb61e9] [production]
09:43 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye [production]
09:39 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage [production]
09:39 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage [production]
09:25 <phedenskog@deploy2002> Finished deploy [performance/navtiming@d2b97ad]: (no justification provided) (duration: 00m 06s) [production]
09:25 <phedenskog@deploy2002> Started deploy [performance/navtiming@d2b97ad]: (no justification provided) [production]
09:06 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC [production]
09:05 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC [production]
08:31 <elukey> move purged daemons on cp nodes to a new CA bundle (to allow accepting kafka clients using PKI tls certs) - T319372 [production]
06:50 <ayounsi@cumin1001> END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13150 [production]