2023-03-21
§
|
14:10 |
<urbanecm@deploy2002> |
Finished scap: Backport for [[gerrit:901588|Growth: Disable GEPersonalizedPraiseEnabled everywhere (T322443)]] (duration: 07m 53s) |
[production] |
14:10 |
<elukey@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet |
[production] |
14:08 |
<nfraison@deploy2002> |
helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. |
[production] |
14:08 |
<nfraison@deploy2002> |
helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. |
[production] |
14:05 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main1005.eqiad.wmnet |
[production] |
14:02 |
<urbanecm@deploy2002> |
Started scap: Backport for [[gerrit:901588|Growth: Disable GEPersonalizedPraiseEnabled everywhere (T322443)]] |
[production] |
14:00 |
<nfraison@deploy2002> |
helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. |
[production] |
13:58 |
<nfraison@deploy2002> |
helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. |
[production] |
13:42 |
<nfraison@deploy2002> |
helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. |
[production] |
13:42 |
<nfraison@deploy2002> |
helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. |
[production] |
13:42 |
<nfraison@deploy2002> |
helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. |
[production] |
13:40 |
<nfraison@deploy2002> |
helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. |
[production] |
13:38 |
<nfraison@deploy2002> |
helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. |
[production] |
13:38 |
<nfraison@deploy2002> |
helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. |
[production] |
13:33 |
<elukey@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet |
[production] |
13:29 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1005.eqiad.wmnet |
[production] |
13:28 |
<nfraison@deploy2002> |
helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. |
[production] |
13:25 |
<nfraison@deploy2002> |
helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. |
[production] |
13:21 |
<nfraison@deploy2002> |
helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. |
[production] |
13:16 |
<elukey@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet |
[production] |
13:11 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware |
[production] |
13:11 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware |
[production] |
13:05 |
<elukey> |
move kafka mirror maker instances to PKI migration settings (new truststores) - T319372 |
[production] |
11:20 |
<aikochou@deploy2002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . |
[production] |
11:09 |
<joal> |
Unpause mediacounts_load airflow job with start_date set to 2023-03-21T10:00 |
[production] |
11:08 |
<joal> |
Kill mediacounts_load oozie job |
[production] |
11:07 |
<joal> |
Unpause mediawiki_history_denormalize airflow job |
[production] |
11:06 |
<joal> |
Kill mediawiki_denormalize oozie job |
[production] |
11:04 |
<joal@deploy2002> |
Finished deploy [airflow-dags/analytics@42e862b]: Regular analytics weekly train [airflow-dags/analytics@42e862b] (duration: 00m 11s) |
[production] |
11:04 |
<joal@deploy2002> |
Started deploy [airflow-dags/analytics@42e862b]: Regular analytics weekly train [airflow-dags/analytics@42e862b] |
[production] |
10:43 |
<nfraison@deploy2002> |
helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. |
[production] |
10:32 |
<nfraison@deploy2002> |
helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. |
[production] |
10:24 |
<joal@deploy2002> |
Finished deploy [analytics/refinery@0bb61e9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0bb61e9] (duration: 01m 30s) |
[production] |
10:22 |
<joal@deploy2002> |
Started deploy [analytics/refinery@0bb61e9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0bb61e9] |
[production] |
10:22 |
<joal@deploy2002> |
Finished deploy [analytics/refinery@0bb61e9] (thin): Regular analytics weekly train THIN [analytics/refinery@0bb61e9] (duration: 00m 09s) |
[production] |
10:22 |
<joal@deploy2002> |
Started deploy [analytics/refinery@0bb61e9] (thin): Regular analytics weekly train THIN [analytics/refinery@0bb61e9] |
[production] |
10:22 |
<joal@deploy2002> |
Finished deploy [analytics/refinery@0bb61e9]: Regular analytics weekly train [analytics/refinery@0bb61e9] (duration: 07m 48s) |
[production] |
10:14 |
<joal@deploy2002> |
Started deploy [analytics/refinery@0bb61e9]: Regular analytics weekly train [analytics/refinery@0bb61e9] |
[production] |
09:43 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye |
[production] |
09:39 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage |
[production] |
09:39 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage |
[production] |
09:25 |
<phedenskog@deploy2002> |
Finished deploy [performance/navtiming@d2b97ad]: (no justification provided) (duration: 00m 06s) |
[production] |
09:25 |
<phedenskog@deploy2002> |
Started deploy [performance/navtiming@d2b97ad]: (no justification provided) |
[production] |
09:06 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC |
[production] |
09:05 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC |
[production] |
08:31 |
<elukey> |
move purged daemons on cp nodes to a new CA bundle (to allow accepting kafka clients using PKI tls certs) - T319372 |
[production] |
06:50 |
<ayounsi@cumin1001> |
END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13150 |
[production] |
06:49 |
<ayounsi@cumin1001> |
START - Cookbook sre.network.peering with action 'configure' for AS: 13150 |
[production] |
03:57 |
<mwpresync@deploy2002> |
Pruned MediaWiki: 1.40.0-wmf.26 (duration: 02m 18s) |
[production] |
03:55 |
<mwpresync@deploy2002> |
Finished scap: testwikis wikis to 1.41.0-wmf.1 refs T330207 (duration: 52m 38s) |
[production] |