2023-07-06
09:58 |
<stevemunene@cumin1001> |
START - Cookbook sre.hosts.decommission for hosts analytics1061.eqiad.wmnet |
[production] |
09:57 |
<stevemunene> |
decommission analytics1061.eqiad.wmnet T339199 |
[analytics] |
09:35 |
<mvernon@cumin1001> |
START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling reboot on A:swift-fe |
[production] |
09:28 |
<btullis@deploy1002> |
helmfile [staging] DONE helmfile.d/services/datahub: sync on main |
[production] |
09:13 |
<btullis@deploy1002> |
helmfile [staging] START helmfile.d/services/datahub: apply on main |
[production] |
09:11 |
<elukey> |
restart kube-apiserver on ml-serve-ctrl2* as an attempt to fix LIST-related latency issues |
[production] |
09:10 |
<hashar@deploy1002> |
rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.16 refs T340244 |
[production] |
08:55 |
<oblivian@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/cxserver: apply |
[production] |
08:55 |
<oblivian@deploy1002> |
helmfile [eqiad] START helmfile.d/services/cxserver: apply |
[production] |
08:51 |
<oblivian@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/cxserver: apply |
[production] |
08:50 |
<oblivian@deploy1002> |
helmfile [codfw] START helmfile.d/services/cxserver: apply |
[production] |
08:49 |
<oblivian@deploy1002> |
helmfile [staging] DONE helmfile.d/services/cxserver: apply |
[production] |
08:49 |
<oblivian@deploy1002> |
helmfile [staging] START helmfile.d/services/cxserver: apply |
[production] |
08:45 |
<fabfur> |
reenabled puppet on cp1075.eqiad.wmnet, cp2027.codfw.wmnet, cp3050.esams.wmnet |
[production] |
08:39 |
<mvernon@cumin1001> |
END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling reboot on A:thanos-fe |
[production] |
08:17 |
<fabfur> |
temporarily disabling puppet on cp1075.eqiad.wmnet, cp2027.codfw.wmnet, cp3050.esams.wmnet to apply 935760 (T340983) |
[production] |
08:03 |
<jelto@cumin1001> |
END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: GitLab minor version upgrade |
[production] |
07:31 |
<kart_> |
Updated MinT to 2023-07-06-051402-production |
[production] |
07:29 |
<mvernon@cumin1001> |
START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling reboot on A:thanos-fe |
[production] |
07:29 |
<kartik@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply |
[production] |
07:25 |
<kartik@deploy1002> |
helmfile [eqiad] START helmfile.d/services/machinetranslation: apply |
[production] |
07:23 |
<stevemunene> |
run puppet agent on hadoop masters |
[analytics] |
07:23 |
<kartik@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply |
[production] |
07:21 |
<stevemunene> |
Remove analytics1064_1069 from hdfs net_topology |
[analytics] |
07:17 |
<stevemunene> |
stop hadoop-hdfs-datanode service on analytics[1061-1069] Preparing to decommission the hosts - T317861 |
[analytics] |
07:17 |
<kartik@deploy1002> |
helmfile [codfw] START helmfile.d/services/machinetranslation: apply |
[production] |
07:12 |
<kartik@deploy1002> |
helmfile [staging] DONE helmfile.d/services/machinetranslation: apply |
[production] |
07:11 |
<stevemunene> |
disable-puppet on analytics[1061-1069] Preparing to decommission the hosts - T317861 |
[analytics] |
07:09 |
<kartik@deploy1002> |
helmfile [staging] START helmfile.d/services/machinetranslation: apply |
[production] |
07:04 |
<stevemunene@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 9 hosts with reason: Stopping puppet and hadoop-hdfs-datanode services then decommissioning the hosts |
[production] |
07:04 |
<stevemunene@cumin1001> |
START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 9 hosts with reason: Stopping puppet and hadoop-hdfs-datanode services then decommissioning the hosts |
[production] |
06:54 |
<jelto@cumin1001> |
START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: GitLab minor version upgrade |
[production] |
04:37 |
<AntiComposite> |
Upgrade CVNBot21 to v4.0.3 |
[cvn] |
04:34 |
<AntiComposite> |
Upgrade CVNBot20 to v4.0.3 |
[cvn] |
04:33 |
<AntiComposite> |
Upgrade CVNBot18 to v4.0.3 |
[cvn] |
04:30 |
<AntiComposite> |
Upgrade CVNBot15 to v4.0.3 |
[cvn] |
04:23 |
<AntiComposite> |
Upgrade CVNBot14 to v4.0.3 |
[cvn] |
04:22 |
<AntiComposite> |
Upgrade CVNBot13 to v4.0.3 |
[cvn] |
04:14 |
<AntiComposite> |
Upgrade CVNBot12 to v4.0.3 |
[cvn] |
04:09 |
<AntiComposite> |
Upgrade CVNBot11 to v4.0.3 |
[cvn] |
04:03 |
<AntiComposite> |
Upgrade CVNBot5 to v4.0.3 |
[cvn] |
04:01 |
<AntiComposite> |
Upgrade CVNBot4 to v4.0.3 |
[cvn] |
04:00 |
<AntiComposite> |
Upgrade CVNBot3 to v4.0.3 |
[cvn] |
03:57 |
<AntiComposite> |
Upgrade CVNBot2 to v4.0.3 |
[cvn] |
03:51 |
<AntiComposite> |
Upgrade CVNBot1 to v4.0.3 |
[cvn] |
02:17 |
<rzl@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply |
[production] |
02:16 |
<rzl@deploy1002> |
helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply |
[production] |
02:06 |
<rzl@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply |
[production] |
02:05 |
<rzl@deploy1002> |
helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply |
[production] |
02:05 |
<rzl@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply |
[production] |