2022-01-13
|
10:27 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'es1022 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18719 and previous config saved to /var/cache/conftool/dbconfig/20220113-102734-root.json |
[production] |
10:27 |
<moritzm> |
systemctl reset-failed ifup@ens5.service on lists1001 T273026 |
[production] |
10:13 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM grafana1002.eqiad.wmnet |
[production] |
10:10 |
<moritzm> |
rebooting grafana1002 (running grafana.wikimedia.org) |
[production] |
10:10 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.reboot-vm for VM grafana1002.eqiad.wmnet |
[production] |
10:09 |
<marostegui@cumin1001> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1022.eqiad.wmnet with OS bullseye |
[production] |
10:02 |
<mmandere> |
cp3052: upgrade varnish to 6.0.9-1wm1 T298758 |
[production] |
10:02 |
<joal@deploy1002> |
Finished deploy [analytics/refinery@94ec386]: Hotfix analytics deploy [analytics/refinery@94ec386] (duration: 21m 47s) |
[production] |
10:02 |
<elukey> |
run kafka preferred-replica-election on kafka-main1001 to force a rebalance of partition leaders (after kafka-main1002's reimage) |
[production] |
10:00 |
<btullis@cumin1001> |
END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1006.eqiad.wmnet |
[production] |
09:59 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1002.eqiad.wmnet with OS buster |
[production] |
09:56 |
<btullis@cumin1001> |
START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1006.eqiad.wmnet |
[production] |
09:49 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye |
[production] |
09:46 |
<marostegui@cumin1001> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1022.eqiad.wmnet with OS bullseye |
[production] |
09:42 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye |
[production] |
09:40 |
<joal@deploy1002> |
Started deploy [analytics/refinery@94ec386]: Hotfix analytics deploy [analytics/refinery@94ec386] |
[production] |
09:40 |
<joal@deploy1002> |
Finished deploy [analytics/refinery@94ec386] (thin): Hotfix analytics deploy THIN [analytics/refinery@94ec386] (duration: 00m 07s) |
[production] |
09:40 |
<joal@deploy1002> |
Started deploy [analytics/refinery@94ec386] (thin): Hotfix analytics deploy THIN [analytics/refinery@94ec386] |
[production] |
09:39 |
<joal@deploy1002> |
Finished deploy [analytics/refinery@94ec386] (hadoop-test): Hotfix analytics deploy TEST [analytics/refinery@94ec386] (duration: 06m 59s) |
[production] |
09:35 |
<marostegui@cumin1001> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1022.eqiad.wmnet with OS bullseye |
[production] |
09:32 |
<joal@deploy1002> |
Started deploy [analytics/refinery@94ec386] (hadoop-test): Hotfix analytics deploy TEST [analytics/refinery@94ec386] |
[production] |
09:30 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye |
[production] |
09:30 |
<marostegui@cumin1001> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1022.eqiad.wmnet with OS bullseye |
[production] |
09:26 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.reimage for host kafka-main1002.eqiad.wmnet with OS buster |
[production] |
09:25 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye |
[production] |
09:24 |
<marostegui@cumin1001> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1022.eqiad.wmnet with OS bullseye |
[production] |
09:16 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM xhgui1001.eqiad.wmnet |
[production] |
09:14 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.reboot-vm for VM xhgui1001.eqiad.wmnet |
[production] |
09:08 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye |
[production] |
09:03 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM lists1001.wikimedia.org |
[production] |
09:02 |
<moritzm> |
rebooting lists1001 (running lists.wikimedia.org) to pick up new KVM setting |
[production] |
09:00 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.reboot-vm for VM lists1001.wikimedia.org |
[production] |
08:59 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool es1022, give weight to es1021 T295965 ', diff saved to https://phabricator.wikimedia.org/P18718 and previous config saved to /var/cache/conftool/dbconfig/20220113-085906-marostegui.json |
[production] |
08:42 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1003.eqiad.wmnet with OS buster |
[production] |
08:39 |
<elukey> |
ipmi mc reset cold for kafka-main1002, mgmt interface not reachable via ssh |
[production] |
08:39 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Remove recentchanges group from s7 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P18717 and previous config saved to /var/cache/conftool/dbconfig/20220113-083923-marostegui.json |
[production] |
08:28 |
<ladsgroup@deploy1002> |
Synchronized php-1.38.0-wmf.16/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:753505|Take LogicException into consideration (T299111)]] (duration: 01m 28s) |
[production] |
08:28 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn |
[production] |
08:27 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn |
[production] |
08:27 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn |
[production] |
08:23 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn |
[production] |
08:21 |
<ladsgroup@deploy1002> |
Synchronized php-1.38.0-wmf.17/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:753504|Take LogicException into consideration (T299111)]] (duration: 01m 28s) |
[production] |
08:13 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn |
[production] |
08:09 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn |
[production] |
08:09 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn |
[production] |
08:08 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn |
[production] |
08:08 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.reimage for host kafka-main1003.eqiad.wmnet with OS buster |
[production] |
08:06 |
<marostegui> |
Change innodb_checksum_algorithm=full_crc32 on eqiad sanitarium hosts (db1154, db1155) T287244 |
[production] |
08:02 |
<elukey> |
ipmi mc reset cold for kafka-main1003, mgmt interface not reachable via ssh |
[production] |
07:57 |
<elukey> |
stop kafka* on kafka-main1003 as prep-step for reimage to buster |
[production] |