2025-03-07
08:15 |
<elukey@deploy2002> |
helmfile [staging] DONE helmfile.d/services/changeprop: sync |
[production] |
08:15 |
<elukey@deploy2002> |
helmfile [staging] START helmfile.d/services/changeprop: sync |
[production] |
08:12 |
<moritzm> |
installing Linux 5.10.234 on Bullseye hosts (just the rollout of the new kernels, no immediate reboots involved) |
[production] |
08:07 |
<jmm@cumin2002> |
DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging JJMC89 out of all services on: 2 hosts |
[production] |
07:51 |
<moritzm> |
installing emacs security updates |
[production] |
07:36 |
<hashar@deploy2002> |
Finished deploy [releng/jenkins-deploy@34b35a5] (releasing): Upgrade to Jenkins LTS 2.492.2 (duration: 01m 23s) |
[production] |
07:35 |
<hashar@deploy2002> |
Started deploy [releng/jenkins-deploy@34b35a5] (releasing): Upgrade to Jenkins LTS 2.492.2 |
[production] |
07:31 |
<hashar> |
Upgrading Jenkins on contint1002 |
[production] |
03:15 |
<wmbot~bsadowski1@tools-bastion-13> |
Restarted StewardBot/SULWatcher because of an EventStreams issue |
[tools.stewardbots] |
01:06 |
<jclark@cumin1002> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1201.eqiad.wmnet with OS bullseye |
[production] |
00:41 |
<jclark@cumin1002> |
START - Cookbook sre.hosts.reimage for host an-worker1208.eqiad.wmnet with OS bullseye |
[production] |
00:41 |
<jclark@cumin1002> |
START - Cookbook sre.hosts.reimage for host an-worker1207.eqiad.wmnet with OS bullseye |
[production] |
00:41 |
<jclark@cumin1002> |
START - Cookbook sre.hosts.reimage for host an-worker1206.eqiad.wmnet with OS bullseye |
[production] |
00:40 |
<jclark@cumin1002> |
START - Cookbook sre.hosts.reimage for host an-worker1205.eqiad.wmnet with OS bullseye |
[production] |
00:40 |
<jclark@cumin1002> |
START - Cookbook sre.hosts.reimage for host an-worker1204.eqiad.wmnet with OS bullseye |
[production] |
00:40 |
<jclark@cumin1002> |
START - Cookbook sre.hosts.reimage for host an-worker1203.eqiad.wmnet with OS bullseye |
[production] |
00:40 |
<jclark@cumin1002> |
START - Cookbook sre.hosts.reimage for host an-worker1202.eqiad.wmnet with OS bullseye |
[production] |
00:01 |
<jclark@cumin1002> |
START - Cookbook sre.hosts.reimage for host an-worker1201.eqiad.wmnet with OS bullseye |
[production] |
2025-03-06
23:20 |
<joal> |
Force killing gobblin failing job to let next one with patched code run |
[analytics] |
23:19 |
<joal@deploy2002> |
Finished deploy [analytics/refinery@64b629d]: emergency deploy for gobblin event_default recenchange memory issue - 2 (duration: 01m 13s) |
[production] |
23:19 |
<joal> |
Deploying refinery onto an-launcher1002 to remove recentchange from gobblin |
[analytics] |
23:18 |
<joal@deploy2002> |
Started deploy [analytics/refinery@64b629d]: emergency deploy for gobblin event_default recenchange memory issue - 2 |
[production] |
23:03 |
<tgr@deploy2002> |
Finished scap sync-world: Backport for [[gerrit:1125134|Enable SUL3 signup for 50% of group 1 users (T384007)]] (duration: 20m 55s) |
[production] |
22:56 |
<tgr@deploy2002> |
tgr: Continuing with sync |
[production] |
22:45 |
<tgr@deploy2002> |
tgr: Backport for [[gerrit:1125134|Enable SUL3 signup for 50% of group 1 users (T384007)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) |
[production] |
22:42 |
<tgr@deploy2002> |
Started scap sync-world: Backport for [[gerrit:1125134|Enable SUL3 signup for 50% of group 1 users (T384007)]] |
[production] |
22:39 |
<toyofuku@deploy2002> |
Finished scap sync-world: Backport for [[gerrit:1124510|Enable Search AB test for en wiki]] (duration: 18m 27s) |
[production] |
22:33 |
<toyofuku@deploy2002> |
toyofuku, bwang: Continuing with sync |
[production] |
22:28 |
<wmbot~anticomposite@tools-bastion-13> |
Add temp debugging code to StewardBot for eventstreams issue |
[tools.stewardbots] |
22:26 |
<fceratto@cumin1002> |
END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1202.eqiad.wmnet onto db1253.eqiad.wmnet |
[production] |
22:23 |
<toyofuku@deploy2002> |
toyofuku, bwang: Backport for [[gerrit:1124510|Enable Search AB test for en wiki]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) |
[production] |
22:21 |
<toyofuku@deploy2002> |
Started scap sync-world: Backport for [[gerrit:1124510|Enable Search AB test for en wiki]] |
[production] |
22:13 |
<tgr@deploy2002> |
Finished scap sync-world: Backport for [[gerrit:1125232|Revert^2 "Fix nested refs with the same name but a different group"]] (duration: 12m 44s) |
[production] |
22:06 |
<tgr@deploy2002> |
tgr, ssastry: Continuing with sync |
[production] |
22:03 |
<tgr@deploy2002> |
tgr, ssastry: Backport for [[gerrit:1125232|Revert^2 "Fix nested refs with the same name but a different group"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) |
[production] |
22:00 |
<tgr@deploy2002> |
Started scap sync-world: Backport for [[gerrit:1125232|Revert^2 "Fix nested refs with the same name but a different group"]] |
[production] |
21:55 |
<tgr@deploy2002> |
Finished scap sync-world: Backport for [[gerrit:1124895|Remove unused $wgDiscussionToolsABTest]], [[gerrit:1124896|Remove unused $wgOATHAuthMultipleDevicesMigrationStage]], [[gerrit:1122711|Deduplicate JsonConfig config]] (duration: 15m 00s) |
[production] |
21:54 |
<otto@deploy2002> |
Finished deploy [analytics/refinery@ec4c468]: 'emergency deploy for gobblin event_default recenchange memory issue' (duration: 01m 55s) |
[production] |
21:53 |
<otto@deploy2002> |
Started deploy [analytics/refinery@ec4c468]: 'emergency deploy for gobblin event_default recenchange memory issue' |
[production] |
21:49 |
<tgr@deploy2002> |
matmarex, tgr: Continuing with sync |
[production] |
21:43 |
<tgr@deploy2002> |
matmarex, tgr: Backport for [[gerrit:1124895|Remove unused $wgDiscussionToolsABTest]], [[gerrit:1124896|Remove unused $wgOATHAuthMultipleDevicesMigrationStage]], [[gerrit:1122711|Deduplicate JsonConfig config]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) |
[production] |
21:40 |
<tgr@deploy2002> |
Started scap sync-world: Backport for [[gerrit:1124895|Remove unused $wgDiscussionToolsABTest]], [[gerrit:1124896|Remove unused $wgOATHAuthMultipleDevicesMigrationStage]], [[gerrit:1122711|Deduplicate JsonConfig config]] |
[production] |
21:32 |
<bking@cumin2002> |
END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cloudelastic1009* for ban host prior to reimage - bking@cumin2002 - T387904 |
[production] |
21:32 |
<bking@cumin2002> |
START - Cookbook sre.elasticsearch.ban Banning hosts: cloudelastic1009* for ban host prior to reimage - bking@cumin2002 - T387904 |
[production] |
19:49 |
<jgiannelos@deploy2002> |
helmfile [eqiad] DONE helmfile.d/services/changeprop: apply |
[production] |
19:48 |
<jgiannelos@deploy2002> |
helmfile [eqiad] START helmfile.d/services/changeprop: apply |
[production] |
19:19 |
<wmbot~anticomposite@tools-bastion-13> |
stewardbots/StewardBot/manage.sh restart # disconnected |
[tools.stewardbots] |
19:11 |
<ebernhardson> |
T379002 start reindex of cirrus cebwiki_content index in codfw |
[production] |
19:10 |
<btullis@cumin1002> |
END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host an-presto1014.eqiad.wmnet |
[production] |
19:09 |
<ebernhardson> |
T379002 start reindex of cirrus cebwiki_content index in eqiad |
[production] |