2025-09-08
23:59 <ladsgroup@cumin1003> dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P82798 and previous config saved to /var/cache/conftool/dbconfig/20250908-235927-ladsgroup.json [production]
23:53 <ladsgroup@cumin1003> START - Cookbook sre.mysql.pool db1223 gradually with 4 steps - Maint over [production]
23:45 <ladsgroup@cumin1003> dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P82796 and previous config saved to /var/cache/conftool/dbconfig/20250908-234506-ladsgroup.json [production]
23:44 <ladsgroup@cumin1003> dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P82795 and previous config saved to /var/cache/conftool/dbconfig/20250908-234419-ladsgroup.json [production]
23:33 <ladsgroup@cumin1003> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1223.eqiad.wmnet with reason: Upgrade to 10.11 [production]
23:31 <rzl> helmfile -e eqiad -i apply --set mesh.image_name=envoy-future --set mesh.image_version=1.29.12-1 --context=5 # T403663 [production]
23:30 <rzl@deploy1003> helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply [production]
23:30 <ladsgroup@cumin1003> dbctl commit (dc=all): 'Upgrade db1223 to MariaDB 10.11 (T399548)', diff saved to https://phabricator.wikimedia.org/P82794 and previous config saved to /var/cache/conftool/dbconfig/20250908-233042-ladsgroup.json [production]
23:29 <ladsgroup@cumin1003> dbctl commit (dc=all): 'Repooling after maintenance db1235 (T402925)', diff saved to https://phabricator.wikimedia.org/P82793 and previous config saved to /var/cache/conftool/dbconfig/20250908-232958-ladsgroup.json [production]
23:29 <ladsgroup@cumin1003> END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) db1223 gradually with 4 steps - Maint over [production]
23:29 <ladsgroup@cumin1003> dbctl commit (dc=all): 'Repooling after maintenance db2176 (T402925)', diff saved to https://phabricator.wikimedia.org/P82791 and previous config saved to /var/cache/conftool/dbconfig/20250908-232912-ladsgroup.json [production]
23:28 <rzl@deploy1003> helmfile [eqiad] START helmfile.d/services/mw-debug: apply [production]
23:21 <jdlrobson@deploy1003> Finished scap sync-world: Backport for [[gerrit:1182944|Cleanup: Simplify configuration for wgSpecialContributeSkinsEnabled]], [[gerrit:1186044|Temporarily use production for summary endpoint (T400694)]] (duration: 16m 06s) [production]
23:16 <jdlrobson@deploy1003> jdlrobson: Continuing with sync [production]
23:11 <jdlrobson@deploy1003> jdlrobson: Backport for [[gerrit:1182944|Cleanup: Simplify configuration for wgSpecialContributeSkinsEnabled]], [[gerrit:1186044|Temporarily use production for summary endpoint (T400694)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [production]
23:10 <ladsgroup@cumin1003> START - Cookbook sre.mysql.pool db1223 gradually with 4 steps - Maint over [production]
23:08 <ladsgroup@cumin1002> END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1223.eqiad.wmnet [production]
23:08 <eileen> civicrm upgraded from c7ebd726 to 1ec5de94 [production]
23:05 <jdlrobson@deploy1003> Started scap sync-world: Backport for [[gerrit:1182944|Cleanup: Simplify configuration for wgSpecialContributeSkinsEnabled]], [[gerrit:1186044|Temporarily use production for summary endpoint (T400694)]] [production]
23:02 <ladsgroup@cumin1002> END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1223 - Upgrading db1223.eqiad.wmnet [production]
23:02 <ladsgroup@cumin1002> START - Cookbook sre.mysql.depool db1223 - Upgrading db1223.eqiad.wmnet [production]
23:02 <ladsgroup@cumin1002> START - Cookbook sre.mysql.upgrade for db1223.eqiad.wmnet [production]
22:56 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Depool db1223 T404025', diff saved to https://phabricator.wikimedia.org/P82789 and previous config saved to /var/cache/conftool/dbconfig/20250908-225603-ladsgroup.json [production]
22:54 <ladsgroup@dns1004> END - running authdns-update [production]
22:53 <ladsgroup@cumin1003> dbctl commit (dc=all): 'Depooling db2176 (T402925)', diff saved to https://phabricator.wikimedia.org/P82788 and previous config saved to /var/cache/conftool/dbconfig/20250908-225313-ladsgroup.json [production]
22:53 <ladsgroup@dns1004> START - running authdns-update [production]
22:53 <ladsgroup@cumin1003> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance [production]
22:52 <ladsgroup@cumin1003> dbctl commit (dc=all): 'Repooling after maintenance db2174 (T402925)', diff saved to https://phabricator.wikimedia.org/P82787 and previous config saved to /var/cache/conftool/dbconfig/20250908-225250-ladsgroup.json [production]
22:50 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Promote db1189 to s3 primary and set section read-write T404025', diff saved to https://phabricator.wikimedia.org/P82786 and previous config saved to /var/cache/conftool/dbconfig/20250908-225054-ladsgroup.json [production]
22:49 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - T404025', diff saved to https://phabricator.wikimedia.org/P82785 and previous config saved to /var/cache/conftool/dbconfig/20250908-224914-ladsgroup.json [production]
22:48 <Amir1> Starting s3 eqiad failover from db1223 to db1189 - T404025 [production]
22:43 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Set db1189 with weight 0 T404025', diff saved to https://phabricator.wikimedia.org/P82784 and previous config saved to /var/cache/conftool/dbconfig/20250908-224330-ladsgroup.json [production]
22:42 <ladsgroup@cumin1002> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s3 T404025 [production]
22:37 <ladsgroup@cumin1003> dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P82783 and previous config saved to /var/cache/conftool/dbconfig/20250908-223742-ladsgroup.json [production]
22:35 <ladsgroup@cumin1003> dbctl commit (dc=all): 'Depooling db1235 (T402925)', diff saved to https://phabricator.wikimedia.org/P82782 and previous config saved to /var/cache/conftool/dbconfig/20250908-223528-ladsgroup.json [production]
22:35 <ladsgroup@cumin1003> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1235.eqiad.wmnet with reason: Maintenance [production]
22:35 <ladsgroup@cumin1003> dbctl commit (dc=all): 'Repooling after maintenance db1234 (T402925)', diff saved to https://phabricator.wikimedia.org/P82781 and previous config saved to /var/cache/conftool/dbconfig/20250908-223504-ladsgroup.json [production]
22:23 <andrew@cumin2002> START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye [production]
22:22 <ladsgroup@cumin1003> dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P82780 and previous config saved to /var/cache/conftool/dbconfig/20250908-222235-ladsgroup.json [production]
22:19 <ladsgroup@cumin1003> dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P82779 and previous config saved to /var/cache/conftool/dbconfig/20250908-221956-ladsgroup.json [production]
22:07 <andrew@cumin2002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye [production]
22:07 <ladsgroup@cumin1003> dbctl commit (dc=all): 'Repooling after maintenance db2174 (T402925)', diff saved to https://phabricator.wikimedia.org/P82778 and previous config saved to /var/cache/conftool/dbconfig/20250908-220728-ladsgroup.json [production]
22:04 <ladsgroup@cumin1003> dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P82777 and previous config saved to /var/cache/conftool/dbconfig/20250908-220449-ladsgroup.json [production]
21:58 <jhancock@cumin1002> END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [production]
21:57 <jhancock@cumin1002> START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [production]
21:52 <jhancock@cumin1002> END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [production]
21:51 <jhancock@cumin1002> START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [production]
21:49 <ladsgroup@cumin1003> dbctl commit (dc=all): 'Repooling after maintenance db1234 (T402925)', diff saved to https://phabricator.wikimedia.org/P82776 and previous config saved to /var/cache/conftool/dbconfig/20250908-214941-ladsgroup.json [production]
21:46 <jhancock@cumin1002> END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [production]
21:45 <jhancock@cumin1002> START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [production]