2025-08-26
13:42 <fceratto@cumin1002> dbctl commit (dc=all): 'Depooling db2151 (T401906)', diff saved to https://phabricator.wikimedia.org/P81763 and previous config saved to /var/cache/conftool/dbconfig/20250826-134201-fceratto.json [production]
13:41 <fceratto@cumin1002> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2151.codfw.wmnet with reason: Maintenance [production]
13:40 <mvernon@cumin2002> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-fe[1017-1020].eqiad.wmnet [production]
13:40 <mvernon@cumin2002> START - Cookbook sre.hosts.remove-downtime for ms-fe[1017-1020].eqiad.wmnet [production]
13:35 <lucaswerkmeister-wmde@deploy1003> mwscript-k8s job started: CentralAuth:FixRenamedUserGlobalEditCount metawiki # T313900 (dry run) [production]
13:35 <stevemunene@cumin1003> END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons. [production]
13:34 <mvernon@cumin2002> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-fe[1017-1020].eqiad.wmnet with reason: reboot before bringing into service [production]
13:33 <lucaswerkmeister-wmde@deploy1003> Finished scap sync-world: Backport for [[gerrit:1181782|PHPSessionHandler: Better handle objects stored in the session (T402602)]], [[gerrit:1181788|Add maint script to fix global edit count of renamed users (T313900)]], [[gerrit:1181789|Add maint script to fix wrong actors in local log entries for global renames (T398177)]] (duration: 12m 54s) [production]
13:28 <jhancock@cumin1003> END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host frmx2002 [production]
13:28 <jhancock@cumin1003> START - Cookbook sre.network.configure-switch-interfaces for host frmx2002 [production]
13:28 <lucaswerkmeister-wmde@deploy1003> matmarex, lucaswerkmeister-wmde: Continuing with sync [production]
13:28 <jhancock@cumin1003> END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es2039 [production]
13:28 <jhancock@cumin1003> START - Cookbook sre.network.configure-switch-interfaces for host es2039 [production]
13:27 <jhancock@cumin1003> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
13:26 <lucaswerkmeister-wmde@deploy1003> matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1181782|PHPSessionHandler: Better handle objects stored in the session (T402602)]], [[gerrit:1181788|Add maint script to fix global edit count of renamed users (T313900)]], [[gerrit:1181789|Add maint script to fix wrong actors in local log entries for global renames (T398177)]] synced to the testservers (see https://wikitech.wikim [production]
13:26 <jmm@deploy1003> helmfile [eqiad] DONE helmfile.d/services/thumbor: apply [production]
13:24 <jhancock@cumin1003> START - Cookbook sre.dns.netbox [production]
13:24 <jhancock@cumin1003> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
13:21 <jhancock@cumin1003> START - Cookbook sre.dns.netbox [production]
13:20 <lucaswerkmeister-wmde@deploy1003> Started scap sync-world: Backport for [[gerrit:1181782|PHPSessionHandler: Better handle objects stored in the session (T402602)]], [[gerrit:1181788|Add maint script to fix global edit count of renamed users (T313900)]], [[gerrit:1181789|Add maint script to fix wrong actors in local log entries for global renames (T398177)]] [production]
13:20 <jmm@deploy1003> helmfile [eqiad] START helmfile.d/services/thumbor: apply [production]
13:11 <jclark@cumin1002> START - Cookbook sre.hosts.reimage for host cloudcephosd1052.eqiad.wmnet with OS bullseye [production]
13:09 <jclark@cumin1002> END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1052.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED [production]
13:06 <jmm@deploy1003> helmfile [codfw] DONE helmfile.d/services/thumbor: apply [production]
13:02 <jmm@deploy1003> helmfile [codfw] START helmfile.d/services/thumbor: apply [production]
12:57 <jmm@deploy1003> helmfile [staging] DONE helmfile.d/services/thumbor: apply [production]
12:56 <jmm@deploy1003> helmfile [staging] START helmfile.d/services/thumbor: apply [production]
12:55 <jclark@cumin1002> START - Cookbook sre.hosts.provision for host cloudcephosd1052.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED [production]
12:54 <jclark@cumin1002> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
12:54 <jclark@cumin1002> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dnscloudcephosd1052 - jclark@cumin1002" [production]
12:54 <jclark@cumin1002> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dnscloudcephosd1052 - jclark@cumin1002" [production]
12:50 <jclark@cumin1002> START - Cookbook sre.dns.netbox [production]
12:48 <stevemunene@cumin1003> START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons. [production]
12:16 <dbrant@deploy1003> helmfile [codfw] DONE helmfile.d/services/mobileapps: apply [production]
12:15 <dbrant@deploy1003> helmfile [codfw] START helmfile.d/services/mobileapps: apply [production]
12:15 <dbrant@deploy1003> helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply [production]
12:14 <dbrant@deploy1003> helmfile [eqiad] START helmfile.d/services/mobileapps: apply [production]
12:12 <dbrant@deploy1003> helmfile [staging] DONE helmfile.d/services/mobileapps: apply [production]
12:11 <dbrant@deploy1003> helmfile [staging] START helmfile.d/services/mobileapps: apply [production]
11:55 <Daimona> Running queries from T402239#11118710 in x1.wikishared to fix broken event addresses (again) [production]
11:25 <ladsgroup@cumin1002> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on es2039.codfw.wmnet with reason: Glow up (T399927) [production]
11:25 <ladsgroup@cumin1002> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on es1039.eqiad.wmnet with reason: Glow up (T399927) [production]
11:22 <ladsgroup@cumin1002> END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1244 gradually with 4 steps - Work done [production]
11:19 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Depool es2039 T402912', diff saved to https://phabricator.wikimedia.org/P81760 and previous config saved to /var/cache/conftool/dbconfig/20250826-111927-ladsgroup.json [production]
11:16 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Promote es2038 to es7 primary T402912', diff saved to https://phabricator.wikimedia.org/P81759 and previous config saved to /var/cache/conftool/dbconfig/20250826-111630-ladsgroup.json [production]
11:14 <Amir1> Starting es7 codfw failover from es2039 to es2038 - T402912 [production]
11:10 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Set es2038 with weight 0 T402912', diff saved to https://phabricator.wikimedia.org/P81758 and previous config saved to /var/cache/conftool/dbconfig/20250826-111015-ladsgroup.json [production]
11:09 <ladsgroup@cumin1002> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 8 hosts with reason: Primary switchover es7 T402912 [production]
10:37 <ladsgroup@cumin1002> START - Cookbook sre.mysql.pool db1244 gradually with 4 steps - Work done [production]
10:31 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install6002.wikimedia.org [production]