2025-01-06
10:13 <cgoubert@deploy2002> helmfile [codfw] START helmfile.d/admin 'apply'. [production]
10:13 <cgoubert@deploy2002> helmfile [eqiad] DONE helmfile.d/admin 'apply'. [production]
10:13 <jelto@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2050.codfw.wmnet with reason: host reimage [production]
10:12 <cgoubert@deploy2002> helmfile [eqiad] START helmfile.d/admin 'apply'. [production]
10:12 <cgoubert@deploy2002> helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. [production]
10:11 <cgoubert@deploy2002> helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. [production]
10:11 <cgoubert@deploy2002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. [production]
10:10 <cgoubert@deploy2002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. [production]
10:10 <cgoubert@deploy2002> helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. [production]
10:09 <cgoubert@deploy2002> helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. [production]
10:09 <cgoubert@deploy2002> helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. [production]
10:09 <cgoubert@deploy2002> helmfile [staging-codfw] START helmfile.d/admin 'apply'. [production]
10:08 <cgoubert@deploy2002> helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. [production]
10:08 <jayme@cumin1002> START - Cookbook sre.hosts.reimage for host wikikube-worker1242.eqiad.wmnet with OS bookworm [production]
10:08 <cgoubert@deploy2002> helmfile [staging-eqiad] START helmfile.d/admin 'apply'. [production]
10:07 <cgoubert@deploy2002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. [production]
10:07 <cgoubert@deploy2002> helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. [production]
10:07 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Depooling db1167 (T371742)', diff saved to https://phabricator.wikimedia.org/P71801 and previous config saved to /var/cache/conftool/dbconfig/20250106-100706-ladsgroup.json [production]
10:07 <ladsgroup@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance [production]
10:06 <claime> Deploying admin_ng external services changes on all kubernetes clusters [production]
10:06 <ladsgroup@cumin1002> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance [production]
10:06 <ladsgroup@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance [production]
10:06 <ladsgroup@cumin1002> START - Cookbook sre.hosts.downtime for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance [production]
10:06 <jayme@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1241.eqiad.wmnet with OS bookworm [production]
09:55 <jelto@cumin1002> END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2050 [production]
09:55 <jelto@cumin1002> END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2049 [production]
09:55 <jelto@cumin1002> START - Cookbook sre.hosts.move-vlan for host wikikube-worker2050 [production]
09:55 <jelto@cumin1002> START - Cookbook sre.hosts.move-vlan for host wikikube-worker2049 [production]
09:55 <jelto@cumin1002> START - Cookbook sre.hosts.reimage for host wikikube-worker2050.codfw.wmnet with OS bookworm [production]
09:55 <jelto@cumin1002> START - Cookbook sre.hosts.reimage for host wikikube-worker2049.codfw.wmnet with OS bookworm [production]
09:54 <jelto@cumin1002> END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2049-2050].codfw.wmnet [production]
09:54 <jayme@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1246.eqiad.wmnet with reason: host reimage [production]
09:53 <jelto@cumin1002> START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2049-2050].codfw.wmnet [production]
09:52 <jelto@cumin1002> END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2052.codfw.wmnet [production]
09:52 <jelto@cumin1002> START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2052.codfw.wmnet [production]
09:52 <jelto@cumin1002> END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2051.codfw.wmnet [production]
09:52 <jelto@cumin1002> START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2051.codfw.wmnet [production]
09:51 <jelto@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2052.codfw.wmnet with OS bookworm [production]
09:50 <jayme@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1246.eqiad.wmnet with reason: host reimage [production]
09:47 <jayme@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1241.eqiad.wmnet with reason: host reimage [production]
09:46 <jelto@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2051.codfw.wmnet with OS bookworm [production]
09:43 <jayme@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1241.eqiad.wmnet with reason: host reimage [production]
09:41 <dcausse> repooling wdqs1012 [production]
09:31 <jelto@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2052.codfw.wmnet with reason: host reimage [production]
09:29 <jayme@cumin1002> START - Cookbook sre.hosts.reimage for host wikikube-worker1246.eqiad.wmnet with OS bookworm [production]
09:27 <jelto@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2051.codfw.wmnet with reason: host reimage [production]
09:26 <dcausse> depooling wdqs1012 (high lag, forgot to keep it depooled after restarting blazegraph) [production]
09:26 <marostegui> Reboot db2160 for kernel upgrade T376905 [production]
09:25 <marostegui@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2160.codfw.wmnet with reason: upgrade kernel [production]
09:25 <marostegui@cumin1002> START - Cookbook sre.hosts.downtime for 5:00:00 on db2160.codfw.wmnet with reason: upgrade kernel [production]