2024-10-15
15:44 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Set db2209 with weight 0 T377164', diff saved to https://phabricator.wikimedia.org/P69990 and previous config saved to /var/cache/conftool/dbconfig/20241015-154228-ladsgroup.json [production]
15:43 <ladsgroup@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s3 T377164 [production]
15:42 <ladsgroup@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s3 T377164 [production]
15:42 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Depooling db1202 (T376905)', diff saved to https://phabricator.wikimedia.org/P69989 and previous config saved to /var/cache/conftool/dbconfig/20241015-154027-ladsgroup.json [production]
15:41 <ladsgroup@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance [production]
15:40 <ladsgroup@cumin1002> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance [production]
15:40 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db1194 (T376905)', diff saved to https://phabricator.wikimedia.org/P69988 and previous config saved to /var/cache/conftool/dbconfig/20241015-154002-ladsgroup.json [production]
15:27 <arnaudb@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P69987 and previous config saved to /var/cache/conftool/dbconfig/20241015-152749-arnaudb.json [production]
15:26 <akosiaris> run gnt-cluster verify-disks after ganeti1034 forceful reboot [production]
15:24 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P69986 and previous config saved to /var/cache/conftool/dbconfig/20241015-152456-ladsgroup.json [production]
15:22 <volans> force-rebooting ganeti1034 stuck due to drbd traces via mgmt [production]
15:19 <akosiaris@cumin1002> END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1034.eqiad.wmnet [production]
15:17 <akosiaris> drain ganeti1034 of VMs, hardware might be misbehaving [production]
15:16 <akosiaris@cumin1002> START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet [production]
15:12 <arnaudb@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P69985 and previous config saved to /var/cache/conftool/dbconfig/20241015-151243-arnaudb.json [production]
15:09 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P69984 and previous config saved to /var/cache/conftool/dbconfig/20241015-150948-ladsgroup.json [production]
14:57 <arnaudb@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2174 (T367781)', diff saved to https://phabricator.wikimedia.org/P69983 and previous config saved to /var/cache/conftool/dbconfig/20241015-145734-arnaudb.json [production]
14:56 <herron@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan1001.eqiad.wmnet [production]
14:56 <arnaudb@cumin1002> dbctl commit (dc=all): 'Depooling db2174 (T367781)', diff saved to https://phabricator.wikimedia.org/P69982 and previous config saved to /var/cache/conftool/dbconfig/20241015-145517-arnaudb.json [production]
14:55 <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2174.codfw.wmnet with reason: Maintenance [production]
14:55 <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 4:00:00 on db2174.codfw.wmnet with reason: Maintenance [production]
14:55 <arnaudb@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2173 (T367781)', diff saved to https://phabricator.wikimedia.org/P69981 and previous config saved to /var/cache/conftool/dbconfig/20241015-145453-arnaudb.json [production]
14:54 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db1194 (T376905)', diff saved to https://phabricator.wikimedia.org/P69980 and previous config saved to /var/cache/conftool/dbconfig/20241015-145441-ladsgroup.json [production]
14:48 <herron@cumin1002> START - Cookbook sre.hosts.reboot-single for host titan1001.eqiad.wmnet [production]
14:47 <herron@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2001.codfw.wmnet [production]
14:47 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Depooling db1194 (T376905)', diff saved to https://phabricator.wikimedia.org/P69979 and previous config saved to /var/cache/conftool/dbconfig/20241015-144631-ladsgroup.json [production]
14:47 <ladsgroup@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance [production]
14:46 <ladsgroup@cumin1002> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance [production]
14:46 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db1191 (T376905)', diff saved to https://phabricator.wikimedia.org/P69978 and previous config saved to /var/cache/conftool/dbconfig/20241015-144606-ladsgroup.json [production]
14:45 <jdrewniak@deploy2002> Synchronized portals: Wikimedia Portals Update: [[gerrit:1046698| Bumping portals to master (T128546)]] (duration: 02m 24s) [production]
14:43 <jdrewniak@deploy2002> Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:1046698| Bumping portals to master (T128546)]] (duration: 06m 46s) [production]
14:39 <arnaudb@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P69977 and previous config saved to /var/cache/conftool/dbconfig/20241015-143946-arnaudb.json [production]
14:38 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Depooling db2154 (T371742)', diff saved to https://phabricator.wikimedia.org/P69976 and previous config saved to /var/cache/conftool/dbconfig/20241015-143803-ladsgroup.json [production]
14:38 <ladsgroup@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance [production]
14:38 <ladsgroup@cumin1002> START - Cookbook sre.hosts.downtime for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance [production]
14:37 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2152 (T371742)', diff saved to https://phabricator.wikimedia.org/P69975 and previous config saved to /var/cache/conftool/dbconfig/20241015-143740-ladsgroup.json [production]
14:36 <herron@cumin1002> START - Cookbook sre.hosts.reboot-single for host titan2001.codfw.wmnet [production]
14:35 <brouberol@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host matomo1003.eqiad.wmnet [production]
14:33 <herron@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan1002.eqiad.wmnet [production]
14:31 <brouberol@cumin1002> START - Cookbook sre.hosts.reboot-single for host matomo1003.eqiad.wmnet [production]
14:31 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P69974 and previous config saved to /var/cache/conftool/dbconfig/20241015-143059-ladsgroup.json [production]
14:29 <kevinbazira@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . [production]
14:28 <herron@cumin1002> START - Cookbook sre.hosts.reboot-single for host titan1002.eqiad.wmnet [production]
14:28 <akosiaris@deploy2002> helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply [production]
14:27 <akosiaris@deploy2002> helmfile [eqiad] START helmfile.d/services/rest-gateway: apply [production]
14:27 <akosiaris@deploy2002> helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply [production]
14:26 <akosiaris@deploy2002> helmfile [codfw] START helmfile.d/services/rest-gateway: apply [production]
14:24 <arnaudb@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P69973 and previous config saved to /var/cache/conftool/dbconfig/20241015-142439-arnaudb.json [production]
14:24 <herron@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2002.codfw.wmnet [production]
14:24 <urbanecm@deploy2002> Finished scap sync-world: Backport for [[gerrit:1080279|SkinComponentCopyright: Fix message existence check for history-copyright (T45646)]] (duration: 33m 23s) [production]