2022-06-14
09:08 <filippo@cumin1001> START - Cookbook sre.hosts.reboot-single for host thanos-fe2003.codfw.wmnet [production]
09:08 <marostegui@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P29719 and previous config saved to /var/cache/conftool/dbconfig/20220614-090817-marostegui.json [production]
09:08 <filippo@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet [production]
09:05 <filippo@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2002.codfw.wmnet [production]
09:04 <filippo@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1002.eqiad.wmnet [production]
09:01 <filippo@cumin1001> START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet [production]
09:00 <mvernon@cumin2002> START - Cookbook sre.hosts.reimage for host ms-be1058.eqiad.wmnet with OS bullseye [production]
09:00 <filippo@cumin1001> START - Cookbook sre.hosts.reboot-single for host thanos-fe2002.codfw.wmnet [production]
08:59 <filippo@cumin1001> START - Cookbook sre.hosts.reboot-single for host thanos-fe1002.eqiad.wmnet [production]
08:59 <filippo@cumin1001> START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet [production]
08:58 <filippo@cumin1001> END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host graphite1004.eqiad.wmnet [production]
08:56 <filippo@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet [production]
08:56 <filippo@cumin1001> END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-fe1001.eqiad.wmnet [production]
08:56 <filippo@cumin1001> END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host netmon1003.wikimedia.org [production]
08:56 <mvernon@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2003.codfw.wmnet with OS buster [production]
08:53 <marostegui@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P29718 and previous config saved to /var/cache/conftool/dbconfig/20220614-085312-marostegui.json [production]
08:53 <joal@deploy1002> Finished deploy [analytics/refinery@f146a63] (hadoop-test): Regular analytics weekly train - TEST [analytics/refinery@f146a63] (duration: 07m 27s) [production]
08:51 <btullis@cumin1001> END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons. [production]
08:49 <filippo@cumin1001> START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet [production]
08:48 <filippo@cumin1001> START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet [production]
08:48 <btullis@cumin1001> START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons. [production]
08:47 <filippo@cumin1001> START - Cookbook sre.hosts.reboot-single for host netmon1003.wikimedia.org [production]
08:47 <filippo@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2003.codfw.wmnet [production]
08:46 <filippo@cumin1001> START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet [production]
08:45 <joal@deploy1002> Started deploy [analytics/refinery@f146a63] (hadoop-test): Regular analytics weekly train - TEST [analytics/refinery@f146a63] [production]
08:45 <joal@deploy1002> Finished deploy [analytics/refinery@f146a63] (thin): Regular analytics weekly train - THIN [analytics/refinery@f146a63] (duration: 00m 08s) [production]
08:44 <joal@deploy1002> Started deploy [analytics/refinery@f146a63] (thin): Regular analytics weekly train - THIN [analytics/refinery@f146a63] [production]
08:44 <joal@deploy1002> Finished deploy [analytics/refinery@f146a63]: Regular analytics weekly train - Second [analytics/refinery@f146a63] (duration: 04m 45s) [production]
08:39 <joal@deploy1002> Started deploy [analytics/refinery@f146a63]: Regular analytics weekly train - Second [analytics/refinery@f146a63] [production]
08:39 <filippo@cumin1001> START - Cookbook sre.hosts.reboot-single for host graphite2003.codfw.wmnet [production]
08:38 <godog> reboot centrallog2002 - T310483 [production]
08:38 <marostegui@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1179 (T310011)', diff saved to https://phabricator.wikimedia.org/P29717 and previous config saved to /var/cache/conftool/dbconfig/20220614-083807-marostegui.json [production]
08:28 <marostegui@cumin1001> dbctl commit (dc=all): 'Depooling db1179 (T310011)', diff saved to https://phabricator.wikimedia.org/P29716 and previous config saved to /var/cache/conftool/dbconfig/20220614-082855-marostegui.json [production]
08:28 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1179.eqiad.wmnet with reason: Maintenance [production]
08:28 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 12:00:00 on db1179.eqiad.wmnet with reason: Maintenance [production]
08:28 <marostegui@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1166 (T310011)', diff saved to https://phabricator.wikimedia.org/P29715 and previous config saved to /var/cache/conftool/dbconfig/20220614-082847-marostegui.json [production]
08:23 <mvernon@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2003.codfw.wmnet with reason: host reimage [production]
08:20 <marostegui> dbmaint s6@eqiad T298560 [production]
08:18 <mvernon@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2003.codfw.wmnet with reason: host reimage [production]
08:16 <marostegui> dbmaint s6@eqiad T309311 [production]
08:16 <joal@deploy1002> Finished deploy [analytics/refinery@f146a63]: Regular analytics weekly train [analytics/refinery@f146a63] (duration: 31m 09s) [production]
08:13 <marostegui@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P29714 and previous config saved to /var/cache/conftool/dbconfig/20220614-081342-marostegui.json [production]
08:02 <mvernon@cumin2002> START - Cookbook sre.hosts.reimage for host aqs2003.codfw.wmnet with OS buster [production]
07:58 <marostegui@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P29713 and previous config saved to /var/cache/conftool/dbconfig/20220614-075837-marostegui.json [production]
07:45 <joal@deploy1002> Started deploy [analytics/refinery@f146a63]: Regular analytics weekly train [analytics/refinery@f146a63] [production]
07:43 <marostegui@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1166 (T310011)', diff saved to https://phabricator.wikimedia.org/P29712 and previous config saved to /var/cache/conftool/dbconfig/20220614-074331-marostegui.json [production]
07:33 <marostegui@cumin1001> dbctl commit (dc=all): 'Depooling db1166 (T310011)', diff saved to https://phabricator.wikimedia.org/P29711 and previous config saved to /var/cache/conftool/dbconfig/20220614-073322-marostegui.json [production]
07:33 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance [production]
07:33 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance [production]
07:25 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [production]