2022-09-06 §
17:06 <krinkle@deploy1002> Synchronized wmf-config/: (no justification provided) (duration: 03m 50s) [production]
17:02 <pt1979@cumin2002> END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-logging1004'] [production]
17:00 <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance [production]
17:00 <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance [production]
16:59 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db2158 (T314041)', diff saved to https://phabricator.wikimedia.org/P33969 and previous config saved to /var/cache/conftool/dbconfig/20220906-165958-ladsgroup.json [production]
16:58 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [production]
16:57 <mwdebug-deploy@deploy1002> helmfile [codfw] START helmfile.d/services/mwdebug: apply [production]
16:57 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [production]
16:56 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply [production]
16:55 <pt1979@cumin2002> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging1004'] [production]
16:51 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [production]
16:50 <mwdebug-deploy@deploy1002> helmfile [codfw] START helmfile.d/services/mwdebug: apply [production]
16:50 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [production]
16:50 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply [production]
16:47 <jelto@cumin1001> END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner [production]
16:45 <pt1979@cumin2002> END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kafka-logging1004'] [production]
16:44 <pt1979@cumin2002> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging1004'] [production]
16:44 <pt1979@cumin2002> END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kafka-logging1004'] [production]
16:42 <pt1979@cumin2002> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging1004'] [production]
16:36 <pt1979@cumin2002> END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['kafka-logging1004'] [production]
16:25 <btullis@deploy1002> helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main [production]
16:24 <btullis@deploy1002> helmfile [eqiad] START helmfile.d/services/datahub: apply on main [production]
16:23 <btullis@deploy1002> helmfile [codfw] DONE helmfile.d/services/datahub: sync on main [production]
16:22 <btullis@deploy1002> helmfile [codfw] START helmfile.d/services/datahub: apply on main [production]
16:22 <btullis@deploy1002> helmfile [staging] DONE helmfile.d/services/datahub: sync on main [production]
16:20 <btullis@deploy1002> helmfile [staging] START helmfile.d/services/datahub: apply on main [production]
16:18 <pt1979@cumin2002> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging1004'] [production]
16:12 <ayounsi@cumin1001> END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0) [production]
16:12 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1004.mgmt.eqiad.wmnet with reboot policy FORCED [production]
16:01 <jelto@cumin1001> START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner [production]
15:50 <marostegui@cumin1001> dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P33968 and previous config saved to /var/cache/conftool/dbconfig/20220906-154959-root.json [production]
15:48 <ayounsi@cumin1001> START - Cookbook sre.network.prepare-upgrade [production]
15:44 <root@cumin1001> END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=99) [production]
15:43 <root@cumin1001> START - Cookbook sre.network.prepare-upgrade [production]
15:43 <root@cumin1001> END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=99) [production]
15:43 <root@cumin1001> START - Cookbook sre.network.prepare-upgrade [production]
15:34 <marostegui@cumin1001> dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P33967 and previous config saved to /var/cache/conftool/dbconfig/20220906-153454-root.json [production]
15:21 <jelto@cumin1001> END (FAIL) - Cookbook sre.gitlab.reboot-runner (exit_code=1) rolling reboot on A:gitlab-runner [production]
15:20 <jelto@cumin1001> START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner [production]
15:19 <marostegui@cumin1001> dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P33966 and previous config saved to /var/cache/conftool/dbconfig/20220906-151950-root.json [production]
15:15 <claime> Set wtp10[41-43].eqiad.wmnet inactive pending decommission T317025 [production]
15:14 <cgoubert@puppetmaster1001> conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1043.eqiad.wmnet [production]
15:14 <cgoubert@puppetmaster1001> conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1042.eqiad.wmnet [production]
15:14 <cgoubert@puppetmaster1001> conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1041.eqiad.wmnet [production]
15:12 <cgoubert@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wtp[1041-1043].eqiad.wmnet with reason: Downtiming replaced wtp servers [production]
15:12 <cgoubert@cumin1001> START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wtp[1041-1043].eqiad.wmnet with reason: Downtiming replaced wtp servers [production]
15:09 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db2158 (T314041)', diff saved to https://phabricator.wikimedia.org/P33965 and previous config saved to /var/cache/conftool/dbconfig/20220906-150953-ladsgroup.json [production]
15:09 <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance [production]
15:09 <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance [production]
15:09 <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance [production]