2022-09-06
20:03 <inflatador> 'bking@cumin1001 disabling puppet on elastic codfw hosts T313431' [production]
19:56 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P33974 and previous config saved to /var/cache/conftool/dbconfig/20220906-195642-ladsgroup.json [production]
19:41 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1135 (T312863)', diff saved to https://phabricator.wikimedia.org/P33973 and previous config saved to /var/cache/conftool/dbconfig/20220906-194135-ladsgroup.json [production]
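The "Repooling after maintenance" entries above (and the "Depooling" entry further down) are what dbctl writes to SAL when an operator commits a conftool change; the commit step is also what saves the diff to Phabricator and the previous config to /var/cache/conftool. As an illustrative sketch only (command names from the conftool/dbctl tooling; exact flags and pooling percentages are assumed, not taken from this log), the sequence behind such an entry looks roughly like:

    # on a cumin host: take the replica out of rotation before maintenance
    dbctl instance db1135 depool
    dbctl config commit -m 'Depooling db1135 (T312863)'
    # ...maintenance work...
    # return it to service; the commit produces the SAL line and the Phabricator paste
    dbctl instance db1135 pool -p 100
    dbctl config commit -m 'Repooling after maintenance db1135 (T312863)'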
19:24 <milimetric@deploy1002> Started deploy [analytics/refinery@8a5ce13]: Regular analytics weekly train [analytics/refinery@8a5ce13] [production]
18:45 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db2124 (T314041)', diff saved to https://phabricator.wikimedia.org/P33972 and previous config saved to /var/cache/conftool/dbconfig/20220906-184515-ladsgroup.json [production]
18:45 <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance [production]
18:44 <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance [production]
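The START/END pair at 18:44-18:45 is emitted automatically by the spicerack cookbook runner around the downtime cookbook, which silences alerts for the host before it is depooled. A hedged sketch of the invocation that would produce it (option names assumed, not taken from this log; the cookbook's own help output is authoritative):

    # on a cumin host: downtime db2124 for one day, then proceed with the depool above
    sudo cookbook sre.hosts.downtime --days 1 --reason "Maintenance" 'db2124.codfw.wmnet'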
18:25 <cwhite> reduce codfw replicas 2 to 1 for logstash-(webrequest|k8s) partitions. Make space for failed logstash2027 - T316996 [production]
17:50 <root@cumin1001> START - Cookbook sre.network.prepare-upgrade [production]
17:48 <root@cumin1001> START - Cookbook sre.network.prepare-upgrade [production]
17:23 <moritzm> installing dpkg bugfix updates from bullseye point release [production]
17:18 <pt1979@cumin2002> END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-logging1004'] [production]
17:16 <krinkle@deploy1002> Synchronized php-1.39.0-wmf.27/resources/src/: I0516527d5cc0 (duration: 03m 50s) [production]
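"Synchronized ..." lines are logged by scap when a deployer pushes a path from the deployment host out to the MediaWiki application servers. A minimal sketch, assuming the usual staging checkout on deploy1002 (whether sync-file or sync-dir applies depends on the scap version in use):

    # on deploy1002, path relative to the MediaWiki staging directory
    scap sync-file php-1.39.0-wmf.27/resources/src/ 'I0516527d5cc0'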
17:15 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [production]
17:14 <mwdebug-deploy@deploy1002> helmfile [codfw] START helmfile.d/services/mwdebug: apply [production]
17:14 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [production]
17:14 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply [production]
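The mwdebug START/DONE pairs above are written by the helmfile-based service deployment tooling, once per datacenter. Roughly equivalent, as a sketch (repository path and wrapper behaviour are assumptions; in production the flow goes through a helper script rather than raw helmfile):

    # on the deployment host, from the service's helmfile directory
    cd /srv/deployment-charts/helmfile.d/services/mwdebug
    helmfile -e eqiad apply   # logged as START/DONE for [eqiad]
    helmfile -e codfw apply   # logged as START/DONE for [codfw]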
17:11 <pt1979@cumin2002> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging1004'] [production]
17:06 <krinkle@deploy1002> Synchronized wmf-config/: (no justification provided) (duration: 03m 50s) [production]
17:02 <pt1979@cumin2002> END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-logging1004'] [production]
17:00 <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance [production]
17:00 <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance [production]
16:59 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db2158 (T314041)', diff saved to https://phabricator.wikimedia.org/P33969 and previous config saved to /var/cache/conftool/dbconfig/20220906-165958-ladsgroup.json [production]
16:58 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [production]
16:57 <mwdebug-deploy@deploy1002> helmfile [codfw] START helmfile.d/services/mwdebug: apply [production]
16:57 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [production]
16:56 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply [production]
16:55 <pt1979@cumin2002> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging1004'] [production]
16:51 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [production]
16:50 <mwdebug-deploy@deploy1002> helmfile [codfw] START helmfile.d/services/mwdebug: apply [production]
16:50 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [production]
16:50 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply [production]
16:47 <jelto@cumin1001> END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner [production]
16:45 <pt1979@cumin2002> END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kafka-logging1004'] [production]
16:44 <pt1979@cumin2002> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging1004'] [production]
16:44 <pt1979@cumin2002> END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kafka-logging1004'] [production]
16:42 <pt1979@cumin2002> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging1004'] [production]
16:36 <pt1979@cumin2002> END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['kafka-logging1004'] [production]
16:25 <btullis@deploy1002> helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main [production]
16:24 <btullis@deploy1002> helmfile [eqiad] START helmfile.d/services/datahub: apply on main [production]
16:23 <btullis@deploy1002> helmfile [codfw] DONE helmfile.d/services/datahub: sync on main [production]
16:22 <btullis@deploy1002> helmfile [codfw] START helmfile.d/services/datahub: apply on main [production]
16:22 <btullis@deploy1002> helmfile [staging] DONE helmfile.d/services/datahub: sync on main [production]
16:20 <btullis@deploy1002> helmfile [staging] START helmfile.d/services/datahub: apply on main [production]
16:18 <pt1979@cumin2002> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging1004'] [production]
16:12 <ayounsi@cumin1001> END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0) [production]
16:12 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1004.mgmt.eqiad.wmnet with reboot policy FORCED [production]
16:01 <jelto@cumin1001> START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner [production]
15:50 <marostegui@cumin1001> dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P33968 and previous config saved to /var/cache/conftool/dbconfig/20220906-154959-root.json [production]
15:48 <ayounsi@cumin1001> START - Cookbook sre.network.prepare-upgrade [production]