2022-02-24 §
20:59 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host elastic2083.codfw.wmnet with OS bullseye [production]
20:58 <taavi@deploy1002> Started deploy [horizon/deploy@9d02cd6]: (no justification provided) [production]
20:58 <pt1979@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2084.codfw.wmnet with reason: host reimage [production]
20:51 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host elastic2084.codfw.wmnet with OS bullseye [production]
20:14 <pt1979@cumin2002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2084.codfw.wmnet with OS bullseye [production]
20:10 <pt1979@cumin2002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2083.codfw.wmnet with OS bullseye [production]
20:04 <ryankemper> T302526 `ryankemper@cumin1001:~$ sudo -E cumin -b 3 'wcqs*' 'enable-puppet "query_service: Simply jvm arg handling - T302526"; sudo run-puppet-agent'` in tmux `wcqs` [production]
20:02 <ryankemper> T302526 Depooled `wcqs1001`, ran puppet agent, and restarted `wcqs-blazegraph`. Service came up healthy, proceeding to rest of wcqs fleet [production]
19:57 <ryankemper> T302526 `ryankemper@cumin1001:~$ sudo -E cumin -b 6 'wdqs*' 'enable-puppet "query_service: Simply jvm arg handling - T302526"; sudo run-puppet-agent'` in tmux `deploy_window` [production]
19:55 <ryankemper> T302526 Depooled canary `wdqs1003`, ran puppet agent, and restarted `wdqs-blazegraph`. Tests look good, proceeding to rest of wdqs fleet [production]
19:48 <ryankemper> T302526 (Forgot to merge patch first, take two) [production]
19:48 <ryankemper> T302526 Running puppet on wdqs canary: `ryankemper@wdqs1003:~$ sudo enable-puppet "query_service: Simply jvm arg handling - T302526" && sudo run-puppet-agent` [production]
19:46 <ryankemper> T302526 Disabling puppet across entire query service (wdqs & wcqs) fleet for merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/761080: `ryankemper@cumin1001:~$ sudo -E cumin 'w*qs*' 'disable-puppet "query_service: Simply jvm arg handling - T302526"'` [production]
19:06 <dduvall@deploy1002> rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.23 refs T300199 [production]
19:00 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host elastic2084.codfw.wmnet with OS bullseye [production]
18:56 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host elastic2083.codfw.wmnet with OS bullseye [production]
18:55 <pt1979@cumin2002> END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2085.mgmt.codfw.wmnet with reboot policy FORCED [production]
18:53 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2082.codfw.wmnet with OS bullseye [production]
18:52 <pt1979@cumin2002> START - Cookbook sre.hosts.provision for host elastic2085.mgmt.codfw.wmnet with reboot policy FORCED [production]
18:51 <pt1979@cumin2002> END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2085.mgmt.codfw.wmnet with reboot policy FORCED [production]
18:45 <pt1979@cumin2002> START - Cookbook sre.hosts.provision for host elastic2085.mgmt.codfw.wmnet with reboot policy FORCED [production]
18:43 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2084.mgmt.codfw.wmnet with reboot policy FORCED [production]
18:43 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2082.codfw.wmnet with reason: host reimage [production]
18:39 <pt1979@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2082.codfw.wmnet with reason: host reimage [production]
18:27 <pt1979@cumin2002> START - Cookbook sre.hosts.provision for host elastic2084.mgmt.codfw.wmnet with reboot policy FORCED [production]
18:22 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host elastic2082.codfw.wmnet with OS bullseye [production]
18:21 <kormat@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300774)', diff saved to https://phabricator.wikimedia.org/P21508 and previous config saved to /var/cache/conftool/dbconfig/20220224-182102-kormat.json [production]
18:20 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2083.mgmt.codfw.wmnet with reboot policy FORCED [production]
18:13 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2081.codfw.wmnet with OS bullseye [production]
18:05 <kormat@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P21506 and previous config saved to /var/cache/conftool/dbconfig/20220224-180557-kormat.json [production]
18:04 <pt1979@cumin2002> START - Cookbook sre.hosts.provision for host elastic2083.mgmt.codfw.wmnet with reboot policy FORCED [production]
18:03 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2081.codfw.wmnet with reason: host reimage [production]
18:02 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2082.mgmt.codfw.wmnet with reboot policy FORCED [production]
18:02 <kevinbazira@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . [production]
18:01 <kevinbazira@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . [production]
18:01 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [production]
18:00 <pt1979@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2081.codfw.wmnet with reason: host reimage [production]
18:00 <kevinbazira@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . [production]
17:59 <kevinbazira@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . [production]
17:50 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply [production]
17:50 <kormat@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P21504 and previous config saved to /var/cache/conftool/dbconfig/20220224-175052-kormat.json [production]
17:46 <pt1979@cumin2002> START - Cookbook sre.hosts.provision for host elastic2082.mgmt.codfw.wmnet with reboot policy FORCED [production]
17:45 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [production]
17:44 <mwdebug-deploy@deploy1002> helmfile [codfw] START helmfile.d/services/mwdebug: apply [production]
17:44 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [production]
17:44 <ryankemper@cumin1001> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts elastic[1039,1043].eqiad.wmnet [production]
17:43 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply [production]
17:43 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host elastic2081.codfw.wmnet with OS bullseye [production]
17:40 <elukey> `truncate -s 1g /var/log/auth.log.1` on krb1001 to free space on the root partition [production]
17:38 <elukey> `truncate -s 1g /var/log/auth.log` on krb1001 to free space on the root partition [production]