2022-02-24
19:55 <ryankemper> T302526 Depooled canary `wdqs1003`, ran puppet agent, and restarted `wdqs-blazegraph`. Tests look good, proceeding to rest of wdqs fleet [production]
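(Editor's note: the canary-then-fleet rollout logged at 19:46–19:55 follows the usual pattern: depool the canary, run puppet, restart the service, verify, then repool and continue. A minimal sketch of the per-host steps, assuming the standard `depool`/`pool` conftool wrapper scripts and the `wdqs-blazegraph` systemd unit; the smoke-test URL is purely hypothetical.)

```bash
# On the canary host (e.g. wdqs1003); depool/pool wrappers and the probe URL are assumptions.
sudo depool                                # drop the host from the load balancer
sudo run-puppet-agent                      # pick up the merged puppet change
sudo systemctl restart wdqs-blazegraph     # restart the query service with the new JVM args
curl -fsS http://localhost/readiness-probe # hypothetical smoke test; verify before repooling
sudo pool                                  # return the host to service, then move on to the fleet
```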
19:48 <ryankemper> T302526 (Forgot to merge patch first, take two) [production]
19:48 <ryankemper> T302526 Running puppet on wdqs canary: `ryankemper@wdqs1003:~$ sudo enable-puppet "query_service: Simply jvm arg handling - T302526" && sudo run-puppet-agent` [production]
19:46 <ryankemper> T302526 Disabling puppet across entire query service (wdqs & wcqs) fleet for merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/761080: `ryankemper@cumin1001:~$ sudo -E cumin 'w*qs*' 'disable-puppet "query_service: Simply jvm arg handling - T302526"'` [production]
19:06 <dduvall@deploy1002> rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.23 refs T300199 [production]
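(Editor's note: the "rebuilt and synchronized wikiversions files" entry is the MediaWiki train step that switches all wikis to the new branch. A sketch of the deployer-side command, assuming scap's `sync-wikiversions` subcommand takes the log annotation as its argument:)

```bash
# Run on the deployment host (deploy1002); the message becomes the SAL annotation.
scap sync-wikiversions 'all wikis to 1.38.0-wmf.23 refs T300199'
```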
19:00 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host elastic2084.codfw.wmnet with OS bullseye [production]
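(Editor's note: the reimage and provision entries in this block are spicerack cookbooks driven from a cumin host. A minimal sketch of one invocation, assuming the cookbook's `--os` flag and a bare short hostname; exact flags may differ:)

```bash
# From cumin2002; the cookbook schedules downtime, reimages to bullseye,
# and handles the first puppet run, as reflected by the START/END log pairs.
sudo cookbook sre.hosts.reimage --os bullseye elastic2084
```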
18:56 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host elastic2083.codfw.wmnet with OS bullseye [production]
18:55 <pt1979@cumin2002> END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2085.mgmt.codfw.wmnet with reboot policy FORCED [production]
18:53 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2082.codfw.wmnet with OS bullseye [production]
18:52 <pt1979@cumin2002> START - Cookbook sre.hosts.provision for host elastic2085.mgmt.codfw.wmnet with reboot policy FORCED [production]
18:51 <pt1979@cumin2002> END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2085.mgmt.codfw.wmnet with reboot policy FORCED [production]
18:45 <pt1979@cumin2002> START - Cookbook sre.hosts.provision for host elastic2085.mgmt.codfw.wmnet with reboot policy FORCED [production]
18:43 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2084.mgmt.codfw.wmnet with reboot policy FORCED [production]
18:43 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2082.codfw.wmnet with reason: host reimage [production]
18:39 <pt1979@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2082.codfw.wmnet with reason: host reimage [production]
18:27 <pt1979@cumin2002> START - Cookbook sre.hosts.provision for host elastic2084.mgmt.codfw.wmnet with reboot policy FORCED [production]
18:22 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host elastic2082.codfw.wmnet with OS bullseye [production]
18:21 <kormat@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300774)', diff saved to https://phabricator.wikimedia.org/P21508 and previous config saved to /var/cache/conftool/dbconfig/20220224-182102-kormat.json [production]
18:20 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2083.mgmt.codfw.wmnet with reboot policy FORCED [production]
18:13 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2081.codfw.wmnet with OS bullseye [production]
18:05 <kormat@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P21506 and previous config saved to /var/cache/conftool/dbconfig/20220224-180557-kormat.json [production]
18:04 <pt1979@cumin2002> START - Cookbook sre.hosts.provision for host elastic2083.mgmt.codfw.wmnet with reboot policy FORCED [production]
18:03 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2081.codfw.wmnet with reason: host reimage [production]
18:02 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2082.mgmt.codfw.wmnet with reboot policy FORCED [production]
18:02 <kevinbazira@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main'. [production]
18:01 <kevinbazira@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main'. [production]
18:01 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [production]
18:00 <pt1979@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2081.codfw.wmnet with reason: host reimage [production]
18:00 <kevinbazira@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main'. [production]
17:59 <kevinbazira@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main'. [production]
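(Editor's note: the ml-serve entries at 17:59–18:02 are helmfile deployments of the revscoring services to both ML clusters. A rough sketch of the deployer-side commands; the chart path under `helmfile.d/ml-services/` is an assumption based on the log, only `helmfile -e <env> sync` itself is standard helmfile usage:)

```bash
# Paths and environment names are assumptions inferred from the log entries.
cd /srv/deployment-charts/helmfile.d/ml-services/revscoring-editquality-damaging
helmfile -e ml-serve-eqiad sync   # apply the 'main' release in the eqiad ML cluster
helmfile -e ml-serve-codfw sync   # then the codfw cluster
```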
17:50 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply [production]
17:50 <kormat@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P21504 and previous config saved to /var/cache/conftool/dbconfig/20220224-175052-kormat.json [production]
17:46 <pt1979@cumin2002> START - Cookbook sre.hosts.provision for host elastic2082.mgmt.codfw.wmnet with reboot policy FORCED [production]
17:45 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [production]
17:44 <mwdebug-deploy@deploy1002> helmfile [codfw] START helmfile.d/services/mwdebug: apply [production]
17:44 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [production]
17:44 <ryankemper@cumin1001> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts elastic[1039,1043].eqiad.wmnet [production]
17:43 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply [production]
17:43 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host elastic2081.codfw.wmnet with OS bullseye [production]
17:40 <elukey> `truncate -s 1g /var/log/auth.log.1` on krb1001 to free space on the root partition [production]
17:38 <elukey> `truncate -s 1g /var/log/auth.log` on krb1001 to free space on the root partition [production]
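(Editor's note: the two `truncate` entries free root-partition space on krb1001 by shrinking the oversized auth logs in place; simply deleting the live log would not release the space while rsyslog still holds the file open, whereas truncating the same inode does. A sketch of the check-then-truncate sequence, with the 1g target taken from the log:)

```bash
# On krb1001: confirm what is filling the root partition, then shrink the logs in place.
df -h /
du -sh /var/log/auth.log /var/log/auth.log.1
sudo truncate -s 1g /var/log/auth.log     # live log; rsyslog keeps writing to the same inode
sudo truncate -s 1g /var/log/auth.log.1   # rotated log, shrunk the same way two minutes later
```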
17:35 <kormat@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300774)', diff saved to https://phabricator.wikimedia.org/P21503 and previous config saved to /var/cache/conftool/dbconfig/20220224-173548-kormat.json [production]
17:33 <kormat@cumin1001> dbctl commit (dc=all): 'Depooling db1164 (T300774)', diff saved to https://phabricator.wikimedia.org/P21502 and previous config saved to /var/cache/conftool/dbconfig/20220224-173307-kormat.json [production]
17:33 <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance [production]
17:33 <kormat@cumin1001> START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance [production]
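(Editor's note: the db1164 entries show the standard replica-maintenance cycle: schedule downtime, depool via dbctl, run the maintenance, then repool in stages, which is why "Repooling after maintenance db1164" commits recur at 17:35, 17:50, 18:05 and 18:21. A minimal sketch of the dbctl side, assuming the usual `instance`/`config commit` subcommands; the percentage step is illustrative:)

```bash
# From a cumin host; staged repooling percentages are illustrative, not from the log.
sudo dbctl instance db1164 depool
sudo dbctl config commit -m 'Depooling db1164 (T300774)'
# ... maintenance runs here ...
sudo dbctl instance db1164 pool -p 25   # repool at reduced weight, raising it in later commits
sudo dbctl config commit -m 'Repooling after maintenance db1164 (T300774)'
```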
17:33 <kormat@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1119 (T300774)', diff saved to https://phabricator.wikimedia.org/P21501 and previous config saved to /var/cache/conftool/dbconfig/20220224-173259-kormat.json [production]
17:32 <krinkle@deploy1002> Synchronized wmf-config/: Ia61fea4d0dcf86d51547d3132093a336ab3f2e9f (duration: 00m 52s) [production]
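(Editor's note: the "Synchronized wmf-config/" entry is a scap sync of the MediaWiki configuration directory; a sketch of the deployer-side command, assuming `scap sync-file` accepts a directory plus a log message, here the Gerrit change-ID from the entry:)

```bash
# From deploy1002; the message is recorded in SAL along with the sync duration.
scap sync-file wmf-config/ 'Ia61fea4d0dcf86d51547d3132093a336ab3f2e9f'
```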
17:30 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2080.codfw.wmnet with OS bullseye [production]
17:22 <ryankemper@cumin1001> START - Cookbook sre.hosts.decommission for hosts elastic[1039,1043].eqiad.wmnet [production]
17:20 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2080.codfw.wmnet with reason: host reimage [production]