production SAL

2024-07-23
18:42	<mutante>	puppetmaster1001/puppetmaster2001 - rm /var/run/confd-template/_srv_config-master_pybal_codfw_api-https.err to clear pybal icinga alerts after T367949	[production]
18:40	<pt1979@cumin1002>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye	[production]
18:14	<dduvall@deploy1002>	rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.15 refs T366960	[production]
18:13	<swfrench-wmf>	sudo cumin 'A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad' 'ipvsadm --delete-service --tcp-service 10.2.2.1:443' (appservers-https eqiad) - T367949	[production]
18:12	<aokoth@cumin1002>	END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1001.eqiad.wmnet	[production]
18:11	<swfrench-wmf>	sudo cumin 'A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad' 'ipvsadm --delete-service --tcp-service 10.2.2.22:443' (api-https eqiad) - T367949	[production]
18:11	<swfrench-wmf>	sudo cumin 'A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad' 'ipvsa	[production]
18:10	<aokoth@cumin1002>	START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet	[production]
18:10	<swfrench-wmf>	sudo cumin 'A:lvs-secondary-codfw or A:lvs-low-traffic-codfw' 'ipvsa	[production]
18:08	<swfrench-wmf>	sudo cumin 'A:lvs-secondary-codfw or A:lvs-low-traffic-codfw' 'ipvsa	[production]
18:01	<aokoth@cumin1002>	END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=99) on VRTS host vrts1001.eqiad.wmnet	[production]
18:01	<aokoth@cumin1002>	START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet	[production]
17:58	<swfrench-wmf>	sudo cumin 'A:lvs-low-traffic-eqiad' 'systemctl restart pybal.service' - T367949	[production]
17:51	<swfrench-wmf>	sudo cumin 'A:lvs-secondary-eqiad' 'systemctl restart pybal.service' - T367949	[production]
17:46	<logmsgbot>	nshahquinn-wmf@deploy1002 Finished deploy [airflow-dags/analytics_product@ebd9e13]: (no justification provided) (duration: 00m 07s)	[production]
17:46	<logmsgbot>	nshahquinn-wmf@deploy1002 Started deploy [airflow-dags/analytics_product@ebd9e13]: (no justification provided)	[production]
17:44	<swfrench-wmf>	sudo cumin 'A:lvs-low-traffic-codfw' 'systemctl restart pybal.service' - T367949	[production]
17:41	<sukhe@cumin1002>	END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2014.codfw.wmnet	[production]
17:41	<sukhe@cumin1002>	START - Cookbook sre.hosts.remove-downtime for lvs2014.codfw.wmnet	[production]
17:40	<swfrench@cumin2002>	END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw (T367949)	[production]
17:37	<pt1979@cumin1002>	START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye	[production]
17:33	<swfrench@cumin2002>	START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw (T367949)	[production]
17:28	<swfrench-wmf>	run-puppet-agent on O:lvs::balancer to pick up switch to service_setup, removal of profile::lvs::realserver::pools - T367949	[production]
17:17	<swfrench-wmf>	run-puppet-agent on A:dnsbox to pick up switch to lvs_setup - T367949	[production]
17:06	<swfrench-wmf>	ran authdns-update on dns1004 to pick up removal of appservers / api records - T367949	[production]
17:04	<dancy@deploy1002>	sync-world aborted: testing (duration: 00m 51s)	[production]
17:03	<dancy@deploy1002>	Started scap sync-world: testing	[production]
17:02	<pt1979@cumin1002>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye	[production]
16:59	<jhathaway>	applying varnish change on cp4037, 1030591	[production]
16:58	<hnowlan@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply	[production]
16:57	<hnowlan@deploy1002>	helmfile [eqiad] START helmfile.d/services/shellbox-video: apply	[production]
16:16	<pt1979@cumin1002>	START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye	[production]
16:14	<pt1979@cumin1002>	END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudcephmon1004.eqiad.wmnet	[production]
16:07	<brouberol@deploy1002>	helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply	[production]
16:07	<brouberol@deploy1002>	helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply	[production]
15:52	<pt1979@cumin1002>	START - Cookbook sre.hosts.dhcp for host cloudcephmon1004.eqiad.wmnet	[production]
15:48	<brouberol@deploy1002>	helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply	[production]
15:47	<brouberol@deploy1002>	helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.	[production]
15:47	<brouberol@deploy1002>	helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.	[production]
15:24	<cgoubert@cumin1002>	conftool action : set/pooled=yes; selector: name=(kubernetes1025|kubernetes1026|kubernetes1052|kubernetes1053|kubernetes1054|kubernetes1055|kubernetes1056|mw1496).eqiad.wmnet,cluster=kubernetes,service=kubesvc [reason: Uncordoning following T365998]	[production]
15:24	<Emperor>	moss-be1003 out of maintenance mode after network downtime T365998	[production]
15:22	<cgoubert@cumin1002>	conftool action : set/pooled=yes; selector: name=dse-k8s-worker1008.eqiad.wmnet,cluster=dse-k8s,service=kubesvc	[production]
15:22	<claime>	Uncordoning dse-k8s-worker1008.eqiad.wmnet after T365998	[production]
15:20	<andrewbogott>	find /srv/mediawiki/images/wikitech/archive -type f | xargs delete on wikitech-static, drive is full of nonsense	[production]
15:07	<brennen@deploy1002>	Finished deploy [phabricator/deployment@3902e30]: deploy phab1004 for T370776 (duration: 00m 33s)	[production]
15:06	<brennen@deploy1002>	Started deploy [phabricator/deployment@3902e30]: deploy phab1004 for T370776	[production]
15:06	<brennen@deploy1002>	Finished deploy [phabricator/deployment@3902e30]: deploy phab2002 for T370776 (redux, first deploy a mistaken no-op) (duration: 00m 34s)	[production]
15:05	<brennen@deploy1002>	Started deploy [phabricator/deployment@3902e30]: deploy phab2002 for T370776 (redux, first deploy a mistaken no-op)	[production]
15:05	<brennen@deploy1002>	Finished deploy [phabricator/deployment@7335128]: deploy phab2002 for T370776 (duration: 01m 17s)	[production]
15:03	<brennen@deploy1002>	Started deploy [phabricator/deployment@7335128]: deploy phab2002 for T370776	[production]