2022-10-10
10:13 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply [production]
10:04 <vgutierrez> rolling upgrade to HAProxy 2.4.19 on both text and upload caching clusters [production]
09:44 <jmm@cumin2002> END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1026.eqiad.wmnet to cluster eqiad and group A [production]
09:43 <jmm@cumin2002> START - Cookbook sre.ganeti.addnode for new host ganeti1026.eqiad.wmnet to cluster eqiad and group A [production]
09:42 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet [production]
09:35 <claime> Imported helm3 3.9.4-1 to buster-wikimedia and bullseye-wikimedia [production]
09:33 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet [production]
09:33 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db2121 (T314041)', diff saved to https://phabricator.wikimedia.org/P35384 and previous config saved to /var/cache/conftool/dbconfig/20221010-093334-ladsgroup.json [production]
09:33 <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance [production]
09:33 <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance [production]
09:30 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db2110 (T314041)', diff saved to https://phabricator.wikimedia.org/P35383 and previous config saved to /var/cache/conftool/dbconfig/20221010-093041-ladsgroup.json [production]
09:30 <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance [production]
09:30 <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance [production]
09:26 <vgutierrez> partitioning the ATS cache in cp1089, cp1090, cp2041, cp2042, cp3064, cp3065, cp4034, cp4036, cp5014, cp5016, cp6007, cp6015 - T317748 [production]
08:28 <Emperor> set thanos ring replicas to 3.68 T311690 [production]
08:23 <jynus> online resizefs of backup1003 bacula partition [production]
08:13 <jmm@cumin2002> END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1030.eqiad.wmnet to cluster eqiad and group A [production]
08:11 <jmm@cumin2002> START - Cookbook sre.ganeti.addnode for new host ganeti1030.eqiad.wmnet to cluster eqiad and group A [production]
08:09 <jynus> online resizefs of backup2003 bacula partition [production]
08:05 <jynus> restarting db2100:s7 to apply new buffer pool config [production]
07:52 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1030.eqiad.wmnet [production]
07:51 <jayme> imported kubernetes 1.23.12 to component/kubernetes123 for buster-wikimedia, bullseye-wikimedia - T307943 [production]
07:45 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main'. [production]
07:45 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main'. [production]
07:43 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main'. [production]
07:43 <godog> bounce thanos-compact on thanos-fe2001 [production]
07:43 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host ganeti1030.eqiad.wmnet [production]
07:39 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main'. [production]
07:37 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main'. [production]
07:35 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main'. [production]
07:34 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main'. [production]
07:31 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main'. [production]
07:26 <elukey> kill hanging process for user bmansurov on deploy1002 to allow proper user cleanup [production]
06:58 <jmm@cumin2002> END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bmansurov out of all services on: 1211 hosts [production]
06:58 <jmm@cumin2002> START - Cookbook sre.idm.logout Logging Bmansurov out of all services on: 1211 hosts [production]
06:56 <jmm@cumin2002> END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bmansurov out of all services on: 797 hosts [production]
06:56 <jmm@cumin2002> START - Cookbook sre.idm.logout Logging Bmansurov out of all services on: 797 hosts [production]
06:06 <ayounsi@cumin1001> END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 397715 [production]
06:06 <ayounsi@cumin1001> START - Cookbook sre.network.peering with action 'configure' for AS: 397715 [production]
2022-10-08
06:56 <hashar> Restarting Gerrit to fix up replication to GitHub - T320305 [production]
2022-10-07
21:29 <dzahn@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: debugging [production]
21:28 <dzahn@cumin2002> START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: debugging [production]
19:46 <sukhe@cumin2002> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts ganeti4004.ulsfo.wmnet [production]
19:46 <sukhe@cumin2002> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
19:42 <sukhe@cumin2002> START - Cookbook sre.dns.netbox [production]
19:37 <sukhe@cumin2002> START - Cookbook sre.hosts.decommission for hosts ganeti4004.ulsfo.wmnet [production]
19:07 <sukhe> decommission ganeti4004.ulsfo.wmnet: T317249 [production]
19:05 <sukhe> sudo gnt-node remove ganeti4004.ulsfo.wmnet T317249 [production]
17:51 <ryankemper> [Elastic] Updated list of cross-cluster remote seeds for all eqiad/codfw elastic clusters; should resolve `ElasticSearch setting check` alerts [production]
17:20 <sukhe> sudo gnt-node evacuate -s ganeti4004.ulsfo.wmnet [production]