production SAL

2024-07-02
14:52	<cgoubert@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply	[production]
14:51	<jiji@cumin1002>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubetcd[1004-1006].eqiad.wmnet	[production]
14:51	<jiji@cumin1002>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
14:51	<jiji@cumin1002>	END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubetcd[1004-1006].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"	[production]
14:50	<cgoubert@deploy1002>	helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply	[production]
14:48	<jiji@cumin1002>	START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubetcd[1004-1006].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"	[production]
14:47	<jiji@cumin1002>	START - Cookbook sre.hosts.decommission for hosts kubetcd[2004-2006].codfw.wmnet	[production]
14:45	<jiji@cumin1002>	START - Cookbook sre.dns.netbox	[production]
14:38	<ayounsi@cumin1002>	START - Cookbook sre.hosts.reimage for host testvm2007.codfw.wmnet with OS bookworm	[production]
14:37	<jiji@cumin1002>	START - Cookbook sre.hosts.decommission for hosts kubetcd[1004-1006].eqiad.wmnet	[production]
14:28	<dcaro@cumin1002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1008.eqiad.wmnet	[production]
14:19	<dcaro@cumin1002>	START - Cookbook sre.hosts.reboot-single for host cloudcephosd1008.eqiad.wmnet	[production]
14:15	<root@cumin1002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1008.eqiad.wmnet with OS bullseye	[production]
14:12	<jiji@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 6 hosts with reason: decom	[production]
14:12	<jiji@cumin1002>	START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 6 hosts with reason: decom	[production]
14:11	<jiji@cumin1002>	END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2 days, 0:00:00 on 6 hosts with reason: decom	[production]
14:11	<jiji@cumin1002>	START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 6 hosts with reason: decom	[production]
14:07	<jforrester@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply	[production]
14:06	<sukhe@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org,service=recdns	[production]
14:06	<filippo@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply	[production]
14:05	<filippo@deploy1002>	helmfile [eqiad] START helmfile.d/services/page-analytics: apply	[production]
14:05	<jforrester@deploy1002>	helmfile [eqiad] START helmfile.d/services/wikifunctions: apply	[production]
14:05	<jforrester@deploy1002>	helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply	[production]
14:05	<filippo@deploy1002>	helmfile [codfw] DONE helmfile.d/services/page-analytics: apply	[production]
14:05	<filippo@deploy1002>	helmfile [codfw] START helmfile.d/services/page-analytics: apply	[production]
14:04	<bking@cumin2002>	END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad	[production]
14:04	<bking@cumin2002>	START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad	[production]
14:04	<sukhe@puppetmaster1001>	conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org,service=recdns	[production]
14:04	<jforrester@deploy1002>	helmfile [codfw] START helmfile.d/services/wikifunctions: apply	[production]
14:03	<jforrester@deploy1002>	helmfile [staging] DONE helmfile.d/services/wikifunctions: apply	[production]
14:03	<jforrester@deploy1002>	helmfile [staging] START helmfile.d/services/wikifunctions: apply	[production]
14:03	<sukhe@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org	[production]
14:02	<sukhe>	restart anycast-hc on dns6001	[production]
14:01	<sukhe@puppetmaster1001>	conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org	[production]
13:58	<effie>	decom old eqiad and codfw kubetcd hosts	[production]
13:46	<jforrester@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply	[production]
13:44	<jforrester@deploy1002>	helmfile [eqiad] START helmfile.d/services/wikifunctions: apply	[production]
13:44	<jforrester@deploy1002>	helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply	[production]
13:43	<jforrester@deploy1002>	helmfile [codfw] START helmfile.d/services/wikifunctions: apply	[production]
13:42	<jforrester@deploy1002>	helmfile [staging] DONE helmfile.d/services/wikifunctions: apply	[production]
13:42	<jforrester@deploy1002>	helmfile [staging] START helmfile.d/services/wikifunctions: apply	[production]
13:41	<brouberol@cumin1002>	START - Cookbook sre.druid.reboot-workers for Druid analytics cluster: Reboot Druid nodes	[production]
13:39	<brouberol@cumin1002>	END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid public cluster: Reboot Druid nodes	[production]
13:35	<cgoubert@cumin1002>	conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2030.codfw.wmnet|wikikube-worker2031.codfw.wmnet|wikikube-worker2032.codfw.wmnet|wikikube-worker2033.codfw.wmnet|wikikube-worker2034.codfw.wmnet),cluster=kubernetes,service=kubesvc	[production]
13:35	<claime>	Pooling and uncordoning wikikube-worker2030.codfw.wmnet wikikube-worker2031.codfw.wmnet wikikube-worker2032.codfw.wmnet wikikube-worker2033.codfw.wmnet wikikube-worker2034.codfw.wmnet - T351074	[production]
13:31	<marostegui@cumin1002>	dbctl commit (dc=all): 'Depooling db1222 (T367856)', diff saved to https://phabricator.wikimedia.org/P65665 and previous config saved to /var/cache/conftool/dbconfig/20240702-133100-marostegui.json	[production]
13:30	<marostegui@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance	[production]
13:30	<marostegui@cumin1002>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance	[production]
13:30	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db1197 (T367856)', diff saved to https://phabricator.wikimedia.org/P65664 and previous config saved to /var/cache/conftool/dbconfig/20240702-133038-marostegui.json	[production]
13:30	<Lucas_WMDE>	UTC afternoon backport+config window done	[production]