production SAL

2023-01-26
12:42	<sukhe@cumin2002>	START - Cookbook sre.hosts.downtime for 3:00:00 on cp3051.esams.wmnet with reason: T323717	[production]
12:42	<sukhe@puppetmaster1001>	conftool action : set/pooled=no; selector: name=cp3051.esams.wmnet,service=ats-be	[production]
12:42	<sukhe@puppetmaster1001>	conftool action : set/pooled=no; selector: name=cp3051.esams.wmnet,service=cdn	[production]
12:41	<sukhe>	depool cp3051.esams.wmnet for firmware update testing: T323717	[production]
12:41	<btullis@cumin1001>	START - Cookbook sre.hosts.reboot-single for host flerovium.eqiad.wmnet	[production]
12:40	<btullis@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host furud.codfw.wmnet	[production]
12:29	<mvernon@cumin2002>	END (PASS) - Cookbook sre.swift.roll-restart-reboot-proxies (exit_code=0) rolling restart_daemons on A:eqiad and not A:thanos-fe and A:swift-fe or A:thanos-fe	[production]
12:15	<btullis@cumin1001>	START - Cookbook sre.hosts.reboot-single for host furud.codfw.wmnet	[production]
12:10	<hnowlan@puppetmaster1001>	conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet	[production]
12:10	<mvernon@cumin2002>	START - Cookbook sre.swift.roll-restart-reboot-proxies rolling restart_daemons on A:eqiad and not A:thanos-fe and A:swift-fe or A:thanos-fe	[production]
12:03	<jbond>	enable profile::base::firewall::defs_from_etcd: true globally	[production]
11:56	<jbond@cumin1001>	END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd-client-ssl._tcp.wikimedia.org on all recursors	[production]
11:56	<jbond@cumin1001>	START - Cookbook sre.dns.wipe-cache _etcd-client-ssl._tcp.wikimedia.org on all recursors	[production]
11:49	<hnowlan@puppetmaster1001>	conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet	[production]
11:49	<hnowlan@puppetmaster1001>	conftool action : set/weight=10; selector: service=thumbor,name=kubernetes1010.eqiad.wmnet	[production]
11:48	<ayounsi@cumin1001>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flowspec1001	[production]
11:48	<ayounsi@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
11:48	<ayounsi@cumin1001>	END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flowspec1001 decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1001"	[production]
11:46	<ayounsi@cumin1001>	START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flowspec1001 decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1001"	[production]
11:44	<ayounsi@cumin1001>	START - Cookbook sre.dns.netbox	[production]
11:40	<ayounsi@cumin1001>	START - Cookbook sre.hosts.decommission for hosts flowspec1001	[production]
11:36	<cgoubert@cumin1001>	conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-aux	[production]
11:29	<jgiannelos@deploy1002>	helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync	[production]
11:29	<jgiannelos@deploy1002>	helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync	[production]
11:28	<hnowlan@puppetmaster1001>	conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet	[production]
11:08	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43405 and previous config saved to /var/cache/conftool/dbconfig/20230126-110822-root.json	[production]
11:03	<hashar>	Restarted Apache 2 on gerrit.wikimedia.org	[production]
10:55	<jayme@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/toolhub: apply	[production]
10:55	<cgoubert@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
10:55	<cgoubert@cumin1001>	END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename aux-k8s-ingress service to k8s-ingress-aux - cgoubert@cumin1001"	[production]
10:54	<jayme@deploy1002>	helmfile [eqiad] START helmfile.d/services/toolhub: apply	[production]
10:54	<jayme@deploy1002>	helmfile [codfw] DONE helmfile.d/services/toolhub: apply	[production]
10:53	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43404 and previous config saved to /var/cache/conftool/dbconfig/20230126-105317-root.json	[production]
10:53	<jayme@deploy1002>	helmfile [codfw] START helmfile.d/services/toolhub: apply	[production]
10:46	<cgoubert@cumin1001>	START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename aux-k8s-ingress service to k8s-ingress-aux - cgoubert@cumin1001"	[production]
10:45	<moritzm>	installing postgresql-13 security updates	[production]
10:43	<cgoubert@cumin1001>	START - Cookbook sre.dns.netbox	[production]
10:42	<joal@deploy1002>	Finished deploy [airflow-dags/analytics@e52205b]: (no justification provided) (duration: 00m 11s)	[production]
10:42	<joal@deploy1002>	Started deploy [airflow-dags/analytics@e52205b]: (no justification provided)	[production]
10:41	<claime>	cgoubert@authdns1001:~$ sudo -i authdns-update	[production]
10:38	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43403 and previous config saved to /var/cache/conftool/dbconfig/20230126-103812-root.json	[production]
10:34	<marostegui@cumin1001>	dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43402 and previous config saved to /var/cache/conftool/dbconfig/20230126-103448-root.json	[production]
10:32	<joal@deploy1002>	Finished deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - third after failure [analytics/refinery@8ed8435] (duration: 01m 16s)	[production]
10:31	<joal@deploy1002>	Started deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - third after failure [analytics/refinery@8ed8435]	[production]
10:23	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43401 and previous config saved to /var/cache/conftool/dbconfig/20230126-102307-root.json	[production]
10:21	<joal@deploy1002>	Finished deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - Second after failure [analytics/refinery@8ed8435] (duration: 00m 04s)	[production]
10:21	<joal@deploy1002>	Started deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - Second after failure [analytics/refinery@8ed8435]	[production]
10:19	<marostegui@cumin1001>	dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43400 and previous config saved to /var/cache/conftool/dbconfig/20230126-101943-root.json	[production]
10:08	<jbond@cumin1001>	END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts sretest1002.eqiad.wmnet	[production]
10:08	<jbond@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet	[production]