production SAL

2023-12-18
11:38	<fabfur@cumin1002>	START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye	[production]
11:37	<fabfur@cumin1002>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye	[production]
11:36	<fabfur@cumin1002>	START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye	[production]
11:36	<fabfur@cumin1002>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye	[production]
10:56	<moritzm>	restarting apache/FPM on mw canaries to pick up gnutls update	[production]
10:52	<moritzm>	installing gnutls28 security updates	[production]
10:47	<fabfur@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage	[production]
10:44	<fabfur@cumin1002>	START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage	[production]
10:39	<moritzm>	installing jetty9 security updates	[production]
10:29	<volans@cumin1002>	END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox	[production]
10:29	<volans@cumin1002>	START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox	[production]
10:17	<fabfur@cumin1002>	START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye	[production]
10:12	<XioNoX>	remove VRRP pinning on cr1-eqiad/cr2-eqiad/cr2-codfw	[production]
10:09	<moritzm>	installing Linux 6.1.67 updates on Bookworm hosts	[production]
09:45	<XioNoX>	make eqiad-codfw 100G link primary	[production]
09:10	<vgutierrez>	vgutierrez@acmechief1002:~$ sudo -i keyholder arm - T352242	[production]
2023-12-17
12:59	<elukey>	restart kubelet on ml-serve1001 (errors while syncing old containers)	[production]
2023-12-16
01:21	<eevans@deploy2002>	Finished deploy [cassandra/logstash-logback-encoder@fb10de1]: (no justification provided) (duration: 00m 10s)	[production]
01:21	<eevans@deploy2002>	Started deploy [cassandra/logstash-logback-encoder@fb10de1]: (no justification provided)	[production]
00:44	<htriedman@deploy2002>	Finished deploy [airflow-dags/platform_eng@63804c4]: (no justification provided) (duration: 00m 25s)	[production]
00:44	<htriedman@deploy2002>	Started deploy [airflow-dags/platform_eng@63804c4]: (no justification provided)	[production]
00:05	<jhathaway>	unbreaking my puppet change with, https://gerrit.wikimedia.org/r/c/operations/puppet/+/983504	[production]
2023-12-15
23:46	<htriedman@deploy2002>	Finished deploy [airflow-dags/platform_eng@9600237]: (no justification provided) (duration: 00m 27s)	[production]
23:46	<htriedman@deploy2002>	Started deploy [airflow-dags/platform_eng@9600237]: (no justification provided)	[production]
23:06	<milimetric@deploy2002>	Finished deploy [airflow-dags/platform_eng@160d0f0]: (no justification provided) (duration: 00m 25s)	[production]
23:06	<milimetric@deploy2002>	Started deploy [airflow-dags/platform_eng@160d0f0]: (no justification provided)	[production]
22:42	<pfischer@deploy2002>	helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply	[production]
22:42	<pfischer@deploy2002>	helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply	[production]
22:03	<htriedman@deploy2002>	Finished deploy [airflow-dags/platform_eng@5090fdc]: (no justification provided) (duration: 00m 25s)	[production]
22:03	<htriedman@deploy2002>	Started deploy [airflow-dags/platform_eng@5090fdc]: (no justification provided)	[production]
21:48	<milimetric@deploy2002>	Finished deploy [analytics/refinery@eeb98ac] (thin): Syncing changes to HDFS (duration: 00m 06s)	[production]
21:48	<milimetric@deploy2002>	Started deploy [analytics/refinery@eeb98ac] (thin): Syncing changes to HDFS	[production]
21:48	<milimetric@deploy2002>	Finished deploy [analytics/refinery@eeb98ac]: Syncing changes to HDFS (duration: 81m 46s)	[production]
21:26	<mutante>	running puppet on all prometheus*	[production]
20:26	<milimetric@deploy2002>	Started deploy [analytics/refinery@eeb98ac]: Syncing changes to HDFS	[production]
15:44	<isaranto@deploy2002>	helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .	[production]
15:25	<klausman@deploy2002>	helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .	[production]
15:01	<klausman@deploy2002>	helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.	[production]
15:00	<klausman@deploy2002>	helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.	[production]
14:46	<brouberol@deploy2002>	helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply	[production]
14:46	<arnaudb@cumin1001>	dbctl commit (dc=all): 'db2112 (re)pooling @ 100%: candidate master repooling', diff saved to https://phabricator.wikimedia.org/P54482 and previous config saved to /var/cache/conftool/dbconfig/20231215-144624-arnaudb.json	[production]
14:46	<brouberol@deploy2002>	helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply	[production]
14:45	<brouberol@deploy2002>	helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply	[production]
14:44	<brouberol@deploy2002>	helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply	[production]
14:40	<dcausse@deploy2002>	helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply	[production]
14:39	<dcausse@deploy2002>	helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply	[production]
14:38	<arnaudb@cumin1001>	dbctl commit (dc=all): 'db2179 (re)pooling @ 100%: candidate master proper repooling', diff saved to https://phabricator.wikimedia.org/P54481 and previous config saved to /var/cache/conftool/dbconfig/20231215-143812-arnaudb.json	[production]
14:31	<arnaudb@cumin1001>	dbctl commit (dc=all): 'db2112 (re)pooling @ 80%: candidate master repooling', diff saved to https://phabricator.wikimedia.org/P54480 and previous config saved to /var/cache/conftool/dbconfig/20231215-143118-arnaudb.json	[production]
14:27	<klausman@deploy2002>	helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.	[production]
14:27	<arnaudb@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on db2194.codfw.wmnet with reason: production freeze will occur before cookbook is finished	[production]