production SAL

2024-08-12
15:06	<isaranto@deploy1003>	helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .	[production]
14:46	<jgiannelos@deploy1003>	helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply	[production]
14:45	<jgiannelos@deploy1003>	helmfile [eqiad] START helmfile.d/services/mobileapps: apply	[production]
14:44	<bking@cumin2002>	START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: security update - bking@cumin2002 - T371874	[production]
14:42	<elukey>	powercycle ms-be1078 - causing frontend errors in swift-eqiad, network link is down (if down/up didn't work, nothing in the dmesg/syslog)	[production]
14:42	<jgiannelos@deploy1003>	helmfile [codfw] DONE helmfile.d/services/mobileapps: apply	[production]
14:41	<jgiannelos@deploy1003>	helmfile [codfw] START helmfile.d/services/mobileapps: apply	[production]
14:38	<jgiannelos@deploy1003>	helmfile [eqiad] START helmfile.d/services/mobileapps: apply	[production]
14:38	<jgiannelos@deploy1003>	helmfile [eqiad] START helmfile.d/services/mobileapps: apply	[production]
14:34	<jgiannelos@deploy1003>	helmfile [eqiad] START helmfile.d/services/mobileapps: apply	[production]
14:23	<zabe@deploy1003>	Finished scap: Backport for [[gerrit:1061152|Further configuration for bdrwiki (T371760)]] (duration: 21m 07s)	[production]
14:01	<zabe@deploy1003>	Started scap sync-world: Backport for [[gerrit:1061152|Further configuration for bdrwiki (T371760)]]	[production]
13:46	<hnowlan@deploy1003>	helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply	[production]
13:46	<hnowlan@deploy1003>	helmfile [eqiad] START helmfile.d/services/shellbox-video: apply	[production]
13:33	<klausman@deploy1003>	helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .	[production]
13:33	<klausman@deploy1003>	helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .	[production]
13:25	<jgiannelos@deploy1003>	helmfile [staging] DONE helmfile.d/services/mobileapps: apply	[production]
13:24	<jgiannelos@deploy1003>	helmfile [staging] START helmfile.d/services/mobileapps: apply	[production]
13:24	<jgiannelos@deploy1003>	helmfile [staging] START helmfile.d/services/mobileapps: apply	[production]
12:37	<elukey>	restart exim4 on list2001 to pick up the new TLS material	[production]
12:35	<elukey>	restart exim4 on list1004 to pick up the new TLS material	[production]
12:32	<marostegui@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance	[production]
12:32	<marostegui@cumin1002>	START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance	[production]
12:11	<elukey@cumin1002>	START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Openjdk upgrade - elukey@cumin1002	[production]
12:04	<kevinbazira@deploy1003>	helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .	[production]
12:03	<kevinbazira@deploy1003>	helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .	[production]
11:59	<kevinbazira@deploy1003>	helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .	[production]
11:26	<hnowlan>	rebuilding php7.4-fpm and php7.4-fpm-multiversion-base to pick up healthz worker awareness change (r/1060867)	[production]
11:22	<ladsgroup@cumin1002>	conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s1	[production]
11:10	<kevinbazira@deploy1003>	helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .	[production]
11:06	<isaranto@deploy1003>	helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .	[production]
11:04	<isaranto@deploy1003>	helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .	[production]
11:03	<isaranto@deploy1003>	helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .	[production]
10:19	<vgutierrez>	restarting apache on puppetmaster1003	[production]
09:54	<kamila_>	rebooting puppetmaster1001 due to intermittent network failures	[production]
09:46	<ayounsi@cumin1002>	END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 54994	[production]
09:43	<ayounsi@cumin1002>	START - Cookbook sre.network.peering with action 'email' for AS: 54994	[production]
09:17	<urbanecm@deploy1003>	Finished scap: Backport for [[gerrit:1061148|MenteeOverviewApi: Do not apply undefined/null params (T372164)]] (duration: 19m 54s)	[production]
09:11	<urbanecm@deploy1003>	urbanecm: Continuing with sync	[production]
09:11	<godog>	bounce grafana after https://gerrit.wikimedia.org/r/c/operations/puppet/+/1061955	[production]
09:10	<urbanecm@deploy1003>	urbanecm: Backport for [[gerrit:1061148|MenteeOverviewApi: Do not apply undefined/null params (T372164)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)	[production]
08:57	<urbanecm@deploy1003>	Started scap sync-world: Backport for [[gerrit:1061148|MenteeOverviewApi: Do not apply undefined/null params (T372164)]]	[production]
07:39	<arnaudb@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: index corruption	[production]
07:39	<arnaudb@cumin1002>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: index corruption	[production]
07:38	<arnaudb@cumin1002>	dbctl commit (dc=all): 'db2189 - s2', diff saved to https://phabricator.wikimedia.org/P67270 and previous config saved to /var/cache/conftool/dbconfig/20240812-073846-arnaudb.json	[production]
2024-08-11
07:58	<marostegui@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1239.eqiad.wmnet with reason: Maintenance	[production]
07:58	<marostegui@cumin1002>	START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1239.eqiad.wmnet with reason: Maintenance	[production]
07:58	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db1235 (T367856)', diff saved to https://phabricator.wikimedia.org/P67269 and previous config saved to /var/cache/conftool/dbconfig/20240811-075839-marostegui.json	[production]
07:43	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P67268 and previous config saved to /var/cache/conftool/dbconfig/20240811-074332-marostegui.json	[production]
07:28	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P67267 and previous config saved to /var/cache/conftool/dbconfig/20240811-072825-marostegui.json	[production]