production SAL

2021-09-04 §
12:50	<marostegui@cumin1001>	dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 25%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17214 and previous config saved to /var/cache/conftool/dbconfig/20210904-125021-root.json	[production]
12:35	<marostegui@cumin1001>	dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 10%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17213 and previous config saved to /var/cache/conftool/dbconfig/20210904-123518-root.json	[production]
12:20	<marostegui@cumin1001>	dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 5%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17212 and previous config saved to /var/cache/conftool/dbconfig/20210904-122014-root.json	[production]
09:03	<elukey>	restart wmf_auto_restart_rsyslog.service on puppetdb1002	[production]
09:00	<elukey>	`systemctl reset-failed ifup@ens6.service` on puppetdb2002 - T273026	[production]
03:02	<rzl@cumin2001>	dbctl commit (dc=all): 'Depool db2137:3314', diff saved to https://phabricator.wikimedia.org/P17210 and previous config saved to /var/cache/conftool/dbconfig/20210904-030231-rzl.json	[production]
2021-09-03 §
21:49	<bd808@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .	[production]
20:30	<bd808@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .	[production]
19:33	<krinkle@deploy1002>	Finished deploy [integration/docroot@6492b3d]: I48480e89e5f6 (duration: 00m 10s)	[production]
19:33	<krinkle@deploy1002>	Started deploy [integration/docroot@6492b3d]: I48480e89e5f6	[production]
19:26	<bd808@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .	[production]
19:04	<ryankemper>	T290330 `ryankemper@cumin1001:~$ sudo -E cumin 'P{wdqs2*}' 'sudo rm -fv /etc/cron.hourly/restart-blazegraph'` (Cleaned up manually created crons now that we have [somewhat hacky] systemd timers doing the same job)	[production]
17:42	<dduvall@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .	[production]
17:40	<dduvall@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .	[production]
17:35	<dduvall@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .	[production]
17:17	<ryankemper>	T290330 Deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/717508 across `wdqs` fleet; codfw wdqs hosts will restart on average once per hour now to address ongoing availability issues for wdqs codfw	[production]
16:32	<bd808@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .	[production]
16:10	<gehel>	blazegraph (public codfw cluster) will now restart every hour - T290330	[production]
15:53	<jbond>	enable puppet fleet wide post puppetdb database maintenance - T263578	[production]
15:21	<jbond>	create lvm snapshot puppetdb2002_data_snapshot on ganeti2023 - T263578	[production]
15:17	<jbond>	create lvm snapshot puppetdb1002_data_snapshot on ganeti1012 - T263578	[production]
15:00	<jbond>	disable puppet fleet wide to perform puppetdb database maintenance - T263578	[production]
14:58	<elukey@deploy1002>	helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.	[production]
14:58	<elukey@deploy1002>	helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.	[production]
14:35	<pt1979@cumin2002>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
14:29	<pt1979@cumin2002>	START - Cookbook sre.dns.netbox	[production]
14:20	<mutante>	mw2264 - scap pull	[production]
14:18	<elukey@deploy1002>	helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.	[production]
14:18	<elukey@deploy1002>	helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.	[production]
13:11	<jiji@cumin1001>	END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1027.eqiad.wmnet	[production]
13:10	<dcausse>	installing openjdk-8-dbg on wdqs2007	[production]
13:04	<jiji@cumin1001>	START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet	[production]
13:02	<jiji@cumin1001>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1023.eqiad.wmnet	[production]
12:48	<jiji@cumin1001>	START - Cookbook sre.hosts.decommission for hosts mc1023.eqiad.wmnet	[production]
12:46	<jiji@cumin1001>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc[1035-1036].eqiad.wmnet	[production]
12:32	<jiji@cumin1001>	START - Cookbook sre.hosts.decommission for hosts mc[1035-1036].eqiad.wmnet	[production]
12:12	<jiji@cumin1001>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc[1028-1032].eqiad.wmnet	[production]
12:03	<joal@deploy1002>	Finished deploy [analytics/refinery@7208d3d] (thin): Analytics hotfix deploy (bis) THIN [analytics/refinery@7208d3d] (duration: 00m 06s)	[production]
12:03	<joal@deploy1002>	Started deploy [analytics/refinery@7208d3d] (thin): Analytics hotfix deploy (bis) THIN [analytics/refinery@7208d3d]	[production]
12:03	<joal@deploy1002>	Finished deploy [analytics/refinery@7208d3d]: Analytics hotfix deploy (bis)[analytics/refinery@7208d3d] (duration: 19m 16s)	[production]
11:56	<dcausse@deploy1002>	Finished deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA (duration: 19m 21s)	[production]
11:44	<joal@deploy1002>	Started deploy [analytics/refinery@7208d3d]: Analytics hotfix deploy (bis)[analytics/refinery@7208d3d]	[production]
11:42	<marostegui>	Remove flaggedrevs_stats2 and flaggedrevs_stats from enwiki - T289050	[production]
11:37	<dcausse@deploy1002>	Started deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA	[production]
11:36	<dcausse@deploy1002>	Finished deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA (duration: 01m 07s)	[production]
11:35	<dcausse@deploy1002>	Started deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA	[production]
10:58	<jiji@cumin1001>	START - Cookbook sre.hosts.decommission for hosts mc[1028-1032].eqiad.wmnet	[production]
10:54	<jiji@cumin1001>	END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc[1025-1026].eqiad.wmnet	[production]
10:47	<joal@deploy1002>	Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): Deploy latest code on AQS new servers - test after failures (duration: 00m 32s)	[production]
10:46	<joal@deploy1002>	Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): Deploy latest code on AQS new servers - test after failures	[production]