production SAL

2022-07-08
19:57	<cdanis@cumin1001>	START - Cookbook sre.hosts.downtime for 1:00:00 on phab.wmfusercontent.org with reason: bug fix	[production]
19:57	<cdanis@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phabricator.wikimedia.org with reason: bug fix	[production]
19:56	<cdanis@cumin1001>	START - Cookbook sre.hosts.downtime for 1:00:00 on phabricator.wikimedia.org with reason: bug fix	[production]
19:56	<cdanis@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1001.eqiad.wmnet with reason: bug fix	[production]
19:56	<cdanis@cumin1001>	START - Cookbook sre.hosts.downtime for 1:00:00 on phab1001.eqiad.wmnet with reason: bug fix	[production]
19:49	<tzatziki>	removing 2 files for legal compliance	[production]
18:42	<bking@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1001.wikimedia.org with OS bullseye	[production]
18:26	<urandom>	changing Cassandra superuser password, AQS cluster -- T311652	[production]
18:21	<bking@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1001.wikimedia.org with reason: host reimage	[production]
18:18	<bking@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1001.wikimedia.org with reason: host reimage	[production]
18:03	<bking@cumin1001>	START - Cookbook sre.hosts.reimage for host cloudelastic1001.wikimedia.org with OS bullseye	[production]
16:25	<bking@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye	[production]
15:29	<bking@cumin1001>	START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye	[production]
15:27	<bking@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye	[production]
15:27	<bking@cumin1001>	START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye	[production]
15:15	<bking@cumin1001>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1005.wikimedia.org with OS bullseye	[production]
15:00	<bking@cumin1001>	START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye	[production]
14:59	<bking@cumin1001>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1005.wikimedia.org with OS bullseye	[production]
14:49	<bking@cumin1001>	START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye	[production]
14:46	<bking@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1004.wikimedia.org with OS bullseye	[production]
14:34	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30990 and previous config saved to /var/cache/conftool/dbconfig/20220708-143411-root.json	[production]
14:26	<bking@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1004.wikimedia.org with reason: host reimage	[production]
14:22	<bking@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1004.wikimedia.org with reason: host reimage	[production]
14:19	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30983 and previous config saved to /var/cache/conftool/dbconfig/20220708-141907-root.json	[production]
14:10	<hashar@deploy1002>	Synchronized php-1.39.0-wmf.19/extensions/GrowthExperiments/includes/NewcomerTasks/AddImage/ServiceImageRecommendationProvider.php: AddImage: Only process metadata for a single valid suggestion - T312544 (duration: 03m 25s)	[production]
14:09	<bking@cumin1001>	START - Cookbook sre.hosts.reimage for host cloudelastic1004.wikimedia.org with OS bullseye	[production]
14:08	<mwdebug-deploy@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mwdebug: apply	[production]
14:07	<mwdebug-deploy@deploy1002>	helmfile [codfw] START helmfile.d/services/mwdebug: apply	[production]
14:07	<mwdebug-deploy@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply	[production]
14:06	<mwdebug-deploy@deploy1002>	helmfile [eqiad] START helmfile.d/services/mwdebug: apply	[production]
14:04	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30978 and previous config saved to /var/cache/conftool/dbconfig/20220708-140404-root.json	[production]
13:49	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30975 and previous config saved to /var/cache/conftool/dbconfig/20220708-134900-root.json	[production]
13:33	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30974 and previous config saved to /var/cache/conftool/dbconfig/20220708-133356-root.json	[production]
13:18	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1160 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30973 and previous config saved to /var/cache/conftool/dbconfig/20220708-131852-root.json	[production]
13:03	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1160 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30971 and previous config saved to /var/cache/conftool/dbconfig/20220708-130348-root.json	[production]
12:48	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1160 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30970 and previous config saved to /var/cache/conftool/dbconfig/20220708-124844-root.json	[production]
10:20	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts deneb.codfw.wmnet	[production]
10:20	<jmm@cumin2002>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
10:16	<jmm@cumin2002>	START - Cookbook sre.dns.netbox	[production]
10:12	<jmm@cumin2002>	START - Cookbook sre.hosts.decommission for hosts deneb.codfw.wmnet	[production]
09:40	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti2027.codfw.wmnet with reason: Temporarily remove from Ganeti cluster for reimage	[production]
09:40	<jmm@cumin2002>	START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti2027.codfw.wmnet with reason: Temporarily remove from Ganeti cluster for reimage	[production]
09:25	<jmm@cumin2002>	END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2016.codfw.wmnet to cluster codfw and group D	[production]
07:33	<akosiaris>	reboot rdb1009 for kernel upgrades	[production]
07:29	<vgutierrez>	restart pybal on lvs6002	[production]
07:22	<akosiaris>	reboot rdb1010 for kernel upgrades	[production]
06:52	<jmm@cumin2002>	START - Cookbook sre.ganeti.addnode for new host ganeti2016.codfw.wmnet to cluster codfw and group D	[production]
06:49	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2016.codfw.wmnet	[production]
06:47	<TimStarling>	on mwmaint2002: using iptables to simulate cross-DC memcached traffic loss	[production]
06:39	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host ganeti2016.codfw.wmnet	[production]