production SAL

2021-04-08
18:37	<mutante>	mw2403 through mw2411 - serial rebooting	[production]
18:31	<tgr@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .	[production]
18:31	<tgr@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .	[production]
18:29	<urbanecm@deploy1002>	Synchronized php-1.36.0-wmf.38/extensions/VisualEditor/modules/ve-mw/ui/tools/ve.ui.MWBackTool.js: e0f3735f6a31d2914bae6c9daac1267707a2d108: Revert incorrect changes to ve.ui.MWBackCommand that made it stop working (T279613) (duration: 01m 07s)	[production]
18:25	<tgr@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .	[production]
18:25	<tgr@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .	[production]
18:23	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[2410-2411].codfw.wmnet with reason: new_install	[production]
18:23	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw[2410-2411].codfw.wmnet with reason: new_install	[production]
18:22	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 7 hosts with reason: new_install	[production]
18:22	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on 7 hosts with reason: new_install	[production]
18:03	<mutante>	mw2403 through mw2411 - new hardware moving into production, not pooled yet, initial puppet run, being added to icinga etc, creating mcrouter certs for them (T279599)	[production]
18:02	<mutante>	mw2403 through mw2401 - new hardware moving into production, not pooled yet, initial puppet run, being added to icinga etc, creating mcrouter certs for them (T279599)	[production]
17:59	<tgr@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .	[production]
17:52	<ryankemper@cumin2001>	END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)	[production]
17:29	<jgiannelos@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .	[production]
17:23	<jgiannelos@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .	[production]
17:18	<jgiannelos@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .	[production]
17:16	<dancy>	Scap 3.17.0 deployed to beta cluster	[production]
16:51	<dancy>	testing Scap 3.17.0 release on deployment-deploy01	[production]
16:33	<elukey>	reboot an-worker1100 again to check if all the disks come up correctly	[production]
16:16	<cmjohnson1>	update bios cp1087, already deposed for h/w issues T278729	[production]
16:15	<jiji@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1025.eqiad.wmnet with reason: REIMAGE	[production]
16:13	<jiji@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1025.eqiad.wmnet with reason: REIMAGE	[production]
16:10	<pt1979@cumin2001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
16:05	<pt1979@cumin2001>	START - Cookbook sre.dns.netbox	[production]
15:51	<pt1979@cumin2001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
15:44	<pt1979@cumin2001>	START - Cookbook sre.dns.netbox	[production]
15:36	<elukey>	reboot an-worker1100 to see if it helps with the strange BBU behavior	[production]
13:55	<andrew@cumin1001>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephmon2001-dev.codfw.wmnet	[production]
13:44	<andrew@cumin1001>	START - Cookbook sre.hosts.decommission for hosts cloudcephmon2001-dev.codfw.wmnet	[production]
13:41	<jiji@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2001.codfw.wmnet with reason: REIMAGE	[production]
13:39	<jiji@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on parse2001.codfw.wmnet with reason: REIMAGE	[production]
13:24	<moritzm>	installing groff bugfix updates from Buster point release	[production]
12:49	<ema>	cp5001: varnish-frontend-restart to test exp policy settings starting from an empty cache T275809	[production]
12:44	<moritzm>	installing libbsd security updates for Buster	[production]
12:39	<moritzm>	installing xcftools security updates	[production]
12:31	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15264 and previous config saved to /var/cache/conftool/dbconfig/20210408-123137-root.json	[production]
12:16	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15263 and previous config saved to /var/cache/conftool/dbconfig/20210408-121633-root.json	[production]
12:01	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15262 and previous config saved to /var/cache/conftool/dbconfig/20210408-120128-root.json	[production]
11:58	<XioNoX>	tighten all routers loopback firewall filter - T207799	[production]
11:57	<zpapierski@deploy1002>	Finished deploy [wikimedia/discovery/analytics@25dad72]: T273847 export queries to relforge dag deployment - elastic index name fix (duration: 00m 09s)	[production]
11:57	<zpapierski@deploy1002>	Started deploy [wikimedia/discovery/analytics@25dad72]: T273847 export queries to relforge dag deployment - elastic index name fix	[production]
11:50	<XioNoX>	tighten cr3-ulsfo loopback firewall filter - T207799	[production]
11:49	<zpapierski@deploy1002>	Finished deploy [wikimedia/discovery/analytics@25dad72]: T273847 export queries to relforge dag deployment - elastic index name fix (duration: 01m 39s)	[production]
11:47	<zpapierski@deploy1002>	Started deploy [wikimedia/discovery/analytics@25dad72]: T273847 export queries to relforge dag deployment - elastic index name fix	[production]
11:46	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15261 and previous config saved to /var/cache/conftool/dbconfig/20210408-114625-root.json	[production]
11:23	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Repool db1118 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15259 and previous config saved to /var/cache/conftool/dbconfig/20210408-112332-root.json	[production]
11:09	<filippo@cumin1001>	END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be2028.codfw.wmnet	[production]
11:08	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Repool db1118 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15258 and previous config saved to /var/cache/conftool/dbconfig/20210408-110828-root.json	[production]
11:07	<urbanecm@deploy1002>	Synchronized wmf-config/InitialiseSettings.php: de1670cbd2c59a24f1e29a6d3731e3ac7f39d336: Enable Growth for newcomers on simplewiki, mswiki, tawiki (T278369; T277562; T277550) (duration: 01m 07s)	[production]