2020-01-23
23:38 <bd808> Halted tools-k8s-worker build script after first instance (tools-k8s-worker-10) stuck in "scheduling" state for 20 minutes [tools]
23:16 <bd808> Building 6 new tools-k8s-worker instances for the 2020 Kubernetes cluster [tools]
05:15 <bd808> Building tools-elastic-04 [tools]
04:39 <bd808> wmcs-openstack quota set --instances 192 [tools]
04:36 <bd808> wmcs-openstack quota set --cores 768 --ram 1536000 [tools]
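The two quota bumps above line up as a fixed per-instance average. A quick arithmetic sanity check (the numbers are taken from the log entries; the division itself is only illustrative, not part of the `wmcs-openstack` commands):

```shell
# Quota values from the 04:36/04:39 entries
instances=192
cores=768
ram_mb=1536000

# Average capacity each instance could claim if spread evenly
per_cores=$(( cores / instances ))
per_ram=$(( ram_mb / instances ))
echo "${per_cores} cores, ${per_ram} MB RAM per instance"
```

That works out to 4 cores and 8000 MB RAM per instance, a plausible flavor size for worker nodes.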
2020-01-22
12:43 <arturo> for the record, the issue with tools-worker-1016 was apparently memory exhaustion [tools]
12:35 <arturo> hard-reboot tools-worker-1016 (not responding to even console access) [tools]
2020-01-21
19:25 <bstorm_> hard rebooting tools-sgeexec-0913/14/35 because they aren't even on the network [tools]
19:17 <bstorm_> depooled and rebooted tools-sgeexec-0914 because it was acting funny [tools]
18:30 <bstorm_> depooling and rebooting tools-sgeexec-[0911,0913,0919,0921,0924,0931,0933,0935,0939,0941].tools.eqiad.wmflabs [tools]
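The bracketed host list in the 18:30 entry expands to ten exec nodes. A dry-run sketch of the depool loop, using bash brace expansion (the `exec-manage depool` invocation is assumed from Toolforge convention; the loop only prints the commands rather than running them):

```shell
# Expand the logged node list into individual hostnames
hosts=(tools-sgeexec-{0911,0913,0919,0921,0924,0931,0933,0935,0939,0941})

# Dry run: print the depool command that would be issued per host
for host in "${hosts[@]}"; do
  echo "exec-manage depool ${host}.tools.eqiad.wmflabs"
done
```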
17:21 <bstorm_> rebooting toolschecker to recover stale nfs handle [tools]
2020-01-16
23:54 <bstorm_> rebooting tools-docker-builder-06 because there are a couple running containers that don't want to die cleanly [tools]
23:45 <bstorm_> rebuilding docker containers to include new webservice version (0.58) [tools]
23:41 <bstorm_> deployed toollabs-webservice 0.58 to everything that isn't a container [tools]
16:45 <bstorm_> ran configurator to set the gridengine web queues to `rerun FALSE` T242397 [tools]
2020-01-14
15:29 <bstorm_> failed the gridengine master back to the master server from the shadow [tools]
02:23 <andrewbogott> rebooting tools-paws-worker-1006 to resolve hangs associated with an old NFS failure [tools]
2020-01-13
17:48 <bd808> Running `puppet ca destroy` for each unsigned cert on tools-puppetmaster-01 (T242642) [tools]
16:42 <bd808> Cordoned and fixed puppet on tools-k8s-worker-12. Rebooting now. T242559 [tools]
16:33 <bd808> Cordoned and fixed puppet on tools-k8s-worker-11. Rebooting now. T242559 [tools]
16:31 <bd808> Cordoned and fixed puppet on tools-k8s-worker-10. Rebooting now. T242559 [tools]
16:26 <bd808> Cordoned and fixed puppet on tools-k8s-worker-9. Rebooting now. T242559 [tools]
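The four 16:26–16:42 entries repeat the same cordon-then-reboot sequence on workers 9 through 12. A dry-run sketch of that loop (node names come from the log; `kubectl cordon` is the standard command for marking a node unschedulable, and the loop only prints what would run):

```shell
# Workers touched in the 16:26-16:42 entries
nodes=(tools-k8s-worker-{9..12})

# Dry run: print the cordon step performed before each reboot
for n in "${nodes[@]}"; do
  echo "kubectl cordon ${n}"
done
```

Cordoning first keeps the scheduler from placing new pods on a node that is about to reboot.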
2020-01-12
22:31 <Krenair> same on -13 and -14 [tools]
22:28 <Krenair> same on -8 [tools]
22:18 <Krenair> same on -7 [tools]
22:11 <Krenair> Did usual new instance creation puppet dance on tools-k8s-worker-6, /data/project got created [tools]
2020-01-11
01:33 <bstorm_> updated toollabs-webservice package to 0.57, which should allow persisting mem and cpu in manifests with burstable qos. [tools]
2020-01-10
23:31 <bstorm_> updated toollabs-webservice package to 0.56 [tools]
15:45 <bstorm_> depooled tools-paws-worker-1013 to reboot because I think it is the last tools server with that mount issue (I hope) [tools]
15:35 <bstorm_> depooling and rebooting tools-worker-1016 because it still had the leftover mount problems [tools]
15:30 <bstorm_> git stash-ing local puppet changes in hopes that arturo has that material locally, and it doesn't break anything to do so [tools]
2020-01-09
23:35 <bstorm_> depooled and rebooted tools-sgeexec-0939 because it isn't acting right [tools]
18:26 <bstorm_> re-joining the k8s nodes OF THE PAWS CLUSTER to the cluster one at a time to rotate the certs T242353 [tools]
18:25 <bstorm_> re-joining the k8s nodes to the cluster one at a time to rotate the certs T242353 [tools]
18:06 <bstorm_> rebooting tools-paws-master-01 T242353 [tools]
17:46 <bstorm_> refreshing the paws cluster's entire x509 environment T242353 [tools]
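The 17:46–18:26 entries describe rotating the PAWS cluster's certificates by rejoining nodes one at a time. A hypothetical dry-run sketch of that per-node loop (the node names, join parameters, and drain flags here are placeholders, not from the log; only the kubeadm/kubectl command shapes are standard):

```shell
# Hypothetical node names; the log does not list the PAWS workers involved
nodes=(tools-paws-worker-1001 tools-paws-worker-1002)

# Dry run: print the rejoin sequence used to rotate a node's certs
for n in "${nodes[@]}"; do
  printf '%s\n' \
    "kubectl drain ${n} --ignore-daemonsets" \
    "ssh ${n} sudo kubeadm reset -f" \
    "ssh ${n} sudo kubeadm join <apiserver>:6443 --token <token>" \
    "kubectl uncordon ${n}"
done
```

Doing one node at a time keeps the rest of the cluster serving while each node gets fresh certificates.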
2020-01-07
22:40 <bstorm_> rebooted tools-worker-1007 to recover it from disk full and general badness [tools]
16:33 <arturo> deleted by hand pod metrics/cadvisor-5pd46 due to prometheus having issues scraping it [tools]
15:46 <bd808> Rebooting tools-k8s-worker-[6-14] [tools]
15:35 <bstorm_> changed kubeadm-config to use a list instead of a hash for extravols on the apiserver in the new k8s cluster T242067 [tools]
14:02 <arturo> `root@tools-k8s-control-3:~# wmcs-k8s-secret-for-cert -n metrics -s metrics-server-certs -a metrics-server` (T241853) [tools]
13:33 <arturo> upload docker-registry.tools.wmflabs.org/coreos/kube-state-metrics:v1.8.0 copied from quay.io/coreos/kube-state-metrics:v1.8.0 (T241853) [tools]
13:31 <arturo> upload docker-registry.tools.wmflabs.org/metrics-server-amd64:v0.3.6 copied from k8s.gcr.io/metrics-server-amd64:v0.3.6 (T241853) [tools]
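The two 13:31/13:33 uploads mirror upstream images into the local Toolforge registry. A dry-run sketch of the usual pull/tag/push sequence (image names are from the log; the loop only prints the docker commands, which is the conventional way to copy an image between registries):

```shell
# Source and destination from the 13:31 entry
src="k8s.gcr.io/metrics-server-amd64:v0.3.6"
dst="docker-registry.tools.wmflabs.org/metrics-server-amd64:v0.3.6"

# Dry run: print the mirror sequence
for cmd in "docker pull ${src}" "docker tag ${src} ${dst}" "docker push ${dst}"; do
  echo "${cmd}"
done
```

Mirroring into the local registry means cluster nodes never pull from external registries at deploy time.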
13:23 <arturo> [new k8s] doing changes to kube-state-metrics and metrics-server trying to relocate them to the 'metrics' namespace (T241853) [tools]
05:28 <bd808> Creating tools-k8s-worker-[6-14] (again) [tools]
05:20 <bd808> Deleting busted tools-k8s-worker-[6-14] [tools]
05:02 <bd808> Creating tools-k8s-worker-[6-14] [tools]
00:26 <bstorm_> repooled tools-sgewebgrid-lighttpd-0919 [tools]
00:17 <bstorm_> repooled tools-sgewebgrid-lighttpd-0918 [tools]
00:15 <bstorm_> moving tools-sgewebgrid-lighttpd-0918 and -0919 to cloudvirt1004 from cloudvirt1029 to rebalance load [tools]