tools SAL

2020-02-03
14:12	<arturo>	move tools-prometheus-04 from cloudvirt1022 to cloudvirt1013	[tools]
12:48	<arturo>	shutdown tools-prometheus-01 and tools-prometheus-02, after fixing the proxy `tools-prometheus.wmflabs.org` to tools-prometheus-03, data synced (T238096)	[tools]
09:38	<arturo>	tools-prometheus-01: systemctl stop prometheus@tools. Another try to migrate data to tools-prometheus-{03,04} (T238096)	[tools]
2020-01-31
14:05	<arturo>	leave tools-prometheus-01 as the backend for tools-prometheus.wmflabs.org for the weekend so grafana dashboards keep working (T238096)	[tools]
14:00	<arturo>	syncing again prometheus data from tools-prometheus-01 to tools-prometheus-0{3,4} due to some inconsistencies preventing prometheus from starting (T238096)	[tools]
2020-01-30
21:04	<andrewbogott>	also apt-get install python3-novaclient on tools-prometheus-03 and tools-prometheus-04 to suppress cronspam. Possible real fix for this is https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/569084/	[tools]
20:39	<andrewbogott>	apt-get install python3-keystoneclient on tools-prometheus-03 and tools-prometheus-04 to suppress cronspam. Possible real fix for this is https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/569084/	[tools]
16:27	<arturo>	create VM tools-prometheus-04 as cold standby of tools-prometheus-03 (T238096)	[tools]
16:25	<arturo>	point tools-prometheus.wmflabs.org proxy to tools-prometheus-03 (T238096)	[tools]
13:42	<arturo>	disable puppet in prometheus servers while syncing metric data (T238096)	[tools]
13:14	<arturo>	drop floating IP 185.15.56.60 and FQDN `prometheus.tools.wmcloud.org` because this is not how the prometheus setup is right now. Use a web proxy instead `tools-prometheus-new.wmflabs.org` (T238096)	[tools]
13:09	<arturo>	created FQDN `prometheus.tools.wmcloud.org` pointing to IPv4 185.15.56.60 (tools-prometheus-03) to test T238096	[tools]
12:59	<arturo>	associated floating IPv4 185.15.56.60 to tools-prometheus-03 (T238096)	[tools]
12:57	<arturo>	created domain `tools.wmcloud.org` in the tools project after some back and forth with Designate, permissions and the database. I plan to use this domain to test the new Debian Buster-based prometheus setup (T238096)	[tools]
10:20	<arturo>	create new VM instance tools-prometheus-03 (T238096)	[tools]
2020-01-29
20:07	<bd808>	Created {bastion,login,dev}.toolforge.org service names for Toolforge bastions using Horizon & Designate	[tools]
2020-01-28
13:35	<arturo>	`aborrero@tools-clushmaster-02:~$ clush -w @exec-stretch 'for i in $(ps aux | grep [t]ools.j | awk -F" " "{print \$2}") ; do echo "killing $i" ; sudo kill $i ; done || true'` (T243831)	[tools]
2020-01-27
07:05	<zhuyifei1999_>	wrong package. uninstalled. the correct one is bpfcc-tools and seems only available in buster+. T115231	[tools]
07:01	<zhuyifei1999_>	apt installing bcc on tools-worker-1037 to see who is sending SIGTERM, will uninstall after done. dependency: bin86. T115231	[tools]
2020-01-24
20:58	<bd808>	Built tools-k8s-worker-21 to test out build script following openstack client upgrade	[tools]
15:45	<bd808>	Rebuilding all Docker containers again because I failed to actually update the build server git clone properly last time I did this	[tools]
05:23	<bd808>	Building 6 new tools-k8s-worker instances for the 2020 Kubernetes cluster (take 2)	[tools]
04:41	<bd808>	Rebuilding all Docker images to pick up webservice-python-bootstrap changes	[tools]
2020-01-23
23:38	<bd808>	Halted tools-k8s-worker build script after first instance (tools-k8s-worker-10) stuck in "scheduling" state for 20 minutes	[tools]
23:16	<bd808>	Building 6 new tools-k8s-worker instances for the 2020 Kubernetes cluster	[tools]
05:15	<bd808>	Building tools-elastic-04	[tools]
04:39	<bd808>	wmcs-openstack quota set --instances 192	[tools]
04:36	<bd808>	wmcs-openstack quota set --cores 768 --ram 1536000	[tools]
2020-01-22
12:43	<arturo>	for the record, issue with tools-worker-1016 was memory exhaustion apparently	[tools]
12:35	<arturo>	hard-reboot tools-worker-1016 (not responding to even console access)	[tools]
2020-01-21
19:25	<bstorm_>	hard rebooting tools-sgeexec-0913/14/35 because they aren't even on the network	[tools]
19:17	<bstorm_>	depooled and rebooted tools-sgeexec-0914 because it was acting funny	[tools]
18:30	<bstorm_>	depooling and rebooting tools-sgeexec-[0911,0913,0919,0921,0924,0931,0933,0935,0939,0941].tools.eqiad.wmflabs	[tools]
17:21	<bstorm_>	rebooting toolschecker to recover stale nfs handle	[tools]
2020-01-16
23:54	<bstorm_>	rebooting tools-docker-builder-06 because there are a couple running containers that don't want to die cleanly	[tools]
23:45	<bstorm_>	rebuilding docker containers to include new webservice version (0.58)	[tools]
23:41	<bstorm_>	deployed toollabs-webservice 0.58 to everything that isn't a container	[tools]
16:45	<bstorm_>	ran configurator to set the gridengine web queues to `rerun FALSE` T242397	[tools]
2020-01-14
15:29	<bstorm_>	failed the gridengine master back to the master server from the shadow	[tools]
02:23	<andrewbogott>	rebooting tools-paws-worker-1006 to resolve hangs associated with an old NFS failure	[tools]
2020-01-13
17:48	<bd808>	Running `puppet ca destroy` for each unsigned cert on tools-puppetmaster-01 (T242642)	[tools]
16:42	<bd808>	Cordoned and fixed puppet on tools-k8s-worker-12. Rebooting now. T242559	[tools]
16:33	<bd808>	Cordoned and fixed puppet on tools-k8s-worker-11. Rebooting now. T242559	[tools]
16:31	<bd808>	Cordoned and fixed puppet on tools-k8s-worker-10. Rebooting now. T242559	[tools]
16:26	<bd808>	Cordoned and fixed puppet on tools-k8s-worker-9. Rebooting now. T242559	[tools]
2020-01-12
22:31	<Krenair>	same on -13 and -14	[tools]
22:28	<Krenair>	same on -8	[tools]
22:18	<Krenair>	same on -7	[tools]
22:11	<Krenair>	Did usual new instance creation puppet dance on tools-k8s-worker-6, /data/project got created	[tools]
2020-01-11
01:33	<bstorm_>	updated toollabs-webservice package to 0.57, which should allow persisting mem and cpu in manifests with burstable qos.	[tools]