tools SAL


2021-05-11
16:29	<Majavah>	carefully shut down tools-k8s-haproxy-2 T252239	[tools]
2021-05-10
22:58	<bstorm>	cleared error state on a grid queue	[tools]
22:58	<bstorm>	setting `profile::wmcs::kubeadm::docker_vol: false` on ingress nodes	[tools]
15:22	<Majavah>	change k8s.svc.tools.eqiad1.wikimedia.cloud. to point to the tools-k8s-haproxy-keepalived-vip address 172.16.6.113 (T252239)	[tools]
15:06	<Majavah>	carefully rolling out keepalived to tools-k8s-haproxy-[3-4] while making sure [1-2] do not have changes	[tools]
15:03	<Majavah>	clear all error states caused by overloaded exec nodes	[tools]
14:57	<arturo>	allow tools-k8s-haproxy-[3-4] to use the tools-k8s-haproxy-keepalived-vip address (172.16.6.113) (T252239)	[tools]
12:53	<Majavah>	creating tools-k8s-haproxy-[3-4] to rebuild current ones without nfs and with keepalived	[tools]
2021-05-09
06:55	<Majavah>	clear error state from tools-sgeexec-0916	[tools]
2021-05-08
10:57	<Majavah>	import docker image k8s.gcr.io/ingress-nginx/controller:v0.46.0 to local registry as docker-registry.tools.wmflabs.org/nginx-ingress-controller:v0.46.0 T264221	[tools]
2021-05-07
18:07	<Majavah>	generate and add k8s haproxy keepalived password (profile::toolforge::k8s::haproxy::keepalived_password) to private puppet repo	[tools]
17:15	<bstorm>	recreated recordset of k8s.tools.eqiad1.wikimedia.cloud as CNAME to k8s.svc.tools.eqiad1.wikimedia.cloud T282227	[tools]
17:12	<bstorm>	created A record of k8s.svc.tools.eqiad1.wikimedia.cloud pointing at current cluster with TTL of 300 for quick initial failover when the new set of haproxy nodes are ready T282227	[tools]
09:44	<arturo>	`sudo wmcs-openstack --os-project-id=tools port create --network lan-flat-cloudinstances2b tools-k8s-haproxy-keepalived-vip`	[tools]
2021-05-06
14:43	<Majavah>	clear error states from all currently erroring exec nodes	[tools]
14:37	<Majavah>	clear error state from tools-sgeexec-0913	[tools]
04:34	<Majavah>	add own root key to project hiera on horizon T278390	[tools]
02:36	<andrewbogott>	removing jhedden from sudo roots	[tools]
2021-05-05
19:27	<andrewbogott>	adding taavi as a sudo root to project toolforge for T278390	[tools]
2021-05-04
15:23	<arturo>	upgrading exim4-daemon-heavy in tools-mail-03	[tools]
10:47	<arturo>	rebase & resolve merge conflicts in labs/private.git	[tools]
2021-05-03
16:23	<dcaro>	started tools-sgeexec-0907, which was stuck in initramfs due to an unclean fs (/dev/vda3, root); ran fsck manually, which fixed all the errors, and it booted up correctly afterwards (T280641)	[tools]
14:07	<dcaro>	depooling tools-sgeexec-0908/7 to be able to restart the VMs, as they got stuck during migration (T280641)	[tools]
2021-04-29
18:23	<bstorm>	removing one more etcd node via cookbook T279723	[tools]
18:12	<bstorm>	removing an etcd node via cookbook T279723	[tools]
2021-04-27
16:40	<bstorm>	deleted all the errored out grid jobs stuck in queue wait	[tools]
16:16	<bstorm>	cleared E status on grid queues to get things flowing again	[tools]
2021-04-26
12:17	<arturo>	allowing more tools into the legacy redirector (T281003)	[tools]
2021-04-22
08:44	<Krenair>	Removed yuvipanda from roots sudo policy	[tools]
08:42	<Krenair>	Removed yuvipanda from projectadmin per request	[tools]
08:40	<Krenair>	Removed yuvipanda from tools.admin per request	[tools]
2021-04-20
22:20	<bd808>	`clush -w @all -b "sudo exiqgrep -z -i | xargs sudo exim -Mt"`	[tools]
22:19	<bd808>	`clush -w @exec -b "sudo exiqgrep -z -i | xargs sudo exim -Mt"`	[tools]
21:52	<bd808>	Update hiera `profile::toolforge::active_mail_relay: tools-mail-03.tools.eqiad1.wikimedia.cloud`. Was using wrong domain name in prior update.	[tools]
21:49	<bstorm>	tagged the latest maintain-kubeusers and deployed to toolforge (with kustomize changes to rbac) after testing in toolsbeta T280300	[tools]
21:27	<bd808>	Update hiera `profile::toolforge::active_mail_relay: tools-mail-03.tools.eqiad.wmflabs`. It was -2, which is decommissioned.	[tools]
10:18	<dcaro>	setting the retention on the tools-prometheus VMs to 250GB (they have 276GB total, leaving some space for online data operations if needed) (T279990)	[tools]
2021-04-19
10:53	<dcaro>	reverting the change of the prometheus data source in grafana to 'server'; it can't connect	[tools]
10:51	<dcaro>	setting prometheus data source in grafana to 'server' to avoid CORS issues	[tools]
2021-04-16
23:15	<bstorm>	cleaned up all source files for the grid with the old domain name to enable future node creation T277653	[tools]
14:38	<dcaro>	added a 'will get out of space in X days' panel to the dashboard https://grafana-labs.wikimedia.org/goto/kBlGd0uGk (T279990); we got <5 days xd	[tools]
11:35	<arturo>	running `grid-configurator --all-domains` which basically added tools-sgebastion-10,11 as submit hosts and removed tools-sgegrid-master,shadow as submit hosts	[tools]
2021-04-15
17:45	<bstorm>	cleared error state from tools-sgeexec-0920.tools.eqiad.wmflabs for a failed job	[tools]
2021-04-13
13:26	<dcaro>	upgrade puppet and python-wmflib on tools-prometheus-03	[tools]
11:23	<arturo>	deleted shutoff VM tools-package-builder-02 (T275864)	[tools]
11:21	<arturo>	deleted shutoff VM tools-sge-services-03,04 (T278354)	[tools]
11:20	<arturo>	deleted shutoff VM tools-docker-registry-03,04 (T278303)	[tools]
11:18	<arturo>	deleted shutoff VM tools-mail-02 (T278538)	[tools]
11:17	<arturo>	deleted shutoff VMs tools-static-12,13 (T278539)	[tools]
2021-04-11
16:07	<bstorm>	cleared E state from tools-sgeexec-0917 tools-sgeexec-0933 tools-sgeexec-0934 tools-sgeexec-0937 from failures of jobs 761759, 815031, 815056, 855676, 898936	[tools]