2021-05-10 §
22:58 <bstorm> cleared error state on a grid queue [tools]
22:58 <bstorm> setting `profile::wmcs::kubeadm::docker_vol: false` on ingress nodes [tools]
15:22 <Majavah> change k8s.svc.tools.eqiad1.wikimedia.cloud. to point to the tools-k8s-haproxy-keepalived-vip address 172.16.6.113 (T252239) [tools]
15:06 <Majavah> carefully rolling out keepalived to tools-k8s-haproxy-[3-4] while making sure [1-2] do not have changes [tools]
15:03 <Majavah> clear all error states caused by overloaded exec nodes [tools]
14:57 <arturo> allow tools-k8s-haproxy-[3-4] to use the tools-k8s-haproxy-keepalived-vip address (172.16.6.113) (T252239) [tools]
12:53 <Majavah> creating tools-k8s-haproxy-[3-4] to rebuild current ones without nfs and with keepalived [tools]
2021-05-09 §
06:55 <Majavah> clear error state from tools-sgeexec-0916 [tools]
2021-05-08 §
10:57 <Majavah> import docker image k8s.gcr.io/ingress-nginx/controller:v0.46.0 to local registry as docker-registry.tools.wmflabs.org/nginx-ingress-controller:v0.46.0 T264221 [tools]
2021-05-07 §
18:07 <Majavah> generate and add k8s haproxy keepalived password (profile::toolforge::k8s::haproxy::keepalived_password) to private puppet repo [tools]
17:15 <bstorm> recreated recordset of k8s.tools.eqiad1.wikimedia.cloud as CNAME to k8s.svc.tools.eqiad1.wikimedia.cloud T282227 [tools]
17:12 <bstorm> created A record of k8s.svc.tools.eqiad1.wikimedia.cloud pointing at current cluster with TTL of 300 for quick initial failover when the new set of haproxy nodes are ready T282227 [tools]
09:44 <arturo> `sudo wmcs-openstack --os-project-id=tools port create --network lan-flat-cloudinstances2b tools-k8s-haproxy-keepalived-vip` [tools]
2021-05-06 §
14:43 <Majavah> clear error states from all currently erroring exec nodes [tools]
14:37 <Majavah> clear error state from tools-sgeexec-0913 [tools]
04:34 <Majavah> add own root key to project hiera on horizon T278390 [tools]
02:36 <andrewbogott> removing jhedden from sudo roots [tools]
2021-05-05 §
19:27 <andrewbogott> adding taavi as a sudo root to project toolforge for T278390 [tools]
2021-05-04 §
15:23 <arturo> upgrading exim4-daemon-heavy in tools-mail-03 [tools]
10:47 <arturo> rebase & resolve merge conflicts in labs/private.git [tools]
2021-05-03 §
16:23 <dcaro> started tools-sgeexec-0907, was stuck on initramfs due to an unclean fs (/dev/vda3, root), ran fsck manually fixing all the errors and booted up correctly after (T280641) [tools]
14:07 <dcaro> depooling tools-sgeexec-0908/7 to be able to restart the VMs as they got stuck during migration (T280641) [tools]
2021-04-29 §
18:23 <bstorm> removing one more etcd node via cookbook T279723 [tools]
18:12 <bstorm> removing an etcd node via cookbook T279723 [tools]
2021-04-27 §
16:40 <bstorm> deleted all the errored out grid jobs stuck in queue wait [tools]
16:16 <bstorm> cleared E status on grid queues to get things flowing again [tools]
2021-04-26 §
12:17 <arturo> allowing more tools into the legacy redirector (T281003) [tools]
2021-04-22 §
08:44 <Krenair> Removed yuvipanda from roots sudo policy [tools]
08:42 <Krenair> Removed yuvipanda from projectadmin per request [tools]
08:40 <Krenair> Removed yuvipanda from tools.admin per request [tools]
2021-04-20 §
22:20 <bd808> `clush -w @all -b "sudo exiqgrep -z -i | xargs sudo exim -Mt"` [tools]
22:19 <bd808> `clush -w @exec -b "sudo exiqgrep -z -i | xargs sudo exim -Mt"` [tools]
21:52 <bd808> Update hiera `profile::toolforge::active_mail_relay: tools-mail-03.tools.eqiad1.wikimedia.cloud`. Was using wrong domain name in prior update. [tools]
21:49 <bstorm> tagged the latest maintain-kubeusers and deployed to toolforge (with kustomize changes to rbac) after testing in toolsbeta T280300 [tools]
21:27 <bd808> Update hiera `profile::toolforge::active_mail_relay: tools-mail-03.tools.eqiad.wmflabs`. was -2 which is decommed. [tools]
10:18 <dcaro> setting the retention on the tools-prometheus VMs to 250GB (they have 276GB total, leaving some space for online data operations if needed) (T279990) [tools]
2021-04-19 §
10:53 <dcaro> reverting setting prometheus data source in grafana to 'server', can't connect [tools]
10:51 <dcaro> setting prometheus data source in grafana to 'server' to avoid CORS issues [tools]
2021-04-16 §
23:15 <bstorm> cleaned up all source files for the grid with the old domain name to enable future node creation T277653 [tools]
14:38 <dcaro> added 'will get out of space in X days' panel to the dashboard https://grafana-labs.wikimedia.org/goto/kBlGd0uGk (T279990), we got <5days xd [tools]
11:35 <arturo> running `grid-configurator --all-domains` which basically added tools-sgebastion-10,11 as submit hosts and removed tools-sgegrid-master,shadow as submit hosts [tools]
2021-04-15 §
17:45 <bstorm> cleared error state from tools-sgeexec-0920.tools.eqiad.wmflabs for a failed job [tools]
2021-04-13 §
13:26 <dcaro> upgrade puppet and python-wmflib on tools-prometheus-03 [tools]
11:23 <arturo> deleted shutoff VM tools-package-builder-02 (T275864) [tools]
11:21 <arturo> deleted shutoff VM tools-sge-services-03,04 (T278354) [tools]
11:20 <arturo> deleted shutoff VM tools-docker-registry-03,04 (T278303) [tools]
11:18 <arturo> deleted shutoff VM tools-mail-02 (T278538) [tools]
11:17 <arturo> deleted shutoff VMs tools-static-12,13 (T278539) [tools]
2021-04-11 §
16:07 <bstorm> cleared E state from tools-sgeexec-0917 tools-sgeexec-0933 tools-sgeexec-0934 tools-sgeexec-0937 from failures of jobs 761759, 815031, 815056, 855676, 898936 [tools]
2021-04-08 §
18:25 <bstorm> cleaned up the deprecated entries in /data/project/.system_sge/gridengine/etc/submithosts for tools-sgegrid-master and tools-sgegrid-shadow using the old fqdns T277653 [tools]