2021-06-29 §
15:12 <arturo> livehacking puppetmaster for T283238 [tools]
10:24 <dcaro> running puppet on the buster bastions after it had been failing for 20000 minutes; this might break something [tools]
2021-06-15 §
19:02 <bstorm> cleared error status from a few queues [tools]
16:15 <majavah> deleting unused shutdown nodes: tools-checker-03 tools-k8s-haproxy-1 tools-k8s-haproxy-2 [tools]
2021-06-14 §
22:21 <bstorm> push docker-registry.tools.wmflabs.org/toolforge-python37-sssd-web:testing to test staged os.execv (and other patches) using toolsbeta toollabs-webservice version 0.75 T282975 [tools]
2021-06-13 §
08:15 <majavah> clear grid error state from tools-sgeexec-0907, tools-sgeexec-0916, tools-sgeexec-0940 [tools]
2021-06-12 §
14:39 <majavah> remove nonexistent tools-prometheus-04 and add tools-prometheus-05 to hiera key "prometheus_nodes" [tools]
13:53 <majavah> create empty bullseye-{tools,toolsbeta} repositories on tools-services-05 aptly [tools]
2021-06-10 §
17:38 <majavah> clear error state from tools-sgeexec-0907, task@tools-sgeexec-0939 [tools]
2021-06-09 §
13:57 <majavah> clear error state from exec nodes tools-sgeexec-0913, tools-sgeexec-0936, task@tools-sgeexec-0940 [tools]
2021-06-07 §
18:39 <bstorm> cleaning up more error conditions on grid queues [tools]
17:42 <majavah> delete `ingress-nginx` namespace and related objects T264221 [tools]
17:37 <majavah> remove tools-k8s-ingress-[1-3] from kubernetes, follow-up to https://sal.toolforge.org/log/nd7v2HkB1jz_IcWuCX5M T264221 [tools]
2021-06-04 §
21:30 <bstorm> deleting "tools-k8s-ingress-3", "tools-k8s-ingress-2", "tools-k8s-ingress-1" T264221 [tools]
21:21 <bstorm> cleared error state from 4 grid queues [tools]
2021-06-03 §
18:26 <majavah> renew prometheus kubernetes certificate T280301 [tools]
17:06 <majavah> renew admission webhook certificates T280301 [tools]
2021-06-01 §
10:10 <majavah> properly clean up deleted vms tools-k8s-haproxy-[1,2], tools-checker-03 from puppet after using the wrong fqdn the first time [tools]
09:54 <majavah> clear error state from tools-sgeexec-0913, tools-sgeexec-0950 [tools]
2021-05-30 §
18:58 <majavah> clear grid error state from 14 queues [tools]
2021-05-27 §
18:03 <bstorm> adjusted profile::wmcs::kubeadm::etcd_latency_ms from 30 back to the default (10) [tools]
16:04 <bstorm> cleared error state from several exec node queues [tools]
14:49 <andrewbogott> swapping in three new etcd nodes with local storage: tools-k8s-etcd-13,14,15 [tools]
2021-05-24 §
10:36 <arturo> rebased labs/private.git after merge conflict [tools]
06:49 <majavah> remove scfc kubernetes admin access after bd808 removed tools.admin membership to avoid maintain-kubeusers crashes when it expires [tools]
2021-05-22 §
14:47 <majavah> manually remove jeh admin certificates and remove jeh from the maintain-kubeusers configmap T282725 [tools]
14:32 <majavah> manually remove valhallasw and yuvipanda admin certificates, remove both from the configmap, and restart the maintain-kubeusers pod T282725 [tools]
02:51 <bd808> Restarted nginx on tools-static-14 to see if that clears up the fontcdn 502 errors [tools]
2021-05-21 §
17:06 <majavah> unpool tools-k8s-ingress-[4-6] [tools]
17:06 <majavah> repool tools-k8s-ingress-6 [tools]
17:02 <majavah> repool tools-k8s-ingress-4 and -5 [tools]
16:59 <bstorm> upgrading the ingress-gen2 controllers to release 3 to capture new RAM/CPU limits [tools]
16:43 <bstorm> resize tools-k8s-ingress-4 to g3.cores4.ram8.disk20 [tools]
16:43 <bstorm> resize tools-k8s-ingress-6 to g3.cores4.ram8.disk20 [tools]
16:40 <bstorm> resize tools-k8s-ingress-5 to g3.cores4.ram8.disk20 [tools]
16:04 <majavah> rollback kubernetes ingress update from front proxy [tools]
06:52 <Majavah> pool tools-k8s-ingress-6 and depool ingress-[2,3] T264221 [tools]
2021-05-20 §
17:05 <Majavah> pool tools-k8s-ingress-5 as an ingress node, depool ingress-1 T264221 [tools]
16:31 <Majavah> pool tools-k8s-worker-4 as an ingress node T264221 [tools]
15:17 <Majavah> trying to install ingress-nginx via helm again after adjusting security groups T264221 [tools]
15:15 <Majavah> move tools-k8s-ingress-[5-6] from "tools-k8s-full-connectivity" to "tools-new-k8s-full-connectivity" security group T264221 [tools]
2021-05-19 §
12:15 <Majavah> rollback ingress-nginx-gen2 [tools]
11:09 <Majavah> deploy helm-based nginx ingress controller v0.46.0 to ingress-nginx-gen2 namespace T264221 [tools]
10:44 <Majavah> create tools-k8s-ingress-[4-6] T264221 [tools]
2021-05-16 §
16:52 <Majavah> clear error state from tools-sgeexec-0905 tools-sgeexec-0907 tools-sgeexec-0936 tools-sgeexec-0941 [tools]
2021-05-14 §
19:18 <bstorm> adjusting the rate limits for bastions nfs_write upward a lot to make NFS writes faster now that the cluster is finally using 10Gb on the backend and frontend T218338 [tools]
16:55 <andrewbogott> rebooting toolserver-proxy-01 to clear up stray files [tools]
16:47 <andrewbogott> deleting log files older than 14 days on toolserver-proxy-01 [tools]
2021-05-12 §
19:45 <bstorm> cleared error state from some queues [tools]
19:05 <Majavah> remove phamhi-binding phamhi-view-binding cluster role bindings T282725 [tools]