2020-08-25
19:38 <andrewbogott> deleting tools-sgeexec-0943.tools.eqiad.wmflabs, tools-sgeexec-0944.tools.eqiad.wmflabs, tools-sgeexec-0945.tools.eqiad.wmflabs, tools-sgeexec-0946.tools.eqiad.wmflabs, tools-sgeexec-0948.tools.eqiad.wmflabs, tools-sgeexec-0949.tools.eqiad.wmflabs, tools-sgeexec-0953.tools.eqiad.wmflabs — they are broken and we're not very curious why; will retry this exercise when everything is standardized on [tools]
15:03 <andrewbogott> removing non-ceph nodes tools-sgeexec-0921 through tools-sgeexec-0931 [tools]
15:02 <andrewbogott> added new sge-exec nodes tools-sgeexec-0943 through tools-sgeexec-0953 (for real this time) [tools]
2020-08-19
21:29 <andrewbogott> shutting down and removing tools-k8s-worker-20 through tools-k8s-worker-29; this load can now be handled by new nodes on ceph hosts [tools]
21:15 <andrewbogott> shutting down and removing tools-k8s-worker-1 through tools-k8s-worker-19; this load can now be handled by new nodes on ceph hosts [tools]
18:40 <andrewbogott> creating 13 new xlarge k8s worker nodes, tools-k8s-worker-67 through tools-k8s-worker-79 [tools]
2020-08-18
15:24 <bd808> Rebuilding all Docker images to pick up newest versions of installed packages [tools]
2020-07-30
16:28 <andrewbogott> added new xlarge ceph-hosted worker nodes: tools-k8s-worker-61, 62, 63, 64, 65, 66. T258663 [tools]
2020-07-29
23:24 <bd808> Pushed a copy of docker-registry.wikimedia.org/wikimedia-jessie:latest to docker-registry.tools.wmflabs.org/wikimedia-jessie:latest in preparation for the upstream image going away [tools]
2020-07-24
22:33 <bd808> Removed a few more ancient docker images: grrrit, jessie-toollabs, and nagf [tools]
21:02 <bd808> Running cleanup script to delete the non-sssd toolforge images from docker-registry.tools.wmflabs.org [tools]
20:17 <bd808> Forced garbage collection on docker-registry.tools.wmflabs.org [tools]
20:06 <bd808> Running cleanup script to delete all of the old toollabs-* images from docker-registry.tools.wmflabs.org [tools]
2020-07-22
23:24 <bstorm> created server group 'tools-k8s-worker' for any new worker nodes, so that OpenStack is unlikely to schedule them together unless necessary T258663 [tools]
23:22 <bstorm> running puppet and NFS 4.2 remount on tools-k8s-worker-[56-60] T257945 [tools]
23:17 <bstorm> running puppet and NFS 4.2 remount on tools-k8s-worker-[41-55] T257945 [tools]
23:14 <bstorm> running puppet and NFS 4.2 remount on tools-k8s-worker-[21-40] T257945 [tools]
23:11 <bstorm> running puppet and NFS remount on tools-k8s-worker-[1-15] T257945 [tools]
23:07 <bstorm> disabling puppet on k8s workers so that the NFS mount version change is not applied to all of them at once T257945 [tools]
22:28 <bstorm> setting tools-k8s-control prefix to mount NFS v4.2 T257945 [tools]
22:15 <bstorm> set the tools-k8s-control nodes to also use an 800 MB/s egress limit to prevent issues with the Toolforge ingress and API system [tools]
22:07 <bstorm> set tools-k8s-haproxy-1 (the main load balancer for Toolforge) to an egress limit of 800 MB/s instead of the limit shared by all the other servers [tools]
2020-07-21
16:09 <bstorm> rebooting tools-sgegrid-shadow to remount NFS correctly [tools]
15:55 <bstorm> explicitly set the hiera value profile::wmcs::nfsclient::nfs_version: '4' on the bastion prefix [tools]
2020-07-17
16:47 <bd808> Enabled Puppet on tools-proxy-06 following successful test (T102367) [tools]
16:29 <bd808> Disabled Puppet on tools-proxy-06 to test nginx config changes manually (T102367) [tools]
2020-07-15
23:11 <bd808> Removed ssh root key for valhallasw from project hiera (T255697) [tools]
2020-07-09
18:53 <bd808> Updating git-review to 1.27 via clush across the cluster (T257496) [tools]
2020-07-08
11:16 <arturo> merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/610029 -- important change to front-proxy (T234617) [tools]
11:11 <arturo> live-hacking puppetmaster with https://gerrit.wikimedia.org/r/c/operations/puppet/+/610029 (T234617) [tools]
2020-07-07
23:22 <bd808> Rebuilding all Docker images to pick up webservice v0.73 (T234617, T257229) [tools]
23:19 <bd808> Deploying webservice v0.73 via clush (T234617, T257229) [tools]
23:16 <bd808> Building webservice v0.73 (T234617, T257229) [tools]
15:01 <Reedy> killed python process from tools.experimental-embeddings using a lot of cpu on tools-sgebastion-07 [tools]
15:01 <Reedy> killed meno25 process running pwb.py on tools-sgebastion-07 [tools]
09:59 <arturo> point DNS tools.wmflabs.org A record to (tools-legacy-redirector) (T247236) [tools]
2020-07-06
11:54 <arturo> briefly point DNS tools.wmflabs.org A record to (tools-legacy-redirector) and then switch back to (tools-proxy-05). The legacy redirector does HTTP/307 (T247236) [tools]
11:50 <arturo> associate floating IP address with tools-legacy-redirector (T247236) [tools]
2020-07-01
11:19 <arturo> clean up exim email queue (4 frozen messages) [tools]
11:01 <arturo> live-hacking puppetmaster with https://gerrit.wikimedia.org/r/c/operations/puppet/+/608849 (T256737) [tools]
2020-06-30
11:18 <arturo> set some hiera keys for mtail in puppet prefix `tools-mail` (T256737) [tools]
2020-06-29
22:48 <legoktm> built html-sssd/web image (T241817) [tools]
22:23 <legoktm> rebuild python{34,35,37}-sssd/web images for https://gerrit.wikimedia.org/r/608093 [tools]
12:01 <arturo> introduced spam filter in the mail server (T120210) [tools]
2020-06-25
21:49 <zhuyifei1999_> re-enabling puppet on tools-sgebastion-09 T256426 [tools]
21:39 <zhuyifei1999_> disabling puppet on tools-sgebastion-09 so I can play with mount settings T256426 [tools]
21:24 <bstorm> hard rebooting tools-sgebastion-09 [tools]
2020-06-24
12:36 <arturo> live-hacking puppetmaster with exim prometheus stuff (T175964) [tools]
11:57 <arturo> merging email ratelimiting patch https://gerrit.wikimedia.org/r/c/operations/puppet/+/607320 (T175964) [tools]
2020-06-23
17:55 <arturo> killed processes for users `hamishz` and `msyn`, which apparently were tools that should be running on the grid or Kubernetes instead [tools]