2020-09-10 §
15:25 <arturo> detected missing DNS record for k8s.tools.eqiad1.wikimedia.cloud which means the k8s cluster is down [tools]
10:22 <arturo> enabling ingress dedicated worker nodes in the k8s cluster (T250172) [tools]
2020-09-09 §
11:12 <arturo> new ingress nodes added to the cluster, and tainted/labeled per the docs https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/Deploying#ingress_nodes (T250172) [tools]
10:50 <arturo> created puppet prefix `tools-k8s-ingress` (T250172) [tools]
10:42 <arturo> created VMs tools-k8s-ingress-1 and tools-k8s-ingress-2 in the `tools-ingress` server group (T250172) [tools]
10:38 <arturo> created server group `tools-ingress` with soft anti affinity policy (T250172) [tools]
2020-09-08 §
23:24 <bstorm> clearing grid queue error states blocking job runs [tools]
22:53 <bd808> forcing puppet run on tools-sgebastion-07 [tools]
2020-09-02 §
18:13 <andrewbogott> moving tools-sgeexec-0920 to ceph [tools]
17:57 <andrewbogott> moving tools-sgeexec-0942 to ceph [tools]
2020-08-31 §
19:58 <andrewbogott> migrating tools-sgeexec-091[0-9] to ceph [tools]
17:19 <andrewbogott> migrating tools-sgeexec-090[4-9] to ceph [tools]
17:19 <andrewbogott> repooled tools-sgeexec-0901 [tools]
16:52 <bstorm> `apt install uwsgi` was run on tools-checker-03 in the last log T261677 [tools]
16:51 <bstorm> running `apt install uwsgi` with --allow-downgrades to fix the puppet setup there T261677 [tools]
14:26 <andrewbogott> depooling tools-sgeexec-0901, migrating to ceph [tools]
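Several entries here and below name host batches with a bracketed range, e.g. `tools-sgeexec-091[0-9]` or `tools-k8s-worker-[56-60]`, the same notation `clush` accepts. The sketch below shows how a single range expands into hostnames. `expand_nodeset` is a hypothetical helper that handles only one `[lo-hi]` range per pattern; ClusterShell's own `NodeSet` class covers the full syntax (comma lists, multiple ranges).

```python
# Minimal sketch of the single-range node-set notation used in these entries.
# Hypothetical helper, not ClusterShell's implementation.
import re
from typing import List

_ONE_RANGE = re.compile(r"^([^\[\]]*)\[(\d+)-(\d+)\]([^\[\]]*)$")


def expand_nodeset(pattern: str) -> List[str]:
    """Expand a pattern containing at most one [lo-hi] range into hostnames."""
    m = _ONE_RANGE.match(pattern)
    if m is None:
        return [pattern]  # no single range found; returned unchanged
    prefix, lo_s, hi_s, suffix = m.groups()
    lo, hi = int(lo_s), int(hi_s)
    if hi < lo:
        raise ValueError(f"empty range in {pattern!r}")
    # In this sketch, a zero-padded lower bound such as [01-10] keeps its width.
    width = len(lo_s) if len(lo_s) > 1 and lo_s.startswith("0") else 0
    return [f"{prefix}{str(n).zfill(width)}{suffix}" for n in range(lo, hi + 1)]


if __name__ == "__main__":
    print(expand_nodeset("tools-sgeexec-091[0-9]"))
    print(len(expand_nodeset("tools-k8s-worker-[1-15]")))
```

Note that `091[0-9]` puts the range after a fixed `091`, so it yields tools-sgeexec-0910 through tools-sgeexec-0919.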
2020-08-30 §
00:57 <Krenair> also ran qconf -ds on each [tools]
00:34 <Krenair> Tidied up SGE problems (it was spamming root@ every minute for hours) following host deletions some hours ago - removed tools-sgeexec-0921 through 0931 from @general, ran qmod -rj on all jobs registered for those nodes, then qdel -f on the remainders, then qconf -de on each deleted node [tools]
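The 2020-08-30 cleanup is a fixed sequence: drop the deleted hosts from @general, reschedule their jobs with `qmod -rj`, force-delete the remainders with `qdel -f`, then remove the execution hosts (`qconf -de`) and, per the later entry, the submit hosts (`qconf -ds`). The dry-run sketch below only builds those command lines in order and runs nothing. Assumptions: `host_range` and `sge_cleanup_plan` are hypothetical helpers, job IDs are supplied by the caller, and the `qconf -dattr hostgroup hostlist` form is one way to remove a host from a hostgroup; the log does not say which command was used for that step.

```python
# Dry-run sketch of the SGE cleanup described on 2020-08-30 (hypothetical helpers;
# commands are only built, never executed).
from typing import List, Sequence


def host_range(prefix: str, first: int, last: int, width: int = 4) -> List[str]:
    """Expand e.g. tools-sgeexec-0921 through 0931 into individual hostnames."""
    return [f"{prefix}{n:0{width}d}" for n in range(first, last + 1)]


def sge_cleanup_plan(hosts: Sequence[str],
                     job_ids: Sequence[int],
                     remaining_ids: Sequence[int] = (),
                     hostgroup: str = "@general") -> List[List[str]]:
    plan: List[List[str]] = []
    # 1. Remove each deleted host from the hostgroup (assumed qconf -dattr form).
    for host in hosts:
        plan.append(["qconf", "-dattr", "hostgroup", "hostlist", host, hostgroup])
    # 2. Reschedule every job still registered on those hosts.
    if job_ids:
        plan.append(["qmod", "-rj", *map(str, job_ids)])
    # 3. Force-delete the jobs that survived rescheduling (the "remainders").
    if remaining_ids:
        plan.append(["qdel", "-f", *map(str, remaining_ids)])
    # 4. Delete the execution host objects, then the submit host entries.
    for host in hosts:
        plan.append(["qconf", "-de", host])
    for host in hosts:
        plan.append(["qconf", "-ds", host])
    return plan


if __name__ == "__main__":
    deleted = host_range("tools-sgeexec-", 921, 931)
    for cmd in sge_cleanup_plan(deleted, job_ids=[101, 102], remaining_ids=[102]):
        print(" ".join(cmd))
```

Keeping the remainders as a separate argument reflects that `qdel -f` should only target jobs still present after `qmod -rj` has had its chance.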
2020-08-29 §
16:02 <bstorm> deleting "tools-sgeexec-0931", "tools-sgeexec-0930", "tools-sgeexec-0929", "tools-sgeexec-0928", "tools-sgeexec-0927" [tools]
16:00 <bstorm> deleting "tools-sgeexec-0926", "tools-sgeexec-0925", "tools-sgeexec-0924", "tools-sgeexec-0923", "tools-sgeexec-0922", "tools-sgeexec-0921" [tools]
2020-08-26 §
21:08 <bd808> Disabled puppet on tools-proxy-06 to test fixes for a bug in the new T251628 code [tools]
08:54 <arturo> merged several patches by bryan for toolforge front proxy (cleanups, etc) example: https://gerrit.wikimedia.org/r/c/operations/puppet/+/622435 [tools]
2020-08-25 §
19:38 <andrewbogott> deleting tools-sgeexec-0943.tools.eqiad.wmflabs, tools-sgeexec-0944.tools.eqiad.wmflabs, tools-sgeexec-0945.tools.eqiad.wmflabs, tools-sgeexec-0946.tools.eqiad.wmflabs, tools-sgeexec-0948.tools.eqiad.wmflabs, tools-sgeexec-0949.tools.eqiad.wmflabs, tools-sgeexec-0953.tools.eqiad.wmflabs — they are broken and we're not very curious why; will retry this exercise when everything is standardized on [tools]
15:03 <andrewbogott> removing non-ceph nodes tools-sgeexec-0921 through tools-sgeexec-0931 [tools]
15:02 <andrewbogott> added new sge-exec nodes tools-sgeexec-0943 through tools-sgeexec-0953 (for real this time) [tools]
2020-08-19 §
21:29 <andrewbogott> shutting down and removing tools-k8s-worker-20 through tools-k8s-worker-29; this load can now be handled by new nodes on ceph hosts [tools]
21:15 <andrewbogott> shutting down and removing tools-k8s-worker-1 through tools-k8s-worker-19; this load can now be handled by new nodes on ceph hosts [tools]
18:40 <andrewbogott> creating 13 new xlarge k8s worker nodes, tools-k8s-worker-67 through tools-k8s-worker-79 [tools]
2020-08-18 §
15:24 <bd808> Rebuilding all Docker containers to pick up newest versions of installed packages [tools]
2020-07-30 §
16:28 <andrewbogott> added new xlarge ceph-hosted worker nodes: tools-k8s-worker-61, 62, 63, 64, 65, 66. T258663 [tools]
2020-07-29 §
23:24 <bd808> Pushed a copy of docker-registry.wikimedia.org/wikimedia-jessie:latest to docker-registry.tools.wmflabs.org/wikimedia-jessie:latest in preparation for the upstream image going away [tools]
2020-07-24 §
22:33 <bd808> Removed a few more ancient docker images: grrrit, jessie-toollabs, and nagf [tools]
21:02 <bd808> Running cleanup script to delete the non-sssd toolforge images from docker-registry.tools.wmflabs.org [tools]
20:17 <bd808> Forced garbage collection on docker-registry.tools.wmflabs.org [tools]
20:06 <bd808> Running cleanup script to delete all of the old toollabs-* images from docker-registry.tools.wmflabs.org [tools]
2020-07-22 §
23:24 <bstorm> created server group 'tools-k8s-worker' for any new worker nodes, so that OpenStack has a low chance of scheduling them together unless it is necessary T258663 [tools]
23:22 <bstorm> running puppet and NFS 4.2 remount on tools-k8s-worker-[56-60] T257945 [tools]
23:17 <bstorm> running puppet and NFS 4.2 remount on tools-k8s-worker-[41-55] T257945 [tools]
23:14 <bstorm> running puppet and NFS 4.2 remount on tools-k8s-worker-[21-40] T257945 [tools]
23:11 <bstorm> running puppet and NFS remount on tools-k8s-worker-[1-15] T257945 [tools]
23:07 <bstorm> disabling puppet on k8s workers to reduce the effect of changing the NFS mount version all at once T257945 [tools]
22:28 <bstorm> setting tools-k8s-control prefix to mount NFS v4.2 T257945 [tools]
22:15 <bstorm> set the tools-k8s-control nodes to also use 800MBps to prevent issues with toolforge ingress and api system [tools]
22:07 <bstorm> set the tools-k8s-haproxy-1 (main load balancer for toolforge) to have an egress limit of 800MB per sec instead of the same as all the other servers [tools]
2020-07-21 §
16:09 <bstorm> rebooting tools-sgegrid-shadow to remount NFS correctly [tools]
15:55 <bstorm> set the bastion prefix to have explicitly set hiera value of profile::wmcs::nfsclient::nfs_version: '4' [tools]
2020-07-17 §
16:47 <bd808> Enabled Puppet on tools-proxy-06 following successful test (T102367) [tools]
16:29 <bd808> Disabled Puppet on tools-proxy-06 to test nginx config changes manually (T102367) [tools]
2020-07-15 §
23:11 <bd808> Removed ssh root key for valhallasw from project hiera (T255697) [tools]
2020-07-09 §
18:53 <bd808> Updating git-review to 1.27 via clush across cluster (T257496) [tools]