2020-01-23
23:38 <bd808> Halted tools-k8s-worker build script after first instance (tools-k8s-worker-10) stuck in "scheduling" state for 20 minutes [tools]
23:16 <bd808> Building 6 new tools-k8s-worker instances for the 2020 Kubernetes cluster [tools]
05:15 <bd808> Building tools-elastic-04 [tools]
04:39 <bd808> wmcs-openstack quota set --instances 192 [tools]
04:36 <bd808> wmcs-openstack quota set --cores 768 --ram 1536000 [tools]
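The two quota bumps above line up as a fixed per-instance average. A quick arithmetic sanity check (the numbers are taken from the log entries; the division itself is only illustrative, not part of the `wmcs-openstack` commands):

```shell
# Quota values from the 04:36/04:39 entries
instances=192
cores=768
ram_mb=1536000

# Average capacity each instance could claim if spread evenly
per_cores=$(( cores / instances ))
per_ram=$(( ram_mb / instances ))
echo "${per_cores} cores, ${per_ram} MB RAM per instance"
```

That works out to 4 cores and 8000 MB RAM per instance, a plausible flavor size for worker nodes.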
2020-01-22
12:43 <arturo> for the record, the issue with tools-worker-1016 was apparently memory exhaustion [tools]
12:35 <arturo> hard-reboot tools-worker-1016 (not responding to even console access) [tools]
2020-01-21
19:25 <bstorm_> hard rebooting tools-sgeexec-0913/14/35 because they aren't even on the network [tools]
19:17 <bstorm_> depooled and rebooted tools-sgeexec-0914 because it was acting funny [tools]
18:30 <bstorm_> depooling and rebooting tools-sgeexec-[0911,0913,0919,0921,0924,0931,0933,0935,0939,0941].tools.eqiad.wmflabs [tools]
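The bracketed host list in the 18:30 entry expands to ten exec nodes. A dry-run sketch of the depool loop, using bash brace expansion (the `exec-manage depool` invocation is assumed from Toolforge convention; the loop only prints the commands rather than running them):

```shell
# Expand the logged node list into individual hostnames
hosts=(tools-sgeexec-{0911,0913,0919,0921,0924,0931,0933,0935,0939,0941})

# Dry run: print the depool command that would be issued per host
for host in "${hosts[@]}"; do
  echo "exec-manage depool ${host}.tools.eqiad.wmflabs"
done
```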
17:21 <bstorm_> rebooting toolschecker to recover stale nfs handle [tools]
2020-01-16
23:54 <bstorm_> rebooting tools-docker-builder-06 because there are a couple running containers that don't want to die cleanly [tools]
23:45 <bstorm_> rebuilding docker containers to include new webservice version (0.58) [tools]
23:41 <bstorm_> deployed toollabs-webservice 0.58 to everything that isn't a container [tools]
16:45 <bstorm_> ran configurator to set the gridengine web queues to `rerun FALSE` T242397 [tools]
2020-01-14
15:29 <bstorm_> failed the gridengine master back to the master server from the shadow [tools]
02:23 <andrewbogott> rebooting tools-paws-worker-1006 to resolve hangs associated with an old NFS failure [tools]
2020-01-13
17:48 <bd808> Running `puppet ca destroy` for each unsigned cert on tools-puppetmaster-01 (T242642) [tools]
16:42 <bd808> Cordoned and fixed puppet on tools-k8s-worker-12. Rebooting now. T242559 [tools]
16:33 <bd808> Cordoned and fixed puppet on tools-k8s-worker-11. Rebooting now. T242559 [tools]
16:31 <bd808> Cordoned and fixed puppet on tools-k8s-worker-10. Rebooting now. T242559 [tools]
16:26 <bd808> Cordoned and fixed puppet on tools-k8s-worker-9. Rebooting now. T242559 [tools]
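The four 16:26–16:42 entries repeat the same cordon-then-reboot sequence on workers 9 through 12. A dry-run sketch of that loop (node names come from the log; `kubectl cordon` is the standard command for marking a node unschedulable, and the loop only prints what would run):

```shell
# Workers touched in the 16:26-16:42 entries
nodes=(tools-k8s-worker-{9..12})

# Dry run: print the cordon step performed before each reboot
for n in "${nodes[@]}"; do
  echo "kubectl cordon ${n}"
done
```

Cordoning first keeps the scheduler from placing new pods on a node that is about to reboot.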
2020-01-12
22:31 <Krenair> same on -13 and -14 [tools]
22:28 <Krenair> same on -8 [tools]
22:18 <Krenair> same on -7 [tools]
22:11 <Krenair> Did usual new instance creation puppet dance on tools-k8s-worker-6, /data/project got created [tools]
2020-01-11
01:33 <bstorm_> updated toollabs-webservice package to 0.57, which should allow persisting mem and cpu in manifests with burstable qos. [tools]
2020-01-10
23:31 <bstorm_> updated toollabs-webservice package to 0.56 [tools]
15:45 <bstorm_> depooled tools-paws-worker-1013 to reboot because I think it is the last tools server with that mount issue (I hope) [tools]
15:35 <bstorm_> depooling and rebooting tools-worker-1016 because it still had the leftover mount problems [tools]
15:30 <bstorm_> git stash-ing local puppet changes in hopes that arturo has that material locally, and it doesn't break anything to do so [tools]
2020-01-09
23:35 <bstorm_> depooled and rebooted tools-sgeexec-0939 because it isn't acting right [tools]
18:26 <bstorm_> re-joining the k8s nodes OF THE PAWS CLUSTER to the cluster one at a time to rotate the certs T242353 [tools]
18:25 <bstorm_> re-joining the k8s nodes to the cluster one at a time to rotate the certs T242353 [tools]
18:06 <bstorm_> rebooting tools-paws-master-01 T242353 [tools]
17:46 <bstorm_> refreshing the paws cluster's entire x509 environment T242353 [tools]
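The 17:46–18:26 entries describe rotating the PAWS cluster's certificates by rejoining nodes one at a time. A hypothetical dry-run sketch of that per-node loop (the node names, join parameters, and drain flags here are placeholders, not from the log; only the kubeadm/kubectl command shapes are standard):

```shell
# Hypothetical node names; the log does not list the PAWS workers involved
nodes=(tools-paws-worker-1001 tools-paws-worker-1002)

# Dry run: print the rejoin sequence used to rotate a node's certs
for n in "${nodes[@]}"; do
  printf '%s\n' \
    "kubectl drain ${n} --ignore-daemonsets" \
    "ssh ${n} sudo kubeadm reset -f" \
    "ssh ${n} sudo kubeadm join <apiserver>:6443 --token <token>" \
    "kubectl uncordon ${n}"
done
```

Doing one node at a time keeps the rest of the cluster serving while each node gets fresh certificates.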
2020-01-07
22:40 <bstorm_> rebooted tools-worker-1007 to recover it from disk full and general badness [tools]
16:33 <arturo> deleted by hand pod metrics/cadvisor-5pd46 due to prometheus having issues scraping it [tools]
15:46 <bd808> Rebooting tools-k8s-worker-[6-14] [tools]
15:35 <bstorm_> changed kubeadm-config to use a list instead of a hash for extravols on the apiserver in the new k8s cluster T242067 [tools]
14:02 <arturo> `root@tools-k8s-control-3:~# wmcs-k8s-secret-for-cert -n metrics -s metrics-server-certs -a metrics-server` (T241853) [tools]
13:33 <arturo> upload docker-registry.tools.wmflabs.org/coreos/kube-state-metrics:v1.8.0 copied from quay.io/coreos/kube-state-metrics:v1.8.0 (T241853) [tools]
13:31 <arturo> upload docker-registry.tools.wmflabs.org/metrics-server-amd64:v0.3.6 copied from k8s.gcr.io/metrics-server-amd64:v0.3.6 (T241853) [tools]
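The two 13:31/13:33 uploads mirror upstream images into the local Toolforge registry. A dry-run sketch of the usual pull/tag/push sequence (image names are from the log; the loop only prints the docker commands, which is the conventional way to copy an image between registries):

```shell
# Source and destination from the 13:31 entry
src="k8s.gcr.io/metrics-server-amd64:v0.3.6"
dst="docker-registry.tools.wmflabs.org/metrics-server-amd64:v0.3.6"

# Dry run: print the mirror sequence
for cmd in "docker pull ${src}" "docker tag ${src} ${dst}" "docker push ${dst}"; do
  echo "${cmd}"
done
```

Mirroring into the local registry means cluster nodes never pull from external registries at deploy time.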
13:23 <arturo> [new k8s] doing changes to kube-state-metrics and metrics-server trying to relocate them to the 'metrics' namespace (T241853) [tools]
05:28 <bd808> Creating tools-k8s-worker-[6-14] (again) [tools]
05:20 <bd808> Deleting busted tools-k8s-worker-[6-14] [tools]
05:02 <bd808> Creating tools-k8s-worker-[6-14] [tools]
00:26 <bstorm_> repooled tools-sgewebgrid-lighttpd-0919 [tools]
00:17 <bstorm_> repooled tools-sgewebgrid-lighttpd-0918 [tools]
00:15 <bstorm_> moving tools-sgewebgrid-lighttpd-0918 and -0919 to cloudvirt1004 from cloudvirt1029 to rebalance load [tools]