2020-01-28 §
13:35 <arturo> `aborrero@tools-clushmaster-02:~$ clush -w @exec-stretch 'for i in $(ps aux | grep [t]ools.j | awk -F" " "{print \$2}") ; do echo "killing $i" ; sudo kill $i ; done || true'` (T243831) [tools]
2020-01-27 §
07:05 <zhuyifei1999_> wrong package. uninstalled. the correct one is bpfcc-tools and seems only available in buster+. T115231 [tools]
07:01 <zhuyifei1999_> apt installing bcc on tools-worker-1037 to see who is sending SIGTERM, will uninstall after done. dependency: bin86. T115231 [tools]
2020-01-24 §
20:58 <bd808> Built tools-k8s-worker-21 to test out build script following openstack client upgrade [tools]
15:45 <bd808> Rebuilding all Docker containers again because I failed to actually update the build server git clone properly last time I did this [tools]
05:23 <bd808> Building 6 new tools-k8s-worker instances for the 2020 Kubernetes cluster (take 2) [tools]
04:41 <bd808> Rebuilding all Docker images to pick up webservice-python-bootstrap changes [tools]
2020-01-23 §
23:38 <bd808> Halted tools-k8s-worker build script after first instance (tools-k8s-worker-10) stuck in "scheduling" state for 20 minutes [tools]
23:16 <bd808> Building 6 new tools-k8s-worker instances for the 2020 Kubernetes cluster [tools]
05:15 <bd808> Building tools-elastic-04 [tools]
04:39 <bd808> wmcs-openstack quota set --instances 192 [tools]
04:36 <bd808> wmcs-openstack quota set --cores 768 --ram 1536000 [tools]
2020-01-22 §
12:43 <arturo> for the record, issue with tools-worker-1016 was memory exhaustion apparently [tools]
12:35 <arturo> hard-reboot tools-worker-1016 (not responding to even console access) [tools]
2020-01-21 §
19:25 <bstorm_> hard rebooting tools-sgeexec-0913/14/35 because they aren't even on the network [tools]
19:17 <bstorm_> depooled and rebooted tools-sgeexec-0914 because it was acting funny [tools]
18:30 <bstorm_> depooling and rebooting tools-sgeexec-[0911,0913,0919,0921,0924,0931,0933,0935,0939,0941].tools.eqiad.wmflabs [tools]
17:21 <bstorm_> rebooting toolschecker to recover stale nfs handle [tools]
2020-01-16 §
23:54 <bstorm_> rebooting tools-docker-builder-06 because there are a couple running containers that don't want to die cleanly [tools]
23:45 <bstorm_> rebuilding docker containers to include new webservice version (0.58) [tools]
23:41 <bstorm_> deployed toollabs-webservice 0.58 to everything that isn't a container [tools]
16:45 <bstorm_> ran configurator to set the gridengine web queues to `rerun FALSE` T242397 [tools]
2020-01-14 §
15:29 <bstorm_> failed the gridengine master back to the master server from the shadow [tools]
02:23 <andrewbogott> rebooting tools-paws-worker-1006 to resolve hangs associated with an old NFS failure [tools]
2020-01-13 §
17:48 <bd808> Running `puppet ca destroy` for each unsigned cert on tools-puppetmaster-01 (T242642) [tools]
16:42 <bd808> Cordoned and fixed puppet on tools-k8s-worker-12. Rebooting now. T242559 [tools]
16:33 <bd808> Cordoned and fixed puppet on tools-k8s-worker-11. Rebooting now. T242559 [tools]
16:31 <bd808> Cordoned and fixed puppet on tools-k8s-worker-10. Rebooting now. T242559 [tools]
16:26 <bd808> Cordoned and fixed puppet on tools-k8s-worker-9. Rebooting now. T242559 [tools]
2020-01-12 §
22:31 <Krenair> same on -13 and -14 [tools]
22:28 <Krenair> same on -8 [tools]
22:18 <Krenair> same on -7 [tools]
22:11 <Krenair> Did usual new instance creation puppet dance on tools-k8s-worker-6, /data/project got created [tools]
2020-01-11 §
01:33 <bstorm_> updated toollabs-webservice package to 0.57, which should allow persisting mem and cpu in manifests with burstable qos. [tools]
2020-01-10 §
23:31 <bstorm_> updated toollabs-webservice package to 0.56 [tools]
15:45 <bstorm_> depooled tools-paws-worker-1013 to reboot because I think it is the last tools server with that mount issue (I hope) [tools]
15:35 <bstorm_> depooling and rebooting tools-worker-1016 because it still had the leftover mount problems [tools]
15:30 <bstorm_> git stash-ing local puppet changes in hopes that arturo has that material locally, and it doesn't break anything to do so [tools]
2020-01-09 §
23:35 <bstorm_> depooled tools-sgeexec-0939 because it isn't acting right and rebooting it [tools]
18:26 <bstorm_> re-joining the k8s nodes OF THE PAWS CLUSTER to the cluster one at a time to rotate the certs T242353 [tools]
18:25 <bstorm_> re-joining the k8s nodes to the cluster one at a time to rotate the certs T242353 [tools]
18:06 <bstorm_> rebooting tools-paws-master-01 T242353 [tools]
17:46 <bstorm_> refreshing the paws cluster's entire x509 environment T242353 [tools]
2020-01-07 §
22:40 <bstorm_> rebooted tools-worker-1007 to recover it from disk full and general badness [tools]
16:33 <arturo> deleted by hand pod metrics/cadvisor-5pd46 due to prometheus having issues scraping it [tools]
15:46 <bd808> Rebooting tools-k8s-worker-[6-14] [tools]
15:35 <bstorm_> changed kubeadm-config to use a list instead of a hash for extravols on the apiserver in the new k8s cluster T242067 [tools]
14:02 <arturo> `root@tools-k8s-control-3:~# wmcs-k8s-secret-for-cert -n metrics -s metrics-server-certs -a metrics-server` (T241853) [tools]
13:33 <arturo> upload docker-registry.tools.wmflabs.org/coreos/kube-state-metrics:v1.8.0 copied from quay.io/coreos/kube-state-metrics:v1.8.0 (T241853) [tools]
13:31 <arturo> upload docker-registry.tools.wmflabs.org/metrics-server-amd64:v0.3.6 copied from k8s.gcr.io/metrics-server-amd64:v0.3.6 (T241853) [tools]