2020-01-31 §
14:00 <arturo> syncing prometheus data again from tools-prometheus-01 to tools-prometheus-0{3,4} due to some inconsistencies preventing prometheus from starting (T238096) [tools]
2020-01-30 §
21:04 <andrewbogott> also apt-get install python3-novaclient on tools-prometheus-03 and tools-prometheus-04 to suppress cronspam. Possible real fix for this is https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/569084/ [tools]
20:39 <andrewbogott> apt-get install python3-keystoneclient on tools-prometheus-03 and tools-prometheus-04 to suppress cronspam. Possible real fix for this is https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/569084/ [tools]
16:27 <arturo> create VM tools-prometheus-04 as cold standby of tools-prometheus-03 (T238096) [tools]
16:25 <arturo> point tools-prometheus.wmflabs.org proxy to tools-prometheus-03 (T238096) [tools]
13:42 <arturo> disable puppet in prometheus servers while syncing metric data (T238096) [tools]
13:14 <arturo> drop floating IP 185.15.56.60 and FQDN `prometheus.tools.wmcloud.org` because this is not how the prometheus setup is right now. Use a web proxy instead `tools-prometheus-new.wmflabs.org` (T238096) [tools]
13:09 <arturo> created FQDN `prometheus.tools.wmcloud.org` pointing to IPv4 185.15.56.60 (tools-prometheus-03) to test T238096 [tools]
12:59 <arturo> associated floating IPv4 185.15.56.60 to tools-prometheus-03 (T238096) [tools]
12:57 <arturo> created domain `tools.wmcloud.org` in the tools project after some back and forth with Designate, permissions and the database. I plan to use this domain to test the new Debian Buster-based prometheus setup (T238096) [tools]
10:20 <arturo> create new VM instance tools-prometheus-03 (T238096) [tools]
2020-01-29 §
20:07 <bd808> Created {bastion,login,dev}.toolforge.org service names for Toolforge bastions using Horizon & Designate [tools]
2020-01-28 §
13:35 <arturo> `aborrero@tools-clushmaster-02:~$ clush -w @exec-stretch 'for i in $(ps aux | grep [t]ools.j | awk -F" " "{print \$2}") ; do echo "killing $i" ; sudo kill $i ; done || true'` (T243831) [tools]
2020-01-27 §
07:05 <zhuyifei1999_> wrong package. uninstalled. the correct one is bpfcc-tools and seems only available in buster+. T115231 [tools]
07:01 <zhuyifei1999_> apt installing bcc on tools-worker-1037 to see who is sending SIGTERM, will uninstall after done. dependency: bin86. T115231 [tools]
2020-01-24 §
20:58 <bd808> Built tools-k8s-worker-21 to test out build script following openstack client upgrade [tools]
15:45 <bd808> Rebuilding all Docker containers again because I failed to actually update the build server git clone properly last time I did this [tools]
05:23 <bd808> Building 6 new tools-k8s-worker instances for the 2020 Kubernetes cluster (take 2) [tools]
04:41 <bd808> Rebuilding all Docker images to pick up webservice-python-bootstrap changes [tools]
2020-01-23 §
23:38 <bd808> Halted tools-k8s-worker build script after first instance (tools-k8s-worker-10) stuck in "scheduling" state for 20 minutes [tools]
23:16 <bd808> Building 6 new tools-k8s-worker instances for the 2020 Kubernetes cluster [tools]
05:15 <bd808> Building tools-elastic-04 [tools]
04:39 <bd808> wmcs-openstack quota set --instances 192 [tools]
04:36 <bd808> wmcs-openstack quota set --cores 768 --ram 1536000 [tools]
2020-01-22 §
12:43 <arturo> for the record, issue with tools-worker-1016 was memory exhaustion apparently [tools]
12:35 <arturo> hard-reboot tools-worker-1016 (not responding to even console access) [tools]
2020-01-21 §
19:25 <bstorm_> hard rebooting tools-sgeexec-0913/14/35 because they aren't even on the network [tools]
19:17 <bstorm_> depooled and rebooted tools-sgeexec-0914 because it was acting funny [tools]
18:30 <bstorm_> depooling and rebooting tools-sgeexec-[0911,0913,0919,0921,0924,0931,0933,0935,0939,0941].tools.eqiad.wmflabs [tools]
17:21 <bstorm_> rebooting toolschecker to recover stale nfs handle [tools]
2020-01-16 §
23:54 <bstorm_> rebooting tools-docker-builder-06 because there are a couple running containers that don't want to die cleanly [tools]
23:45 <bstorm_> rebuilding docker containers to include new webservice version (0.58) [tools]
23:41 <bstorm_> deployed toollabs-webservice 0.58 to everything that isn't a container [tools]
16:45 <bstorm_> ran configurator to set the gridengine web queues to `rerun FALSE` T242397 [tools]
2020-01-14 §
15:29 <bstorm_> failed the gridengine master back to the master server from the shadow [tools]
02:23 <andrewbogott> rebooting tools-paws-worker-1006 to resolve hangs associated with an old NFS failure [tools]
2020-01-13 §
17:48 <bd808> Running `puppet ca destroy` for each unsigned cert on tools-puppetmaster-01 (T242642) [tools]
16:42 <bd808> Cordoned and fixed puppet on tools-k8s-worker-12. Rebooting now. T242559 [tools]
16:33 <bd808> Cordoned and fixed puppet on tools-k8s-worker-11. Rebooting now. T242559 [tools]
16:31 <bd808> Cordoned and fixed puppet on tools-k8s-worker-10. Rebooting now. T242559 [tools]
16:26 <bd808> Cordoned and fixed puppet on tools-k8s-worker-9. Rebooting now. T242559 [tools]
2020-01-12 §
22:31 <Krenair> same on -13 and -14 [tools]
22:28 <Krenair> same on -8 [tools]
22:18 <Krenair> same on -7 [tools]
22:11 <Krenair> Did usual new instance creation puppet dance on tools-k8s-worker-6, /data/project got created [tools]
2020-01-11 §
01:33 <bstorm_> updated toollabs-webservice package to 0.57, which should allow persisting mem and cpu in manifests with burstable qos. [tools]
2020-01-10 §
23:31 <bstorm_> updated toollabs-webservice package to 0.56 [tools]
15:45 <bstorm_> depooled tools-paws-worker-1013 to reboot because I think it is the last tools server with that mount issue (I hope) [tools]
15:35 <bstorm_> depooling and rebooting tools-worker-1016 because it still had the leftover mount problems [tools]
15:30 <bstorm_> git stash-ing local puppet changes in hopes that arturo has that material locally, and it doesn't break anything to do so [tools]