2021-05-03 §
16:23 <dcaro> started tools-sgeexec-0907, was stuck on initramfs due to an unclean fs (/dev/vda3, root), ran fsck manually fixing all the errors and booted up correctly after (T280641) [tools]
14:07 <dcaro> depooling tools-sgeexec-0908/7 to be able to restart the VMs as they got stuck during migration (T280641) [tools]
2021-04-29 §
18:23 <bstorm> removing one more etcd node via cookbook T279723 [tools]
18:12 <bstorm> removing an etcd node via cookbook T279723 [tools]
2021-04-27 §
16:40 <bstorm> deleted all the errored out grid jobs stuck in queue wait [tools]
16:16 <bstorm> cleared E status on grid queues to get things flowing again [tools]
2021-04-26 §
12:17 <arturo> allowing more tools into the legacy redirector (T281003) [tools]
2021-04-22 §
08:44 <Krenair> Removed yuvipanda from roots sudo policy [tools]
08:42 <Krenair> Removed yuvipanda from projectadmin per request [tools]
08:40 <Krenair> Removed yuvipanda from tools.admin per request [tools]
2021-04-20 §
22:20 <bd808> `clush -w @all -b "sudo exiqgrep -z -i | xargs sudo exim -Mt"` [tools]
22:19 <bd808> `clush -w @exec -b "sudo exiqgrep -z -i | xargs sudo exim -Mt"` [tools]
21:52 <bd808> Update hiera `profile::toolforge::active_mail_relay: tools-mail-03.tools.eqiad1.wikimedia.cloud`. Was using wrong domain name in prior update. [tools]
21:49 <bstorm> tagged the latest maintain-kubeusers and deployed to toolforge (with kustomize changes to rbac) after testing in toolsbeta T280300 [tools]
21:27 <bd808> Update hiera `profile::toolforge::active_mail_relay: tools-mail-03.tools.eqiad.wmflabs`. was -2 which is decommed. [tools]
10:18 <dcaro> setting the retention on the tools-prometheus VMs to 250GB (they have 276GB total, leaving some space for online data operations if needed) (T279990) [tools]
2021-04-19 §
10:53 <dcaro> reverting setting prometheus data source in grafana to 'server', can't connect [tools]
10:51 <dcaro> setting prometheus data source in grafana to 'server' to avoid CORS issues [tools]
2021-04-16 §
23:15 <bstorm> cleaned up all source files for the grid with the old domain name to enable future node creation T277653 [tools]
14:38 <dcaro> added 'will get out of space in X days' panel to the dashboard https://grafana-labs.wikimedia.org/goto/kBlGd0uGk (T279990), we got <5days xd [tools]
11:35 <arturo> running `grid-configurator --all-domains` which basically added tools-sgebastion-10,11 as submit hosts and removed tools-sgegrid-master,shadow as submit hosts [tools]
2021-04-15 §
17:45 <bstorm> cleared error state from tools-sgeexec-0920.tools.eqiad.wmflabs for a failed job [tools]
2021-04-13 §
13:26 <dcaro> upgrade puppet and python-wmflib on tools-prometheus-03 [tools]
11:23 <arturo> deleted shutoff VM tools-package-builder-02 (T275864) [tools]
11:21 <arturo> deleted shutoff VM tools-sge-services-03,04 (T278354) [tools]
11:20 <arturo> deleted shutoff VM tools-docker-registry-03,04 (T278303) [tools]
11:18 <arturo> deleted shutoff VM tools-mail-02 (T278538) [tools]
11:17 <arturo> deleted shutoff VMs tools-static-12,13 (T278539) [tools]
2021-04-11 §
16:07 <bstorm> cleared E state from tools-sgeexec-0917 tools-sgeexec-0933 tools-sgeexec-0934 tools-sgeexec-0937 from failures of jobs 761759, 815031, 815056, 855676, 898936 [tools]
2021-04-08 §
18:25 <bstorm> cleaned up the deprecated entries in /data/project/.system_sge/gridengine/etc/submithosts for tools-sgegrid-master and tools-sgegrid-shadow using the old fqdns T277653 [tools]
09:24 <arturo> allocate & associate floating IP 185.15.56.122 for tools-sgebastion-11, also with DNS A record `dev-buster.toolforge.org` (T275865) [tools]
09:22 <arturo> create DNS A record `login-buster.toolforge.org` pointing to 185.15.56.66 (tools-sgebastion-10) (T275865) [tools]
09:20 <arturo> associate floating IP 185.15.56.66 to tools-sgebastion-10 (T275865) [tools]
09:12 <arturo> created tools-sgebastion-11 (buster) (T275865) [tools]
2021-04-07 §
04:35 <andrewbogott> replacing the mx record '10 mail.tools.wmcloud.org' with '10 mail.tools.wmcloud.org.' — trying to fix axfr for the tools.wmcloud.org zone [tools]
2021-04-06 §
15:16 <bstorm> cleared queue state since a few had "errored" for failed jobs. [tools]
12:59 <dcaro> Removing etcd member tools-k8s-etcd-7.tools.eqiad1.wikimedia.cloud to get an odd number (T267082) [tools]
11:45 <arturo> upgrading jobutils & misctools to 1.42 everywhere [tools]
11:39 <arturo> cleaning up aptly: old package versions, old repos (jessie, trusty, precise) etc [tools]
10:31 <dcaro> Removing etcd member tools-k8s-etcd-6.tools.eqiad.wmflabs (T267082) [tools]
10:21 <arturo> published jobutils & misctools 1.42 (T278748) [tools]
10:21 <arturo> published jobutils & misctools 1.42 [tools]
10:21 <arturo> aptly repo had some weirdness due to the cinder volume: hardlinks created by aptly were broken, solved with `sudo aptly publish --skip-signing repo stretch-tools -force-overwrite` [tools]
10:07 <dcaro> adding new etcd member using the cookbook wmcs.toolforge.add_etcd_node (T267082) [tools]
10:05 <arturo> installed aptly from buster-backports on tools-services-05 to see if that makes any difference with an issue when publishing repos [tools]
09:53 <dcaro> Removing etcd member tools-k8s-etcd-4.tools.eqiad.wmflabs (T267082) [tools]
08:55 <dcaro> adding new etcd member using the cookbook wmcs.toolforge.add_etcd_node (T267082) [tools]
2021-04-05 §
17:02 <bstorm> chowned the data volume for the docker registry to docker-registry:docker-registry [tools]
09:56 <arturo> make jhernandez (IRC joakino) projectadmin (T278975) [tools]
2021-04-01 §
20:43 <bstorm> cleared error state from the grid queues caused by unspecified job errors [tools]