2023-05-15 §
16:52 <dcaro> rebooting tools-sgeexec-10-22 (T316544) [tools]
16:51 <dcaro> rebooting tools-sgeweblight-10-28 (T316544) [tools]
16:50 <dcaro> rebooting tools-sgeexec-10-17 (T316544) [tools]
16:48 <dcaro> rebooting tools-sgeexec-10-21 (T316544) [tools]
16:47 <dcaro> rebooting tools-sgeexec-10-19 (T316544) [tools]
16:45 <dcaro> rebooting tools-sgeexec-10-8 (T316544) [tools]
16:45 <dcaro> rebooting tools-sgeweblight-10-24 (T316544) [tools]
16:44 <dcaro> rebooting tools-sgewebgen-10-2 (T316544) [tools]
16:44 <dcaro> rebooting tools-sgeweblight-10-16 (T316544) [tools]
16:43 <dcaro> rebooting tools-sgeweblight-10-30 (T316544) [tools]
16:43 <dcaro> rebooting tools-sgeexec-10-18 (T316544) [tools]
16:42 <dcaro> rebooting tools-sgeexec-10-16 (T316544) [tools]
16:42 <dcaro> rebooting tools-sgeexec-10-14 (T316544) [tools]
16:41 <dcaro> rebooting tools-sgeweblight-10-32 (T316544) [tools]
16:40 <dcaro> rebooting tools-sgeweblight-10-22 (T316544) [tools]
16:39 <dcaro> rebooting tools-sgeweblight-10-17 (T316544) [tools]
16:32 <dcaro> rebooting tools-sgeexec-10-13.tools.eqiad1.wikimedia.cloud (T316544) [tools]
16:23 <dcaro> rebooting tools-sgeweblight-10-26 (T316544) [tools]
16:15 <bd808> Hard reboot of tools-sgebastion-11 via Horizon (done circa 16:11Z) [tools]
16:14 <arturo> rebooted a bunch of nodes to clean up D-state procs and the high load avg caused by the NFS outage (result of T316544) [tools]
12:20 <wm-bot2> build & push docker image docker-registry.tools.wmflabs.org/builds-api:09f3b49-dev from https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-builds-api.git (32a8ae9) - cookbook ran by dcaro@vulcanus [tools]
09:11 <wm-bot2> build & push docker image docker-registry.tools.wmflabs.org/volume-admission:c64da5a from https://gerrit.wikimedia.org/r/cloud/toolforge/volume-admission-controller (c64da5a) - cookbook ran by dcaro@vulcanus [tools]
2023-05-13 §
09:13 <taavi> reboot tools-sgeexec-10-15,17,18,21 [tools]
2023-05-11 §
15:48 <bd808> Rebooted tools-sgebastion-10 for T336510 [tools]
15:31 <bd808> Sent `wall` for reboot of tools-sgebastion-10 circa 15:40Z [tools]
2023-05-09 §
16:36 <taavi> delegated beta.toolforge.org domain to toolsbeta per T257386 [tools]
09:35 <wm-bot2> deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api (ad4fa2a) - cookbook ran by taavi@runko [tools]
2023-05-08 §
09:11 <arturo> force-reboot tools-sgeexec-10-13 (reported as down by the monitoring, no SSH) [tools]
2023-05-07 §
16:06 <taavi> remove inbound 25/tcp rule from the toolserver legacy server T136225 [tools]
2023-05-05 §
22:21 <bd808> Added "RepoLookoutBot" to hiera key "dynamicproxy::blocked_user_agent_regex" to stop unnecessary scans by https://www.repo-lookout.org/ [tools]
22:20 <bd808> Added [tools]
11:30 <wm-bot2> build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:811164e from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api (811164e) - cookbook ran by taavi@runko [tools]
09:13 <dcaro> rebooted tools-sgeexec-10-16 as it was stuck (T335009) [tools]
2023-05-04 §
15:15 <wm-bot2> removed instance tools-k8s-etcd-15 - cookbook ran by andrew@bullseye [tools]
14:13 <wm-bot2> removed instance tools-k8s-etcd-14 - cookbook ran by andrew@bullseye [tools]
2023-05-03 §
12:41 <wm-bot2> removed instance tools-k8s-etcd-13 - cookbook ran by andrew@bullseye [tools]
2023-05-02 §
00:29 <wm-bot2> deployed kubernetes component https://github.com/toolforge/buildpack-admission-controller (7199a9e) - cookbook ran by raymond@ubuntu [tools]
2023-05-01 §
23:17 <wm-bot2> build & push docker image docker-registry.tools.wmflabs.org/toolforge-buildpack-admission-controller:3b3803f from https://github.com/toolforge/buildpack-admission-controller (3b3803f) - cookbook ran by raymond@ubuntu [tools]
2023-04-28 §
15:01 <arturo> force reboot tools-k8s-worker-79, unresponsive [tools]
08:27 <dcaro> rebooting tools-sgeweblight-10-28 (T335336) [tools]
07:20 <dcaro> rebooting tools-sgegrid-shadow due to stale nfs mount [tools]
00:09 <bd808> `kubectl uncordon tools-k8s-worker-67` (T335543) [tools]
00:07 <bd808> Hard reboot tools-k8s-worker-67.tools.eqiad1.wikimedia.cloud via horizon (T335543) [tools]
00:04 <bd808> Rebooting tools-k8s-worker-67.tools.eqiad1.wikimedia.cloud (T335543) [tools]
2023-04-27 §
23:59 <bd808> `kubectl drain --ignore-daemonsets --delete-emptydir-data --force tools-k8s-worker-67` (T335543) [tools]
20:50 <bd808> Started process to rebuild all buster and bullseye based container images again. Prior problem seems to have been stale images in local cache on the build server. [tools]
20:42 <bd808> Container image rebuild failed with GPG errors in buster-sssd base image. Will investigate and attempt to restart once resolved in a local dev environment. [tools]
20:33 <bd808> Started process to rebuild all buster and bullseye based container images per https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes#Building_toolforge_specific_images [tools]
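The tools-k8s-worker-67 entries above (T335543) record a drain, reboot, uncordon sequence. A minimal sketch of that order follows, assuming the same `kubectl` flags that appear in the log; the helper name `worker_reboot_steps` is hypothetical and the function only builds the command strings, it does not run anything. The reboot itself is left as a comment because the log only says it was done via Horizon.

```python
import shlex
from typing import List


def worker_reboot_steps(node: str) -> List[str]:
    """Build the drain / reboot / uncordon steps for one worker node.

    Hypothetical helper: it returns strings for review, it does not execute them.
    """
    # Flags copied from the logged drain command.
    drain = [
        "kubectl", "drain",
        "--ignore-daemonsets", "--delete-emptydir-data", "--force",
        node,
    ]
    uncordon = ["kubectl", "uncordon", node]
    return [
        shlex.join(drain),
        # Placeholder only: the log says the hard reboot was done via Horizon.
        f"# reboot {node} (hard reboot via Horizon in the log)",
        shlex.join(uncordon),
    ]


for step in worker_reboot_steps("tools-k8s-worker-67"):
    print(step)
```

The node is drained before the reboot so pods are evicted first, and uncordoned only once it is back.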
2023-04-18 §
16:46 <dcaro> force-rebooting tools-sgeweblight-10-25/26/27 as they got stuck stopping the grid_exec process [tools]
16:35 <dcaro> rebooting tools-sgeweblight-10-27 due to a stuck exec daemon not releasing port 6445 [tools]