2021-03-04 §
11:25 <arturo> rebooted tools-sgewebgrid-generic-0901, repooled it [tools]
09:57 <arturo> depooled tools-sgewebgrid-generic-0901 to reboot the VM. It was stuck in MIGRATING state while draining cloudvirt1022 [tools]
2021-03-03 §
15:17 <arturo> shutting down tools-sgebastion-07 in an attempt to fix nova state and finish hypervisor migration [tools]
15:11 <arturo> tools-sgebastion-07 triggered a neutron exception (unauthorized) while being live-migrated from cloudvirt1021 to 1029. Resetting nova state with `nova reset-state bd685d48-1011-404e-a755-372f6022f345 --active` and trying again [tools]
14:48 <arturo> killed pywikibot instance running in tools-sgebastion-07 by user msyn [tools]
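The 2021-03-03 entries above describe recovering a VM stuck mid-migration: reset the nova task state, then retry the live migration. A minimal dry-run sketch of that sequence (UUID and hostname taken from the log; `run` is a hypothetical guard added here so nothing is executed for real):

```shell
# Dry-run wrapper: print the command instead of executing it.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Clear the stuck task state, then retry the migration to the target host.
run nova reset-state bd685d48-1011-404e-a755-372f6022f345 --active
run nova live-migration bd685d48-1011-404e-a755-372f6022f345 cloudvirt1029
```

When the instance still will not migrate (as happened here), the fallback in the log was a clean shutdown so the scheduler could move it cold.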
2021-03-02 §
15:23 <bstorm> depooling tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs for reboot. It isn't communicating right [tools]
15:22 <bstorm> cleared queue error states...will need to keep a better eye on what's causing those [tools]
2021-02-27 §
02:23 <bstorm> deployed typo fix to maintain-kubeusers in an innocent effort to make the weekend better T275910 [tools]
02:00 <bstorm> running a script to repair the dumps mount in all podpresets T275371 [tools]
2021-02-26 §
22:04 <bstorm> cleaned up grid jobs 1230666,1908277,1908299,2441500,2441513 [tools]
21:27 <bstorm> hard rebooting tools-sgeexec-0947 [tools]
21:21 <bstorm> hard rebooting tools-sgeexec-0952.tools.eqiad.wmflabs [tools]
20:01 <bd808> Deleted csr in strange state for tool-ores-inspect [tools]
2021-02-24 §
18:30 <bd808> `sudo wmcs-openstack role remove --user zfilipin --project tools user` T267313 [tools]
01:04 <bstorm> hard rebooting tools-k8s-worker-76 because it's in a sorry state [tools]
2021-02-23 §
23:11 <bstorm> draining a bunch of k8s workers to clean up after dumps changes T272397 [tools]
23:06 <bstorm> draining tools-k8s-worker-55 to clean up after dumps changes T272397 [tools]
2021-02-22 §
20:40 <bstorm> repooled tools-sgeexec-0918.tools.eqiad.wmflabs [tools]
19:09 <bstorm> hard rebooted tools-sgeexec-0918 from openstack T275411 [tools]
19:07 <bstorm> shutting down tools-sgeexec-0918 with the VM's command line (not libvirt directly yet) T275411 [tools]
19:05 <bstorm> shutting down tools-sgeexec-0918 (with openstack to see what happens) T275411 [tools]
19:03 <bstorm> depooled tools-sgeexec-0918 T275411 [tools]
18:56 <bstorm> deleted job 1962508 from the grid to clear it up T275301 [tools]
16:58 <bstorm> cleared error state on several grid queues [tools]
2021-02-19 §
12:31 <arturo> deploying new version of toolforge ingress admission controller [tools]
2021-02-17 §
21:26 <bstorm> deleted tools-puppetdb-01 since it is unused at this time (and undersized anyway) [tools]
2021-02-04 §
16:27 <bstorm> rebooting tools-package-builder-02 [tools]
2021-01-26 §
16:27 <bd808> Hard reboot of tools-sgeexec-0906 via Horizon for T272978 [tools]
2021-01-22 §
09:59 <dcaro> added the record redis.svc.tools.eqiad1.wikimedia.cloud pointing to tools-redis1003 (T272679) [tools]
2021-01-21 §
23:58 <bstorm> deployed new maintain-kubeusers to tools T271847 [tools]
2021-01-19 §
22:57 <bstorm> truncated 75GB error log /data/project/robokobot/virgule.err T272247 [tools]
22:48 <bstorm> truncated 100GB error log /data/project/magnus-toolserver/error.log T272247 [tools]
22:43 <bstorm> truncated 107GB log '/data/project/meetbot/logs/messages.log' T272247 [tools]
22:34 <bstorm> truncating 194 GB error log '/data/project/mix-n-match/mnm-microsync.err' T272247 [tools]
16:37 <bd808> Added Jhernandez to root sudoers group [tools]
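The 2021-01-19 entries truncate huge logs rather than delete them: truncation keeps the inode that running writers still hold open, so the disk space is freed immediately and the writing process keeps appending at offset 0, whereas `rm` would leave the space held until the writer closed its file descriptor. A small self-contained demonstration (temp file stands in for the real NFS paths):

```shell
# Simulate a bloated log file, then truncate it in place.
log=$(mktemp)
printf 'x%.0s' $(seq 1 4096) > "$log"   # write 4096 bytes
truncate -s 0 "$log"                    # free the space, keep the inode
wc -c < "$log"                          # → 0
rm -f "$log"
```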
2021-01-14 §
20:56 <bstorm> setting bastions to have mostly-uncapped egress network and 40MBps nfs_read for better shared use [tools]
20:43 <bstorm> running tc-setup across the k8s workers [tools]
20:40 <bstorm> running tc-setup across the grid fleet [tools]
17:58 <bstorm> hard rebooting tools-sgecron-01 following network issues during upgrade to stein T261134 [tools]
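The `tc-setup` runs above apply traffic-control egress shaping per host. A hedged sketch of the kind of rule involved, using a token-bucket filter; the device name and rates here are illustrative, not the real Toolforge values, and the command is only printed:

```shell
# Dry-run wrapper: show the tc invocation rather than changing qdiscs.
run() { echo "would run: $*"; }

# Cap overall egress with a token-bucket filter on the root qdisc.
run tc qdisc add dev eth0 root tbf rate 320mbit burst 64kb latency 400ms
```

The log's distinction between "mostly-uncapped egress" on bastions and a 40MBps NFS read cap suggests separate classes per traffic type rather than a single root cap like this one.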
2021-01-13 §
10:02 <arturo> delete floating IP allocation 185.15.56.245 (T271867) [tools]
2021-01-12 §
18:16 <bstorm> deleted wedged CSR tool-adhs-wde to get maintain-kubeusers working again T271842 [tools]
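Both this entry and the 2021-02-26 one clear a wedged Kubernetes CertificateSigningRequest so maintain-kubeusers can recreate it. A dry-run sketch of the cleanup (CSR name from the log):

```shell
# Dry-run wrapper: print kubectl commands instead of running them.
run() { echo "would run: $*"; }

run kubectl get csr tool-adhs-wde -o wide   # inspect the stuck request
run kubectl delete csr tool-adhs-wde        # remove it; the operator re-issues
```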
2021-01-05 §
18:49 <bstorm> changing the limits on k8s etcd nodes again, so disabling puppet on them T267966 [tools]
2021-01-04 §
18:21 <bstorm> ran 'sudo systemctl stop getty@ttyS1.service && sudo systemctl disable getty@ttyS1.service' on tools-k8s-etcd-5. I have no idea why that keeps coming back. [tools]
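A unit that "keeps coming back" after `disable` is often being pulled back in by another unit or a systemd generator; `mask` is the stronger option, linking the unit to /dev/null so nothing can start it. A dry-run sketch of that alternative (not what the log actually did):

```shell
# Dry-run wrapper: print the systemctl commands instead of running them.
run() { echo "would run: $*"; }

run systemctl stop getty@ttyS1.service
run systemctl mask getty@ttyS1.service   # stronger than disable: blocks all activation
```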
2020-12-22 §
18:22 <bstorm> rebooting the grid master because it is misbehaving following the NFS outage [tools]
10:53 <arturo> rebase & resolve ugly git merge conflict in labs/private.git [tools]
2020-12-18 §
18:37 <bstorm> set profile::wmcs::kubeadm::etcd_latency_ms: 15 T267966 [tools]
2020-12-17 §
21:42 <bstorm> doing the same procedure to increase the timeouts more T267966 [tools]
19:56 <bstorm> puppet enabled one at a time, letting things catch up. Timeouts are now adjusted to something closer to fsync values T267966 [tools]
19:44 <bstorm> set etcd timeouts seed value to 20 instead of the default 10 (profile::wmcs::kubeadm::etcd_latency_ms) T267966 [tools]
18:58 <bstorm> disabling puppet on k8s-etcd servers to alter the timeouts T267966 [tools]
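The `profile::wmcs::kubeadm::etcd_latency_ms` hiera key above presumably feeds etcd's two latency-sensitive timers. In raw etcd terms those are the `--heartbeat-interval` and `--election-timeout` flags (milliseconds), which upstream recommends raising together on slow disks, keeping the election timeout around 10x the heartbeat. A dry-run sketch with illustrative values:

```shell
# Dry-run wrapper: print the etcd invocation rather than starting a server.
run() { echo "would run: $*"; }

# Defaults are 100/1000 ms; raised here for slow fsync, keeping the 10x ratio.
run etcd --heartbeat-interval 300 --election-timeout 3000
```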