2021-03-24 §
11:20 <arturo> created 80G cinder volume tools-docker-registry-data (T278303) [tools]
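A Cinder volume like this is normally created and attached with the OpenStack CLI. The exact flags used here aren't recorded in the log, so this is only a minimal sketch; the attach target (the registry VM) is an assumption:

```
# create an 80 GiB Cinder volume named tools-docker-registry-data (T278303)
openstack volume create --size 80 tools-docker-registry-data

# attach it to the registry VM (assumed target, not stated in the log entry)
openstack server add volume tools-docker-registry-04 tools-docker-registry-data
```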
11:10 <arturo> starting VM tools-docker-registry-04, which had probably been stopped since 2021-03-09 due to hypervisor draining [tools]
2021-03-23 §
12:46 <arturo> aborrero@tools-sgegrid-master:~$ sudo systemctl restart gridengine-master.service [tools]
12:15 <arturo> delete & re-create VM tools-sgegrid-shadow as Debian Buster (T277653) [tools]
12:14 <arturo> created puppet prefix 'tools-sgegrid-shadow' and migrated puppet configuration from VM-puppet [tools]
12:13 <arturo> created server group 'tools-grid-master-shadow' with anti-affinity policy [tools]
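An anti-affinity server group keeps the grid master and shadow master on different hypervisors. A sketch of the likely CLI calls, assuming the standard OpenStack client:

```
# members of this group will be scheduled onto different hypervisors
openstack server group create --policy anti-affinity tools-grid-master-shadow

# new VMs are then booted into the group via a scheduler hint, e.g.
# openstack server create --hint group=<server-group-uuid> ... tools-sgegrid-shadow
```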
2021-03-18 §
19:24 <bstorm> set profile::toolforge::infrastructure across the entire project with login_server set on the bastion and exec node-related prefixes [tools]
16:21 <andrewbogott> enabling puppet tools-wide [tools]
16:20 <andrewbogott> disabling puppet tools-wide to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/672456 [tools]
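Disabling and re-enabling the Puppet agent across the whole project is typically done from a cumin host using the OpenStack backend; the exact invocation isn't logged, so this is only a sketch under that assumption:

```
# disable the agent on every VM in the tools project, with a reason message
sudo cumin 'O{project:tools}' 'puppet agent --disable "testing gerrit 672456"'

# re-enable once the test is done
sudo cumin 'O{project:tools}' 'puppet agent --enable'
```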
16:19 <bstorm> added profile::toolforge::infrastructure class to puppetmaster T277756 [tools]
04:12 <bstorm> rebooted tools-sgeexec-0935.tools.eqiad.wmflabs because it forgot how to LDAP...likely root cause of the issues tonight [tools]
03:59 <bstorm> rebooting grid master. sorry for the cron spam [tools]
03:49 <bstorm> restarting sssd on tools-sgegrid-master [tools]
03:37 <bstorm> deleted a massive number of stuck jobs that misfired from the cron server [tools]
03:35 <bstorm> rebooting tools-sgecron-01 to try to clear up the ldap-related errors coming out of it [tools]
01:46 <bstorm> killed the toolschecker cron job, which had an LDAP error, and ran it again by hand [tools]
2021-03-17 §
20:57 <bstorm> deployed changes to rbac for kubernetes to add kubectl top access for tools [tools]
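`kubectl top` reads from the metrics.k8s.io API, so granting tools access amounts to an RBAC rule on that API group. The real change lives in the Toolforge RBAC manifests; the shape below is illustrative only, with a hypothetical object name:

```
# illustrative only: a ClusterRole allowing read access to pod metrics
kubectl apply -f - <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: tool-metrics-reader   # hypothetical name, not the actual Toolforge object
rules:
- apiGroups: ["metrics.k8s.io"]
  resources: ["pods"]
  verbs: ["get", "list"]
EOF
```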
20:26 <andrewbogott> moving tools-elastic-3 to cloudvirt1034; two elastic nodes shouldn't be on the same hv [tools]
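Moving a running VM onto a specific hypervisor is a live migration; depending on client version this is either the legacy nova CLI or `openstack server migrate`. A sketch with the nova client:

```
# live-migrate tools-elastic-3 onto cloudvirt1034
nova live-migration tools-elastic-3 cloudvirt1034

# confirm the new hypervisor once the migration completes (admin-only field)
openstack server show tools-elastic-3 -c 'OS-EXT-SRV-ATTR:host'
```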
2021-03-16 §
16:31 <arturo> installing jobutils and misctools 1.41 [tools]
15:55 <bstorm> deleted a bunch of messed up grid jobs (9989481,8813,81682,86317,122602,122623,583621,606945,606999) [tools]
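Stuck grid jobs like these are removed on the grid master with qdel; a force delete is sometimes needed when the exec host no longer reports the job:

```
# delete the listed jobs; add -f to force-remove jobs whose exec host is unresponsive
sudo qdel 9989481 8813 81682 86317 122602 122623 583621 606945 606999
```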
12:32 <arturo> add packages jobutils / misctools v1.41 to {stretch,buster}-tools aptly repository in tools-sge-services-03 [tools]
2021-03-12 §
23:13 <bstorm> cleared error state for all grid queues [tools]
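Queue instances drop into an error (E) state when a job fails in a way gridengine blames on the queue; clearing that state is a one-liner on the master:

```
# list which queue instances are in error and why
qstat -f -explain E

# clear the error state on all queue instances
sudo qmod -c '*'
```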
2021-03-11 §
17:40 <bstorm> deployed metrics-server:0.4.1 to kubernetes [tools]
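Upstream ships metrics-server as a single manifest per release. Toolforge deploys it through its own repo, but the equivalent upstream apply would look roughly like:

```
# apply the upstream v0.4.1 manifest (Toolforge's actual deploy uses its own yaml)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.4.1/components.yaml

# verify the metrics API is serving
kubectl top nodes
```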
16:21 <bstorm> add jobutils 1.40 and misctools 1.40 to stretch-tools [tools]
13:11 <arturo> add misctools 1.37 to buster-tools|toolsbeta aptly repo for T275865 [tools]
13:10 <arturo> add jobutils 1.40 to buster-tools aptly repo for T275865 [tools]
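Adding packages to the project aptly repos on tools-sge-services-03 is roughly the following; the .deb filenames are assumed rather than taken from the log, and publish options may differ on the actual host:

```
# add the package(s) to the repo and republish it (filenames are assumptions)
sudo aptly repo add buster-tools jobutils_1.40_all.deb misctools_1.37_all.deb
sudo aptly publish update buster-tools
```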
2021-03-10 §
10:56 <arturo> briefly stopped VM tools-k8s-etcd-7 to disable VMX cpu flag [tools]
2021-03-09 §
13:31 <arturo> hard-reboot tools-docker-registry-04 because of issues related to T276922 [tools]
12:34 <arturo> briefly rebooting VM tools-docker-registry-04; we need to reboot the hypervisor cloudvirt1038 and the VM failed to migrate away [tools]
2021-03-05 §
12:30 <arturo> started tools-redis-1004 again [tools]
12:22 <arturo> stop tools-redis-1004 to ease draining of cloudvirt1035 [tools]
2021-03-04 §
11:25 <arturo> rebooted tools-sgewebgrid-generic-0901, repooled it again [tools]
09:57 <arturo> depool tools-sgewebgrid-generic-0901 to reboot VM. It was stuck in MIGRATING state when draining cloudvirt1022 [tools]
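Depooling a grid node before a reboot just means disabling its queue instances so nothing new lands there; one way to do it from the grid master, with qmod:

```
# depool: disable every queue instance on the node
sudo qmod -d '*@tools-sgewebgrid-generic-0901'

# ...reboot the VM...

# repool: re-enable the queues
sudo qmod -e '*@tools-sgewebgrid-generic-0901'
```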
2021-03-03 §
15:17 <arturo> shutting down tools-sgebastion-07 in an attempt to fix nova state and finish hypervisor migration [tools]
15:11 <arturo> tools-sgebastion-07 triggered a neutron exception (unauthorized) while being live-migrated from cloudvirt1021 to 1029. Resetting nova state with `nova reset-state bd685d48-1011-404e-a755-372f6022f345 --active` and trying again [tools]
14:48 <arturo> killed pywikibot instance running in tools-sgebastion-07 by user msyn [tools]
2021-03-02 §
15:23 <bstorm> depooling tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs for reboot. It isn't communicating right [tools]
15:22 <bstorm> cleared queue error states...will need to keep a better eye on what's causing those [tools]
2021-02-27 §
02:23 <bstorm> deployed typo fix to maintain-kubeusers in an innocent effort to make the weekend better T275910 [tools]
02:00 <bstorm> running a script to repair the dumps mount in all podpresets T275371 [tools]
2021-02-26 §
22:04 <bstorm> cleaned up grid jobs 1230666,1908277,1908299,2441500,2441513 [tools]
21:27 <bstorm> hard rebooting tools-sgeexec-0947 [tools]
21:21 <bstorm> hard rebooting tools-sgeexec-0952.tools.eqiad.wmflabs [tools]
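A hard reboot from the OpenStack side is the usual recovery when a VM stops answering; a minimal sketch:

```
# hard-reset the instance via nova rather than from inside the guest
openstack server reboot --hard tools-sgeexec-0952
```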
20:01 <bd808> Deleted csr in strange state for tool-ores-inspect [tools]
2021-02-24 §
18:30 <bd808> `sudo wmcs-openstack role remove --user zfilipin --project tools user` T267313 [tools]
01:04 <bstorm> hard rebooting tools-k8s-worker-76 because it's in a sorry state [tools]
2021-02-23 §
23:11 <bstorm> draining a bunch of k8s workers to clean up after dumps changes T272397 [tools]
23:06 <bstorm> draining tools-k8s-worker-55 to clean up after dumps changes T272397 [tools]
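Draining a worker evicts its pods so they get rescheduled and pick up the new dumps mount configuration; the flags below match kubectl of that era (later versions rename --delete-local-data to --delete-emptydir-data):

```
# cordon the node and evict its pods (daemonset pods are left in place)
kubectl drain tools-k8s-worker-55 --ignore-daemonsets --delete-local-data

# when maintenance is done, allow scheduling again
kubectl uncordon tools-k8s-worker-55
```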
2021-02-22 §
20:40 <bstorm> repooled tools-sgeexec-0918.tools.eqiad.wmflabs [tools]
19:09 <bstorm> hard rebooted tools-sgeexec-0918 from openstack T275411 [tools]