2018-01-05 §
13:46 <andrewbogott> moving tools-webgrid-lighttpd-1419.tools.eqiad.wmflabs to labvirt1017 (CPU balancing) [tools]
05:33 <andrewbogott> migrating tools-worker-1012 to labvirt1017 (CPU load balancing) [tools]
2018-01-04 §
17:24 <andrewbogott> rebooting tools-paws-worker-1019 to verify repair of T184018 [tools]
2018-01-03 §
15:38 <bd808> Forced Puppet run on tools-services-01 [tools]
11:29 <arturo> deploy https://gerrit.wikimedia.org/r/#/c/401716/ and https://gerrit.wikimedia.org/r/394101 using clush [tools]
2017-12-31 §
02:00 <bd808> Killed some pwb.py and qacct processes running on tools-bastion-03 [tools]
2017-12-21 §
17:57 <bd808> PAWS: deleted hub-deployment pod stuck in CrashLoopBackOff [tools]
17:30 <bd808> PAWS: deleting hub-deployment pod. Lots of "Connection pool is full" warnings in pod logs [tools]
2017-12-19 §
21:27 <chasemp> reboot tools-paws-master-01 [tools]
18:38 <andrewbogott> rebooting tools-paws-master-01 [tools]
05:07 <andrewbogott> "service gridengine-master restart" on tools-grid-master [tools]
2017-12-18 §
12:04 <arturo> it seems jupyterhub tries to use a database that doesn't exist: [E 2017-12-18 11:59:49.896 JupyterHub app:904] Failed to connect to db: sqlite:///jupyterhub.sqlite [tools]
11:58 <arturo> The restart didn't work. I could see a lot of log lines in the hub-deployment pod with something like: 2017-12-17 04:08:17,574 WARNING Connection pool is full, discarding connection: [tools]
11:51 <arturo> the restart was with: kubectl get pod -o yaml hub-deployment-1381799904-b5g5j -n prod | kubectl replace --force -f - [tools]
11:50 <arturo> restart pod hub-deployment in paws to try to fix the 502 [tools]
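The restart logged above pipes the pod's live manifest back through `kubectl replace --force`, which deletes the object and recreates it from the same spec. A minimal sketch of that pipeline, with `kubectl` stubbed by a shell function so the flow can be traced without a cluster (the stub and its output are illustrative; the pod name is taken from the log):

```shell
# Stub kubectl so the pipeline runs anywhere: it reports the call it
# received, and for `get` also emits a minimal manifest to feed the pipe.
kubectl() {
    echo "ran: kubectl $*"
    if [ "$1" = "get" ]; then
        printf 'apiVersion: v1\nkind: Pod\n'
    fi
}

# `replace --force` deletes the pod and recreates it from the manifest
# piped in from `get -o yaml`, i.e. a restart with an unchanged spec.
kubectl get pod -o yaml hub-deployment-1381799904-b5g5j -n prod |
    kubectl replace --force -f -
```

On a real cluster the stub is simply omitted; the pipeline is otherwise the command recorded in the log.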
2017-12-15 §
13:55 <arturo> same in tools-checker-02.tools.eqiad.wmflabs [tools]
13:54 <arturo> same in tools-exec-1415.tools.eqiad.wmflabs [tools]
13:52 <arturo> running 'sudo puppet agent -t -v' in tools-webgrid-lighttpd-1416.tools.eqiad.wmflabs since it didn't update in the last run with clush [tools]
2017-12-14 §
16:58 <arturo> running clush -w @all 'sudo puppet agent --test' from tools-clushmaster-01.eqiad.wmflabs due to https://gerrit.wikimedia.org/r/#/c/394572/ being merged [tools]
2017-12-13 §
17:37 <andrewbogott> upgrading puppet packages on all VMs [tools]
00:59 <madhuvishy> Cordon and Drain tools-worker-1016 [tools]
00:47 <madhuvishy> Drain + Cordon, Reboot, Uncordon tools-workers-1018-1023, 1025-1027 [tools]
00:34 <madhuvishy> Drain + Cordon, Reboot, Uncordon tools-workers-1011, 1013-1015, 1017 [tools]
00:28 <madhuvishy> Drain + Cordon, Reboot, Uncordon tools-workers-1006-1010 [tools]
00:11 <madhuvishy> Drain + Cordon, Reboot, Uncordon tools-workers-1002-1005 [tools]
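Each batch above applies the same per-node pattern: cordon and drain, reboot, uncordon. A side-effect-free sketch of that loop, assuming consecutively numbered workers (the commands are echoed rather than executed, and the exact drain flags are an assumption — the log does not record them):

```shell
# Print the cordon/drain/reboot/uncordon sequence for a numeric range of
# tools-worker nodes. Commands are echoed, not run, so this is safe to try.
rolling_reboot() {
    for n in $(seq "$1" "$2"); do
        node="tools-worker-$n"
        echo "kubectl cordon $node"
        echo "kubectl drain --ignore-daemonsets --delete-local-data $node"
        echo "ssh $node sudo reboot"
        echo "kubectl uncordon $node"
    done
}

rolling_reboot 1002 1005   # the 2017-12-13 00:11 batch
```

Cordoning first keeps the scheduler from placing new pods on a node that is about to go down; uncordoning only after the reboot returns it to the pool.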
2017-12-12 §
23:29 <madhuvishy> rebooting tools-worker-1012 [tools]
18:50 <andrewbogott> rebooting tools-worker-1001 [tools]
2017-12-11 §
19:32 <bd808> git gc on tools-static-11; --aggressive was killed by system (T182604) [tools]
18:07 <andrewbogott> upgrading tools puppetmaster to v4 [tools]
17:07 <bd808> git gc --aggressive on tools-static-11 (T182604) [tools]
2017-12-01 §
15:33 <chasemp> put the weird mess of untracked files on the tools puppetmaster into git stash to see what breaks, as they should not be there [tools]
15:30 <chasemp> prometheus nfs collector on tools-bastion-03 [tools]
2017-11-30 §
23:23 <bd808> Hard reboot of tools-bastion-03 via Horizon [tools]
23:06 <chasemp> rebooting login.tools.wmflabs.org due to overload [tools]
2017-11-20 §
20:34 <chasemp> backup crons tools-cron-01:/var/spool/cron# cp -Rp crontabs/ /root/20112017/ [tools]
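The crontab backup above leans on `cp -Rp`: `-R` recurses into the spool directory and `-p` preserves mode, ownership and timestamps, which matters because cron refuses crontabs with loose permissions. A sketch on a scratch directory (the paths and the crontab entry are made up):

```shell
# Make a scratch "crontabs" spool with one hypothetical crontab in it.
work=$(mktemp -d)
mkdir "$work/crontabs"
printf '0 2 * * * /usr/bin/some-job\n' > "$work/crontabs/tools.example"
chmod 600 "$work/crontabs/tools.example"   # cron wants tight modes

# Back it up as in the log: -R recurses, -p keeps mode/owner/mtime.
cp -Rp "$work/crontabs" "$work/20112017"

ls -l "$work/20112017"
```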
00:52 <andrewbogott> cherry-picking https://gerrit.wikimedia.org/r/#/c/392172/ onto the tools puppetmaster [tools]
2017-11-17 §
21:33 <valhallasw`cloud> also chmod g-w'ed those files, and sent emails to all the affected users [tools]
21:17 <valhallasw`cloud> chmod o-w'ed a bunch of files reported by Dispenser; writing emails to the owners about this [tools]
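The `chmod o-w` pass above strips the other-write bit so arbitrary users can no longer modify the reported files. A sketch that finds and fixes world-writable files in a scratch directory (`-perm -002` matches files whose other-write bit is set; the filenames are made up):

```shell
# Create one world-writable and one sane file in a scratch directory.
d=$(mktemp -d)
touch "$d/open.txt" "$d/safe.txt"
chmod 666 "$d/open.txt"   # world-writable: anyone can modify it
chmod 644 "$d/safe.txt"

# Find files with the other-write bit (0002) set and remove it.
find "$d" -type f -perm -002 -exec chmod o-w {} +

ls -l "$d"
```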
2017-11-16 §
17:40 <chasemp> tools-clushmaster-01:~$ clush -w @all 'sudo puppet agent --enable && sudo puppet agent --test && sudo unattended-upgrades -d' [tools]
16:50 <bd808> Force upgraded nginx on tools-elastic-* [tools]
16:37 <chasemp> reboot tools-checker-01 [tools]
15:17 <chasemp> disable puppet [tools]
2017-11-15 §
22:48 <madhuvishy> Rebooted tools-paws-worker-1017 [tools]
15:53 <chasemp> reboot bastion-03 [tools]
15:48 <chasemp> kill tools.powow on bastion-03 for hammering IO and making bastion unusable [tools]
2017-11-07 §
01:21 <bd808> Removed all non-directory files from /home (via labstore1004 direct access) [tools]
2017-11-06 §
18:30 <bd808> Load on tools-bastion-03 down to 0.72 from 17.47 after killing a bunch of local processes that should have been running on the job grid instead [tools]
2017-11-05 §
23:48 <bd808> Cleaned up 2 huge /tmp files left by tools.croptool (~6.5G) [tools]
23:44 <bd808> Cleaned up 109 files owned by tools.rezabot on tools-webgrid-lighttpd-1428 with `sudo find /tmp -user tools.rezabot -exec rm {} \+` [tools]
23:37 <bd808> Cleaned up 955 files owned by tools.wsexport on tools-webgrid-lighttpd-1428 with `sudo find /tmp -user tools.wsexport -exec rm {} \+` [tools]
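The cleanups above use `find /tmp -user <tool> -exec rm {} \+`, where the `+` terminator batches many matched paths into few `rm` invocations instead of forking one per file as `\;` would. A sketch on a scratch directory, with a `-name` filter standing in for `-user` since a test environment has only one user (the filenames are made up):

```shell
# Scatter some files: two belonging to the "tool" and one bystander.
d=$(mktemp -d)
touch "$d/wsexport.0001" "$d/wsexport.0002" "$d/keep.me"

# -exec ... {} + appends as many paths as fit into each rm call.
find "$d" -type f -name 'wsexport.*' -exec rm {} +

ls "$d"
```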