2018-03-22 §
22:04 <bd808> Forced puppet run on tools-proxy-02 for T130748 [tools]
21:52 <bd808> Forced puppet run on tools-proxy-01 for T130748 [tools]
21:48 <bd808> Disabled puppet on tools-proxy-* for https://gerrit.wikimedia.org/r/#/c/420619/ rollout [tools]
03:50 <bd808> clush -w @exec -w @webgrid -b 'sudo find /tmp -type f -atime +1 -delete' [tools]
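For reference, the clush invocation above with its flags annotated (flag meanings per clush's own CLI; this is the fan-out pattern used throughout this log):

    # -w @exec -w @webgrid  run on every node in the "exec" and "webgrid" node groups
    # -b                    gather identical output from the nodes into one block
    clush -w @exec -w @webgrid -b 'sudo find /tmp -type f -atime +1 -delete'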
2018-03-21 §
17:50 <bd808> Cleaned up stale /project/.system/bigbrother.scoreboard.* files from labstore1004 [tools]
01:09 <bd808> Deleting /tmp files owned by tools.wsexport with -mtime +2 across grid (T190185) [tools]
2018-03-20 §
08:28 <zhuyifei1999_> unmount dumps & remount on tools-bastion-02 (can someone clush this?) T189018 T190126 [tools]
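A minimal sketch of the remount being requested, assuming the dumps share is defined in /etc/fstab (the mount point below is a placeholder, not taken from the log):

    # Hypothetical: lazily unmount the stale dumps share, then remount from fstab.
    sudo umount -l /path/to/dumps-mount   # placeholder mount point
    sudo mount -a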
2018-03-19 §
11:02 <arturo> reboot tools-exec-1408 to balance load; the server is unresponsive due to high load caused by some tools [tools]
2018-03-16 §
22:44 <zhuyifei1999_> suspended process 22825 (BotOrderOfChapters.exe) on tools-bastion-03. Threads continuously going to D-state & R-state. Also sent message via $ write on pts/10 [tools]
12:13 <arturo> reboot tools-webgrid-lighttpd-1420 due to almost full /tmp [tools]
2018-03-15 §
16:56 <zhuyifei1999_> granted elasticsearch credentials to tools.denkmalbot T185624 [tools]
2018-03-14 §
20:57 <bd808> Upgrading elasticsearch on tools-elastic-01 (T181531) [tools]
20:53 <bd808> Upgrading elasticsearch on tools-elastic-02 (T181531) [tools]
20:51 <bd808> Upgrading elasticsearch on tools-elastic-03 (T181531) [tools]
12:07 <arturo> reboot tools-webgrid-lighttpd-1415, almost full /tmp [tools]
12:01 <arturo> repool tools-webgrid-lighttpd-1421, /tmp is now empty [tools]
11:56 <arturo> depool tools-webgrid-lighttpd-1421 for reboot due to /tmp almost full [tools]
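The depool/repool steps above are logged without the underlying commands; a sketch assuming the standard (Son of) Grid Engine way of disabling and re-enabling a host's queue instances (the wildcard queue pattern is an assumption):

    # Hypothetical depool: stop new jobs from being scheduled on the node.
    qmod -d '*@tools-webgrid-lighttpd-1421'
    # ...reboot the node once running jobs have finished or been cleared...
    # Hypothetical repool: re-enable its queue instances.
    qmod -e '*@tools-webgrid-lighttpd-1421'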
2018-03-12 §
20:09 <madhuvishy> Run clush -w @all -b 'sudo umount /mnt/nfs/labstore1003-scratch && sudo mount -a' to remount scratch across all of tools [tools]
17:13 <arturo> T188994 upgrading packages from `stable` [tools]
16:53 <arturo> T188994 upgrading packages from stretch-wikimedia [tools]
16:33 <arturo> T188994 upgrading packages from jessie-wikimedia [tools]
14:58 <zhuyifei1999_> building, publishing, and deploying misctools 1.31 5f3561e T189430 [tools]
13:31 <arturo> tools-exec-1441 and tools-exec-1442 rebooted fine and are repooled [tools]
13:26 <arturo> depool tools-exec-1441 and tools-exec-1442 for reboots [tools]
13:19 <arturo> T188994 upgrade packages from jessie-backports in all jessie servers [tools]
12:49 <arturo> T188994 upgrade packages from trusty-updates in all ubuntu servers [tools]
12:34 <arturo> T188994 upgrade packages from trusty-wikimedia in all ubuntu servers [tools]
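The per-repository upgrade passes for T188994 in this group are logged without the exact invocation (an apt-upgrade helper is mentioned on 2018-03-06); a hedged equivalent using plain apt, where -t restricts candidate versions to the named suite ("@trusty" is a made-up group name for the Ubuntu nodes):

    # Hypothetical equivalent of one per-suite pass (here: trusty-wikimedia).
    clush -w @trusty -b \
      'sudo DEBIAN_FRONTEND=noninteractive apt-get update -qq &&
       sudo DEBIAN_FRONTEND=noninteractive apt-get -y -t trusty-wikimedia upgrade'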
2018-03-08 §
16:05 <chasemp> tools-clushmaster-01:~$ clush -g all 'sudo puppet agent --test' [tools]
14:02 <arturo> T188994 upgrading trusty-tools packages across the whole cluster; this includes jobutils, openssh-server and openssh-sftp-server [tools]
2018-03-07 §
20:42 <chicocvenancio> killed an I/O-intensive recursive zip of a huge folder [tools]
18:30 <madhuvishy> Killed php-cgi job run by user 51242 on tools-webgrid-lighttpd-1413 [tools]
14:08 <arturo> just merged NFS package pinning https://gerrit.wikimedia.org/r/#/c/416943/ [tools]
13:47 <arturo> deploying more apt pinnings: https://gerrit.wikimedia.org/r/#/c/416934/ [tools]
2018-03-06 §
16:15 <madhuvishy> Reboot tools-docker-registry-02 T189018 [tools]
15:50 <madhuvishy> Rebooting tools-worker-1011 [tools]
15:08 <chasemp> tools-k8s-master-01:~# kubectl uncordon tools-worker-1011.tools.eqiad.wmflabs [tools]
15:03 <arturo> drain and reboot tools-worker-1011 [tools]
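The drain (15:03) and uncordon (15:08) entries above bracket the usual reboot cycle for a Kubernetes worker; a sketch in which the drain flags are assumptions, since the log only records "drain and reboot":

    # Hypothetical reconstruction of the per-worker cycle:
    kubectl drain tools-worker-1011.tools.eqiad.wmflabs --ignore-daemonsets --delete-local-data
    # reboot the node (e.g. "sudo reboot" on the worker), wait for it to come back, then:
    kubectl uncordon tools-worker-1011.tools.eqiad.wmflabs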
15:03 <chasemp> rebooted tools-worker 1001-1008 [tools]
14:58 <arturo> drain and reboot tools-worker-1010 [tools]
14:27 <chasemp> multiple tools running on k8s workers report issues reading replica.my.cnf file atm [tools]
14:27 <chasemp> reboot tools-worker-100[12] [tools]
14:23 <chasemp> set Icinga downtime for the "k8s workers ready" alert [tools]
13:21 <arturo> T188994 on some servers there was a race on the dpkg lock between apt-upgrade and puppet. Also, I forgot to use DEBIAN_FRONTEND=noninteractive, so debconf prompts appeared and stalled dpkg operations. Already solved, but some puppet alerts were produced [tools]
12:58 <arturo> T188994 upgrading packages in jessie nodes from the oldstable source [tools]
11:42 <arturo> clush -w @all "sudo DEBIAN_FRONTEND=noninteractive apt-get autoclean" <-- to free up space in the filesystem [tools]
11:41 <arturo> aborrero@tools-clushmaster-01:~$ clush -w @all "sudo DEBIAN_FRONTEND=noninteractive apt-get autoremove -y" <-- we did this on canary servers last week and it went fine, so now running it fleet-wide [tools]
11:36 <arturo> (ubuntu) removed linux-image-3.13.0-142-generic and linux-image-3.13.0-137-generic (T188911) [tools]
11:33 <arturo> removing unused kernel packages in ubuntu nodes [tools]
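The kernel cleanup at 11:33-11:36 (T188911) names the packages but not the removal command; a hedged sketch ("@trusty" again being a made-up group name for the Ubuntu nodes):

    # Hypothetical: purge the two unused kernel images named in the 11:36 entry.
    clush -w @trusty -b \
      'sudo DEBIAN_FRONTEND=noninteractive apt-get purge -y \
       linux-image-3.13.0-142-generic linux-image-3.13.0-137-generic'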
11:08 <arturo> aborrero@tools-clushmaster-01:~$ clush -w @all "sudo rm /etc/apt/preferences.d/* ; sudo puppet agent -t -v" <--- rebuild the directory, it contained stale files across the whole cluster [tools]
2018-03-05 §
18:56 <zhuyifei1999_> also published jobutils_1.30_all.deb [tools]
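Entries like this one and the misctools 1.31 build on 2018-03-12 cover the build-publish-deploy cycle for the Toolforge deb packages; a hedged sketch of the publish step using aptly (the repository and distribution names are assumptions, loosely based on the "trusty-tools" suite mentioned on 2018-03-08):

    # Hypothetical publish step on the package-building host:
    aptly repo add trusty-tools jobutils_1.30_all.deb   # add the built .deb to the local repo
    aptly publish update trusty-tools                   # refresh the published repository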