2019-02-25
07:37 <zhuyifei1999_> tools-worker-1015.tools.eqiad.wmflabs is having severe NFS issues (all NFS-accessing processes are stuck in D state). Draining. [tools]
2019-02-22
16:29 <gtirloni> upgraded and rebooted tools-puppetmaster-01 (new kernel) [tools]
15:59 <gtirloni> started tools-puppetmaster-01 (new size: m1.large) [tools]
15:13 <gtirloni> shut down tools-puppetmaster-01 [tools]
2019-02-21
09:59 <gtirloni> upgraded all packages in all stretch nodes [tools]
00:12 <zhuyifei1999_> forcing puppet run on tools-k8s-master-01 [tools]
00:08 <zhuyifei1999_> running /usr/local/bin/git-sync-upstream on tools-puppetmaster-01 to speed puppet changes up [tools]
2019-02-20
23:30 <zhuyifei1999_> begin rebuilding all docker images T178601 T193646 T215683 [tools]
23:25 <zhuyifei1999_> upgraded toollabs-webservice on tools-bastion-02 to 0.44 (newly-built version) [tools]
23:19 <zhuyifei1999_> this was built for stretch. hopefully it works for all distros [tools]
23:17 <zhuyifei1999_> begin building new tools-webservice package T178601 T193646 T215683 [tools]
21:57 <andrewbogott> moving tools-static-13 to a new virt host [tools]
21:34 <andrewbogott> moving the tools-static IP from tools-static-13 to tools-static-12 [tools]
19:17 <andrewbogott> moving tools-bastion-02 to labvirt1004 [tools]
16:56 <andrewbogott> moving tools-paws-worker-1003 [tools]
15:53 <andrewbogott> moving tools-worker-1017, tools-worker-1027, tools-worker-1028 [tools]
15:03 <andrewbogott> moving tools-exec-1413 and tools-exec-1442 [tools]
2019-02-19
01:49 <bd808> Revoked Toolforge project membership for user DannyS712 (T215092) [tools]
2019-02-18
20:45 <gtirloni> upgraded and rebooted tools-sgebastion-07 (login-stretch) [tools]
20:22 <gtirloni> enabled toolsdb monitoring in Icinga [tools]
20:03 <gtirloni> pointed tools-db.eqiad.wmflabs to [tools]
18:50 <chicocvenancio> moving paws back to toolsdb T216208 [tools]
13:47 <arturo> rebooting tools-sgebastion-07 to try fixing general slowness [tools]
2019-02-17
22:23 <zhuyifei1999_> uncordon tools-worker-1010.tools.eqiad.wmflabs [tools]
22:11 <zhuyifei1999_> rebooting tools-worker-1010.tools.eqiad.wmflabs [tools]
22:10 <zhuyifei1999_> draining tools-worker-1010.tools.eqiad.wmflabs, `docker ps` is hanging. no idea why. also other weirdness like ContainerCreating forever [tools]
2019-02-16
05:00 <zhuyifei1999_> fixed by restarting flannel. another puppet run simply started kubelet [tools]
04:58 <zhuyifei1999_> puppet logs: https://phabricator.wikimedia.org/P8097. Docker is failing with 'Failed to load environment files: No such file or directory' [tools]
04:52 <zhuyifei1999_> copied the resolv.conf from tools-k8s-master-01, removing secondary DNS to make sure puppet fixes that, and starting puppet [tools]
04:48 <zhuyifei1999_> that host's resolv.conf is badly broken https://phabricator.wikimedia.org/P8096. The last Puppet run was at Thu Feb 14 15:21:09 UTC 2019 (2247 minutes ago) [tools]
04:44 <zhuyifei1999_> puppet is also failing badly here: 'Error: Could not request certificate: getaddrinfo: Name or service not known' [tools]
04:43 <zhuyifei1999_> this one has logs full of 'Can't contact LDAP server' [tools]
04:41 <zhuyifei1999_> nslcd also broken on tools-worker-1005 [tools]
04:34 <zhuyifei1999_> uncordon tools-worker-1014.tools.eqiad.wmflabs [tools]
04:33 <zhuyifei1999_> the issue was, /var/run/nslcd/socket was somehow a directory, AFAICT [tools]
04:31 <zhuyifei1999_> then started nslcd via systemctl and `id zhuyifei1999` returns the correct results [tools]
04:30 <zhuyifei1999_> `nslcd -nd` complains about 'nslcd: bind() to /var/run/nslcd/socket failed: Address already in use'. SIGTERMed a background nslcd, `rmdir /var/run/nslcd/socket`, and `nslcd -nd` seemingly starts to work [tools]
04:23 <zhuyifei1999_> drained tools-worker-1014.tools.eqiad.wmflabs [tools]
04:16 <zhuyifei1999_> logs: https://phabricator.wikimedia.org/P8095 [tools]
04:14 <zhuyifei1999_> restarting nslcd on tools-worker-1014 in an attempt to fix that, service failed to start, looking into logs [tools]
04:12 <zhuyifei1999_> restarting nscd on tools-worker-1014 in an attempt to fix seemingly-not-attached-to-LDAP [tools]
2019-02-14
21:57 <bd808> Deleted old tools-proxy-02 instance [tools]
21:57 <bd808> Deleted old tools-proxy-01 instance [tools]
21:56 <bd808> Deleted old tools-package-builder-01 instance [tools]
20:57 <andrewbogott> rebooting tools-worker-1005 [tools]
20:34 <andrewbogott> moving tools-exec-1409, tools-exec-1410, tools-exec-1414, tools-exec-1419 [tools]
19:55 <andrewbogott> moving tools-webgrid-generic-1401 and tools-webgrid-lighttpd-1419 [tools]
19:33 <andrewbogott> moving tools-checker-01 to labvirt1003 [tools]
19:25 <andrewbogott> moving tools-elastic-02 to labvirt1003 [tools]
19:11 <andrewbogott> moving tools-k8s-etcd-01 to labvirt1002 [tools]