2019-02-20
23:30 | <zhuyifei1999_> | begin rebuilding all docker images T178601 T193646 T215683 | [tools]
23:25 | <zhuyifei1999_> | upgraded toollabs-webservice on tools-bastion-02 to 0.44 (newly-built version) | [tools]
23:19 | <zhuyifei1999_> | this was built for Debian Stretch. hopefully it works on all distros | [tools]
23:17 | <zhuyifei1999_> | begin building new tools-webservice package T178601 T193646 T215683 | [tools]
21:57 | <andrewbogott> | moving tools-static-13 to a new virt host | [tools]
21:34 | <andrewbogott> | moving the tools-static IP from tools-static-13 to tools-static-12 | [tools]
19:17 | <andrewbogott> | moving tools-bastion-02 to labvirt1004 | [tools]
16:56 | <andrewbogott> | moving tools-paws-worker-1003 | [tools]
15:53 | <andrewbogott> | moving tools-worker-1017, tools-worker-1027, tools-worker-1028 | [tools]
15:03 | <andrewbogott> | moving tools-exec-1413 and tools-exec-1442 | [tools]
2019-02-16
05:00 | <zhuyifei1999_> | fixed by restarting flannel. another puppet run simply started kubelet | [tools]
04:58 | <zhuyifei1999_> | puppet logs: https://phabricator.wikimedia.org/P8097. Docker is failing with 'Failed to load environment files: No such file or directory' | [tools]
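systemd reports 'Failed to load environment files: No such file or directory' when a unit's `EnvironmentFile=` names a file that does not exist and the path is not prefixed with `-`. A minimal sketch, assuming the unit text is available (for example from `systemctl cat docker`), that lists the required environment files that are missing. The function name `missing_env_files` is hypothetical, and the log does not say which file docker.service referenced on this host.

```python
import os
from typing import List


def missing_env_files(unit_text: str) -> List[str]:
    """Return EnvironmentFile= paths that are required but absent on disk.

    Sketch only: ignores [Section] boundaries, drop-in files and wildcards.
    """
    required: List[str] = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line.startswith("EnvironmentFile="):
            continue
        value = line[len("EnvironmentFile="):].strip()
        if not value:
            # An empty assignment resets the list, as systemd does.
            required.clear()
            continue
        if value.startswith("-"):
            # '-' marks the file optional; a missing one is not an error.
            continue
        required.append(value)
    return [path for path in required if not os.path.exists(path)]
```

A non-empty result names the file that the unit expects some other service to create before it starts.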
04:52 | <zhuyifei1999_> | copied the resolv.conf from tools-k8s-master-01 with the secondary DNS removed (to make sure puppet fixes that), and started puppet | [tools]
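resolv.conf names its resolvers on `nameserver` lines, which the resolver tries in file order. A minimal sketch for pulling them out, for example to compare a host's file with the copy taken from tools-k8s-master-01. The helper name is hypothetical, and the addresses in any test input are documentation placeholders, not the real Toolforge resolvers.

```python
from typing import List


def nameservers(resolv_conf: str) -> List[str]:
    """Return the nameserver addresses from resolv.conf text, in file order."""
    servers: List[str] = []
    for raw in resolv_conf.splitlines():
        line = raw.strip()
        # resolv.conf(5) treats lines starting with '#' or ';' as comments.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers
```

Usage: run `nameservers(open("/etc/resolv.conf").read())` on both hosts and compare the lists.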
04:48 | <zhuyifei1999_> | that host's resolv.conf is badly broken https://phabricator.wikimedia.org/P8096. The last Puppet run was at Thu Feb 14 15:21:09 UTC 2019 (2247 minutes ago) | [tools]
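As a quick consistency check, the "2247 minutes ago" figure matches the 04:48 log time if both are UTC (an assumption for the log timestamp). Elapsed time from 2019-02-14 15:21:09 to 2019-02-16 04:48:09 is 1 day 13 h 27 min, which is 1440 + 807 = 2247 minutes (about 37.5 hours without Puppet). The helper name below is hypothetical.

```python
from datetime import datetime, timezone


def minutes_since(last_run: datetime, now: datetime) -> int:
    """Whole minutes elapsed between two timezone-aware timestamps."""
    return int((now - last_run).total_seconds() // 60)


last_puppet_run = datetime(2019, 2, 14, 15, 21, 9, tzinfo=timezone.utc)
# Assumption: the check ran at about 04:48:09 UTC (the log is minute-resolution).
checked_at = datetime(2019, 2, 16, 4, 48, 9, tzinfo=timezone.utc)
print(minutes_since(last_puppet_run, checked_at))  # prints 2247
```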
04:44 | <zhuyifei1999_> | puppet is also failing badly here: 'Error: Could not request certificate: getaddrinfo: Name or service not known' | [tools]
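The agent's error comes from `getaddrinfo`, the libc call that consults /etc/resolv.conf. A broken resolver file therefore fails this lookup before any certificate request is made. A minimal sketch that exercises the same lookup from Python, assuming the default Puppet master port 8140. The log does not name the master this host uses, so `resolve` and any hostname passed to it are illustrative.

```python
import socket
from typing import List


def resolve(host: str, port: int = 8140) -> List[str]:
    """Return the distinct addresses getaddrinfo yields for host.

    Raises socket.gaierror (e.g. 'Name or service not known') on failure.
    """
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    addrs: List[str] = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        if sockaddr[0] not in addrs:
            addrs.append(sockaddr[0])
    return addrs
```

A `socket.gaierror` here, while the same name resolves on a healthy host, points at the local resolver configuration rather than at Puppet itself.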
04:43 | <zhuyifei1999_> | this one has logs full of 'Can't contact LDAP server' | [tools]
04:41 | <zhuyifei1999_> | nslcd also broken on tools-worker-1005 | [tools]
04:34 | <zhuyifei1999_> | uncordoned tools-worker-1014.tools.eqiad.wmflabs | [tools]
04:33 | <zhuyifei1999_> | the issue was, /var/run/nslcd/socket was somehow a directory, AFAICT | [tools]
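`bind()` on a UNIX socket path that already exists fails with EADDRINUSE whatever kind of file is there. That is consistent with the 'Address already in use' error, given that the socket path was a directory. A minimal sketch that classifies what sits at the path without following symlinks. The function name is hypothetical.

```python
import os
import stat


def socket_path_state(path: str) -> str:
    """Classify path as 'missing', 'socket', 'directory' or 'other'."""
    try:
        mode = os.lstat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISDIR(mode):
        # A directory here blocks nslcd from binding its listening socket.
        return "directory"
    return "other"
```

Usage: `socket_path_state("/var/run/nslcd/socket")`. A result of `"directory"` while no nslcd is running is the situation the log resolved with `rmdir`.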
04:31 | <zhuyifei1999_> | then started nslcd via systemctl and `id zhuyifei1999` returns the correct results | [tools]
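`id` resolves the user through NSS, and on these hosts NSS reaches LDAP via nslcd. A successful lookup of an LDAP-only account is therefore a quick end-to-end check that nslcd is working again. A minimal sketch of the same check with Python's `pwd` module (Unix only). The helper name is hypothetical.

```python
import pwd
from typing import Optional


def nss_user(name: str) -> Optional[str]:
    """Look up name via NSS; return 'name uid=N gid=N' or None if not found."""
    try:
        entry = pwd.getpwnam(name)
    except KeyError:
        # No NSS source (files, LDAP via nslcd, ...) knows this user.
        return None
    return f"{entry.pw_name} uid={entry.pw_uid} gid={entry.pw_gid}"
```

Usage: `nss_user("zhuyifei1999")` returns `None` while nslcd is down and a populated string once LDAP lookups work.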
04:30 | <zhuyifei1999_> | `nslcd -nd` complains about 'nslcd: bind() to /var/run/nslcd/socket failed: Address already in use'. SIGTERMed a background nslcd, ran `rmdir /var/run/nslcd/socket`, and `nslcd -nd` then seemed to work | [tools]
04:23 | <zhuyifei1999_> | drained tools-worker-1014.tools.eqiad.wmflabs | [tools]
04:16 | <zhuyifei1999_> | logs: https://phabricator.wikimedia.org/P8095 | [tools]
04:14 | <zhuyifei1999_> | restarting nslcd on tools-worker-1014 in an attempt to fix that; the service failed to start, looking into logs | [tools]
04:12 | <zhuyifei1999_> | restarting nscd on tools-worker-1014 in an attempt to fix the host seemingly not being attached to LDAP | [tools]