2019-02-16
|
05:00 |
<zhuyifei1999_> |
fixed by restarting flannel. another puppet run simply started kubelet |
[tools] |
04:58 |
<zhuyifei1999_> |
puppet logs: https://phabricator.wikimedia.org/P8097. Docker is failing with 'Failed to load environment files: No such file or directory' |
[tools] |
04:52 |
<zhuyifei1999_> |
copied the resolv.conf from tools-k8s-master-01, removing secondary DNS to make sure puppet fixes that, and starting puppet |
[tools] |
04:48 |
<zhuyifei1999_> |
that host's resolv.conf is badly broken: https://phabricator.wikimedia.org/P8096. The last Puppet run was at Thu Feb 14 15:21:09 UTC 2019 (2247 minutes ago) |
[tools] |
04:44 |
<zhuyifei1999_> |
puppet is also failing badly here: 'Error: Could not request certificate: getaddrinfo: Name or service not known' |
[tools] |
04:43 |
<zhuyifei1999_> |
this one has logs full of 'Can't contact LDAP server' |
[tools] |
04:41 |
<zhuyifei1999_> |
nslcd also broken on tools-worker-1005 |
[tools] |
04:34 |
<zhuyifei1999_> |
uncordon tools-worker-1014.tools.eqiad.wmflabs |
[tools] |
04:33 |
<zhuyifei1999_> |
the issue was, /var/run/nslcd/socket was somehow a directory, AFAICT |
[tools] |
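Note on the 04:30 and 04:33 entries: a quick way to confirm that kind of breakage is to check what type of filesystem object sits at the socket path. The following is a minimal sketch, not something that was run during this incident; the helper name `classify_path` is hypothetical, and the only path taken from the log is /var/run/nslcd/socket.

```python
import os
import stat


def classify_path(path):
    """Return 'socket', 'directory', 'missing', or 'other' for the given path."""
    try:
        # lstat so that a symlink is reported as 'other' rather than followed
        mode = os.lstat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISDIR(mode):
        return "directory"
    return "other"


if __name__ == "__main__":
    # Path from the log entries; anything other than 'socket' while nslcd
    # is running would suggest the same kind of problem.
    print(classify_path("/var/run/nslcd/socket"))
```

A result of 'directory' would match the state described at 04:33, where `rmdir` was needed before nslcd could bind.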
04:31 |
<zhuyifei1999_> |
then started nslcd via systemctl, and `id zhuyifei1999` returns correct results |
[tools] |
04:30 |
<zhuyifei1999_> |
`nslcd -nd` complains about 'nslcd: bind() to /var/run/nslcd/socket failed: Address already in use'. SIGTERMed a background nslcd, `rmdir /var/run/nslcd/socket`, and `nslcd -nd` seemingly starts to work |
[tools] |
04:23 |
<zhuyifei1999_> |
drained tools-worker-1014.tools.eqiad.wmflabs |
[tools] |
04:16 |
<zhuyifei1999_> |
logs: https://phabricator.wikimedia.org/P8095 |
[tools] |
04:14 |
<zhuyifei1999_> |
restarting nslcd on tools-worker-1014 in an attempt to fix that, service failed to start, looking into logs |
[tools] |
04:12 |
<zhuyifei1999_> |
restarting nscd on tools-worker-1014 in an attempt to fix seemingly-not-attached-to-LDAP |
[tools] |
2019-02-14
|
21:57 |
<bd808> |
Deleted old tools-proxy-02 instance |
[tools] |
21:57 |
<bd808> |
Deleted old tools-proxy-01 instance |
[tools] |
21:56 |
<bd808> |
Deleted old tools-package-builder-01 instance |
[tools] |
20:57 |
<andrewbogott> |
rebooting tools-worker-1005 |
[tools] |
20:34 |
<andrewbogott> |
moving tools-exec-1409, tools-exec-1410, tools-exec-1414, tools-exec-1419 |
[tools] |
19:55 |
<andrewbogott> |
moving tools-webgrid-generic-1401 and tools-webgrid-lighttpd-1419 |
[tools] |
19:33 |
<andrewbogott> |
moving tools-checker-01 to labvirt1003 |
[tools] |
19:25 |
<andrewbogott> |
moving tools-elastic-02 to labvirt1003 |
[tools] |
19:11 |
<andrewbogott> |
moving tools-k8s-etcd-01 to labvirt1002 |
[tools] |
18:37 |
<andrewbogott> |
moving tools-exec-1418, tools-exec-1424 to labvirt1003 |
[tools] |
18:34 |
<andrewbogott> |
moving tools-webgrid-lighttpd-1404, tools-webgrid-lighttpd-1406, tools-webgrid-lighttpd-1410 to labvirt1002 |
[tools] |
17:35 |
<arturo> |
T215154 tools-sgebastion-07 now running systemd 239 and starts enforcing user limits |
[tools] |
15:33 |
<andrewbogott> |
moving tools-worker-1002, 1003, 1005, 1006, 1007, 1010, 1013, 1014 to different labvirts in order to move labvirt1012 to eqiad1-r |
[tools] |