2019-02-17
|
22:33 |
<bd808> |
Migrated from Trusty -> Stretch -> Kubernetes |
[tools.mysql-php-session-test] |
22:23 |
<zhuyifei1999_> |
uncordon tools-worker-1010.tools.eqiad.wmflabs |
[tools] |
22:13 |
<bd808> |
Migrated from Trusty -> Stretch -> Kubernetes |
[tools.my-first-flask-tool] |
22:11 |
<zhuyifei1999_> |
rebooting tools-worker-1010.tools.eqiad.wmflabs |
[tools] |
22:10 |
<zhuyifei1999_> |
draining tools-worker-1010.tools.eqiad.wmflabs: `docker ps` is hanging, no idea why. Also other weirdness, like pods stuck in ContainerCreating forever |
[tools] |
21:43 |
<bd808> |
Force deleted pod stuck in Terminating state with `kubectl delete po/trusty-tools-909545302-jwrz7 --now` |
[tools.trusty-tools] |
21:21 |
<bstorm_> |
The slave of labsdb1005.eqiad.wmnet is now clouddb1001.clouddb-services.eqiad.wmflabs |
[clouddb-services] |
21:20 |
<bstorm_> |
The slave of labsdb1005.eqiad.wmnet is now clouddb1001.clouddb-services.eqiad.wmflabs |
[production] |
19:16 |
<arturo> |
T193264 delete VM clouddb-services-01 |
[clouddb-services] |
18:54 |
<arturo> |
T193264 create VM clouddb-services-01 for PoC of running maintain-dbusers from here |
[clouddb-services] |
18:34 |
<zhuyifei1999_> |
restarted webservice. it still has a phantom pod trusty-tools-909545302-jwrz7 at tools-worker-1010.tools.eqiad.wmflabs which refuses to terminate |
[tools.trusty-tools] |
13:14 |
<XioNoX> |
add term labsdb_return to cloud-in4 - T216353 |
[production] |
07:41 |
<wikibugs> |
Updated channels.yaml to: 62469f2db86d26c599400a55b9a7642ef95ce8d9 Update for Acme-chief project rename |
[tools.wikibugs] |
07:21 |
<legoktm> |
deploying https://gerrit.wikimedia.org/r/491029 |
[releng] |
07:10 |
<legoktm> |
Building image docker-registry.discovery.wmnet/releng/tox-acme-chief:0.3.4 |
[releng] |
06:28 |
<legoktm> |
building new tox-acme-chief docker image https://gerrit.wikimedia.org/r/489725 |
[releng] |
01:12 |
<Krinkle> |
beta-scap-eqiad (cron) failing with "sudo: a password is required" |
[releng] |
2019-02-16
|
19:44 |
<Krinkle> |
Reloading Zuul to deploy https://gerrit.wikimedia.org/r/490937 (T216275) |
[releng] |
17:23 |
<thcipriani> |
installed php7.0-curl on deployment-deploy01 (why was that suddenly necessary?) |
[releng] |
16:26 |
<ariel@deploy1001> |
Finished deploy [dumps/dumps@8f83eea]: fix up multistream index file recombines for large files; better errors for misc dumps failures (duration: 00m 03s) |
[production] |
16:25 |
<ariel@deploy1001> |
Started deploy [dumps/dumps@8f83eea]: fix up multistream index file recombines for large files; better errors for misc dumps failures |
[production] |
14:21 |
<arturo> |
T194855 cloudvirt1020 is powered off, waiting for disk setup before installing |
[production] |
13:59 |
<arturo> |
T193264 switched clouddb1001/1004 to the new project local puppetmaster |
[clouddb-services] |
13:54 |
<arturo> |
T193264 create 'clouddb10' puppet prefix to store puppet/hiera config for database servers in this project |
[clouddb-services] |
13:47 |
<arturo> |
T193264 create 'clouddb-services-puppetmaster' puppet prefix to store puppet/hiera config for this project puppetmaster |
[clouddb-services] |
13:43 |
<arturo> |
T193264 create 'clouddb-services-puppetmaster-01' instance |
[clouddb-services] |
13:33 |
<arturo> |
add myself as user and projectadmin |
[clouddb-services] |
05:00 |
<zhuyifei1999_> |
fixed by restarting flannel. another puppet run simply started kubelet |
[tools] |
04:58 |
<zhuyifei1999_> |
puppet logs: https://phabricator.wikimedia.org/P8097. Docker is failing with 'Failed to load environment files: No such file or directory' |
[tools] |
04:52 |
<zhuyifei1999_> |
copied the resolv.conf from tools-k8s-master-01, removing secondary DNS to make sure puppet fixes that, and starting puppet |
[tools] |
04:48 |
<zhuyifei1999_> |
that host's resolv.conf is badly broken https://phabricator.wikimedia.org/P8096. The last Puppet run was at Thu Feb 14 15:21:09 UTC 2019 (2247 minutes ago) |
[tools] |
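The "2247 minutes ago" figure in the entry above can be sanity-checked against the entry's own timestamp. This is a minimal sketch, assuming the entry was logged at 04:48 UTC on 2019-02-16 and that the comparison is made at minute resolution (the log does not record seconds). The helper name `minutes_since` is hypothetical.

```python
from datetime import datetime, timezone


def minutes_since(last_run: datetime, now: datetime) -> int:
    """Whole minutes between two timestamps, compared at minute resolution."""
    # Drop seconds on both sides, since the log entry only records HH:MM.
    last = last_run.replace(second=0, microsecond=0)
    current = now.replace(second=0, microsecond=0)
    return int((current - last).total_seconds() // 60)


# Last Puppet run as quoted in the log entry.
last_run = datetime(2019, 2, 14, 15, 21, 9, tzinfo=timezone.utc)
# Assumed time of the log entry (04:48 UTC, seconds unknown).
logged_at = datetime(2019, 2, 16, 4, 48, tzinfo=timezone.utc)

print(minutes_since(last_run, logged_at))  # 2247, i.e. 37 h 27 min
```

That is roughly a day and a half without a successful Puppet run, which is consistent with the broken resolv.conf going uncorrected.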
04:44 |
<zhuyifei1999_> |
puppet is also failing badly here: 'Error: Could not request certificate: getaddrinfo: Name or service not known' |
[tools] |
04:43 |
<zhuyifei1999_> |
this one has logs full of 'Can't contact LDAP server' |
[tools] |
04:41 |
<zhuyifei1999_> |
nslcd also broken on tools-worker-1005 |
[tools] |
04:34 |
<zhuyifei1999_> |
uncordon tools-worker-1014.tools.eqiad.wmflabs |
[tools] |
04:33 |
<zhuyifei1999_> |
the issue was, /var/run/nslcd/socket was somehow a directory, AFAICT |
[tools] |
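The failure mode described above (a directory sitting where a Unix socket should be) can be detected before restarting the daemon. This is a minimal sketch, not the check used during the incident. The function name `classify_path` is hypothetical, and the path is the one quoted in the log.

```python
import os
import stat


def classify_path(path: str) -> str:
    """Report whether a path is a socket, a directory, missing, or something else."""
    try:
        # lstat so that a symlink is reported as "other" rather than followed.
        mode = os.lstat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISDIR(mode):
        return "directory"
    return "other"


if __name__ == "__main__":
    # Path taken from the log entry; a healthy nslcd should yield "socket".
    print(classify_path("/var/run/nslcd/socket"))
```

A result of "directory" would match the state seen on tools-worker-1014, where bind() failed with 'Address already in use' until the directory was removed.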
04:31 |
<zhuyifei1999_> |
then started nslcd via systemctl and `id zhuyifei1999` returns the correct results |
[tools] |
04:30 |
<zhuyifei1999_> |
`nslcd -nd` complains about 'nslcd: bind() to /var/run/nslcd/socket failed: Address already in use'. SIGTERMed a background nslcd, `rmdir /var/run/nslcd/socket`, and `nslcd -nd` seemingly starts to work |
[tools] |
04:23 |
<zhuyifei1999_> |
drained tools-worker-1014.tools.eqiad.wmflabs |
[tools] |
04:16 |
<zhuyifei1999_> |
logs: https://phabricator.wikimedia.org/P8095 |
[tools] |
04:14 |
<zhuyifei1999_> |
restarting nslcd on tools-worker-1014 in an attempt to fix that, service failed to start, looking into logs |
[tools] |
04:12 |
<zhuyifei1999_> |
restarting nscd on tools-worker-1014 in an attempt to fix seemingly-not-attached-to-LDAP |
[tools] |
00:20 |
<XioNoX> |
add port 22 in cloud-in4 term labsdb |
[production] |