2019-02-16
§
|
13:47 |
<arturo> |
T193264 create 'clouddb-services-puppetmaster' puppet prefix to store puppet/hiera config for this project puppetmaster |
[clouddb-services] |
13:43 |
<arturo> |
T193264 create 'clouddb-services-puppetmaster-01' instance |
[clouddb-services] |
13:33 |
<arturo> |
add myself as user and projectadmin |
[clouddb-services] |
05:00 |
<zhuyifei1999_> |
fixed by restarting flannel; another puppet run then started kubelet |
[tools] |
04:58 |
<zhuyifei1999_> |
puppet logs: https://phabricator.wikimedia.org/P8097. Docker is failing with 'Failed to load environment files: No such file or directory' |
[tools] |
04:52 |
<zhuyifei1999_> |
copied the resolv.conf from tools-k8s-master-01, removed the secondary DNS entry to make sure puppet fixes that, and started puppet |
[tools] |
04:48 |
<zhuyifei1999_> |
that host's resolv.conf is badly broken https://phabricator.wikimedia.org/P8096. The last Puppet run was at Thu Feb 14 15:21:09 UTC 2019 (2247 minutes ago) |
[tools] |
04:44 |
<zhuyifei1999_> |
puppet is also failing badly here: 'Error: Could not request certificate: getaddrinfo: Name or service not known' |
[tools] |
04:43 |
<zhuyifei1999_> |
this one has logs full of 'Can't contact LDAP server' |
[tools] |
04:41 |
<zhuyifei1999_> |
nslcd also broken on tools-worker-1005 |
[tools] |
04:34 |
<zhuyifei1999_> |
uncordon tools-worker-1014.tools.eqiad.wmflabs |
[tools] |
04:33 |
<zhuyifei1999_> |
the issue was that /var/run/nslcd/socket was somehow a directory, AFAICT |
[tools] |
04:31 |
<zhuyifei1999_> |
then started nslcd via systemctl, and `id zhuyifei1999` returns correct results |
[tools] |
04:30 |
<zhuyifei1999_> |
`nslcd -nd` complains about 'nslcd: bind() to /var/run/nslcd/socket failed: Address already in use'. SIGTERMed a background nslcd, `rmdir /var/run/nslcd/socket`, and `nslcd -nd` seemingly starts to work |
[tools] |
04:23 |
<zhuyifei1999_> |
drained tools-worker-1014.tools.eqiad.wmflabs |
[tools] |
04:16 |
<zhuyifei1999_> |
logs: https://phabricator.wikimedia.org/P8095 |
[tools] |
04:14 |
<zhuyifei1999_> |
restarting nslcd on tools-worker-1014 in an attempt to fix that; the service failed to start, looking into logs |
[tools] |
04:12 |
<zhuyifei1999_> |
restarting nscd on tools-worker-1014 in an attempt to fix the host seemingly not being attached to LDAP |
[tools] |
00:20 |
<XioNoX> |
add port 22 to the labsdb term of cloud-in4 |
[production] |
2019-02-15
§
|
23:42 |
<bd808> |
Added BryanDavis (self), Arturo Borrero Gonzalez, Marostegui, and Jcrespo as admins in project |
[clouddb-services] |
22:49 |
<bstorm_> |
created mariadb security group and lvs for a new database T193264 |
[clouddb-services] |
22:49 |
<Joan> |
Restarted CVNBot3 (Last message was received on RCReader 5729.637672 seconds ago) |
[cvn] |
20:40 |
<andrewbogott> |
enabled virtualization (all three settings) on cloudvirt1019 |
[production] |
19:41 |
<arturo> |
T193264 reimaging cloudvirt1019 to get mitaka/stretch |
[production] |
18:51 |
<arturo> |
T193264 icinga downtime cloudvirt1019 for 1 week |
[production] |
18:44 |
<bstorm_> |
stopped replication and then mariadb on labsdb1004 |
[production] |
18:18 |
<nuria> |
restarted turnilo in analytics-tool1002 |
[analytics] |
17:28 |
<thcipriani> |
integration-slave-jessie-1002:/srv/jenkins-workspace/workspace$ `sudo rm -rf *` due to full disk |
[releng] |
16:52 |
<cdanis> |
correction, needed to increment version; adding backported rasdaemon 0.6.0-1.2+deb8u2 to jessie-wikimedia |
[production] |
16:48 |
<cdanis> |
adding backported rasdaemon 0.6.0-1.2+deb8u1 to jessie-wikimedia |
[production] |
16:29 |
<bblack> |
reprepro: uploaded gdnsd-3.0.0-1~wmf1 to stretch-wikimedia |
[production] |
16:28 |
<Lucas_WMDE> |
moved cronjob from trusty to stretch (following [[wikitech:News/Toolforge Trusty Move a cron job]]) |
[tools.wmde-access] |
15:45 |
<moritzm> |
rebooting auth1001 for kernel security update |
[production] |
14:50 |
<moritzm> |
installing unbound update from stretch point release |
[production] |
14:45 |
<moritzm> |
removed labvirt1012 from debmonitor (got renamed to cloudvirt1012) (T216190) |
[production] |
14:06 |
<moritzm> |
rebooting mwlog1001 for kernel security update |
[production] |
13:54 |
<moritzm> |
rebooting mwlog2001 for kernel security update |
[production] |
13:46 |
<jbond42> |
install tar security updates |
[production] |
13:19 |
<moritzm> |
rolling reboot of mwdebug servers in eqiad to pick up SSBD-enabled qemu |
[production] |
13:15 |
<Amir1> |
migrating the webservice to stretch+k8s |
[tools.mrmetadata] |
13:12 |
<gtirloni> |
reboot cloudvirt1020 |
[production] |
13:11 |
<arturo> |
T216239 labvirt1019 has been drained of any workload |
[production] |
13:10 |
<arturo> |
T216239 labvirt1019 has been drained |
[admin] |
13:06 |
<moritzm> |
installing NSS security updates |
[production] |
12:42 |
<moritzm> |
installing squid3 security updates |
[production] |
12:30 |
<jynus> |
stop db2089 mysql instances for reboot testing T216240 |
[production] |
12:30 |
<arturo> |
T216239 schedule 1 week of icinga downtime for labvirt1019 |
[production] |
12:22 |
<arturo> |
T216239 draining labvirt1009 with a command like this: `root@cloudcontrol1004:~# wmcs-cold-migrate --region eqiad --nova-db nova 2c0cf363-c7c3-42ad-94bd-e586f2492321 labvirt1001` |
[admin] |
12:02 |
<arturo> |
more nova service cleanups in the database (labvirts that were reallocated to eqiad1) |
[admin] |
11:34 |
<arturo> |
T216190 cleanup from nova database `nova service-delete 35` |
[admin] |