2018-01-10
|
11:07 |
<volans> |
start failover of Icinga to tegmen - T170353 |
[production] |
11:03 |
<elukey> |
reboot analytics1040->43 for kernel updates |
[analytics] |
10:55 |
<elukey> |
reboot analytics1040->43 for kernel updates |
[production] |
10:29 |
<godog> |
reimage restbase1011 to test HBA mode - T184100 |
[production] |
10:16 |
<moritzm> |
rebooting bast4001 for kernel security update |
[production] |
10:06 |
<elukey> |
rebooting analytics1035 (hadoop worker node and hdfs journal node) for kernel updates |
[production] |
10:02 |
<moritzm> |
rebooting tegmen for kernel security update |
[production] |
09:50 |
<godog> |
shut cassandra 2 on restbase legacy nodes - T184100 |
[production] |
09:40 |
<hashar> |
update docker-pkg images for releng/rake https://gerrit.wikimedia.org/r/#/c/403311/ |
[releng] |
09:40 |
<moritzm> |
rebooting kubernetes workers (plus staging hosts) for kernel security update |
[production] |
09:39 |
<ema> |
eqiad LVSs: upgrade to latest jessie point release (8.10) T182656 and linux kernel 4.9.65-3+deb9u1~bpo8+2 (KPTI) T184267 |
[production] |
09:32 |
<marostegui> |
Upgrade kernel on db1067 |
[production] |
09:27 |
<godog> |
stop restbase on cassandra 2 nodes - T184100 |
[production] |
09:15 |
<marostegui> |
Deploy schema change on db1051 - T174569 |
[production] |
09:12 |
<moritzm> |
rebooting radium (tor relay) for kernel security update |
[production] |
08:42 |
<marostegui> |
Stop replication in sync on db1089 and db1067 - T162807 |
[production] |
08:41 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Depool db1067 and db1089 - T162807 (duration: 01m 05s) |
[production] |
08:38 |
<marostegui> |
Deploy schema change on s5 dbstore1001 - T174569 |
[production] |
08:33 |
<moritzm> |
rebooting mw1299-mw1306 (job runners) for kernel security update (along with update to HHVM 3.18.6) |
[production] |
08:28 |
<hashar> |
contint1001: upgraded Zuul 2.5.0-8-gcbc7f62-wmf4jessie1 .. 2.5.0-8-gcbc7f62-wmf6 - T158243 |
[production] |
08:13 |
<marostegui> |
Deploy schema change on s5 dbstore1002 - T174569 |
[production] |
07:50 |
<legoktm> |
deployed https://gerrit.wikimedia.org/r/402826 |
[releng] |
07:44 |
<moritzm> |
rebooting mw1262-mw1275 for kernel security update (along with update to HHVM 3.18.6) |
[production] |
07:37 |
<marostegui> |
Drop external_user from wikidatawiki - T184247 |
[production] |
06:17 |
<marostegui> |
Deploy schema change on s5 codfw master (db2052) with replication (this will generate lag on codfw) - T174569 |
[production] |
02:24 |
<l10nupdate@tin> |
scap sync-l10n completed (1.31.0-wmf.15) (duration: 06m 02s) |
[production] |
01:39 |
<mutante> |
mw1226 - high load - hhvm-dump-debug > /root/hhvm-dump-debug-20170109-1739PST.log ; restart-hhvm |
[production] |
00:43 |
<mutante> |
rebooting gerrit server for kernel upgrade |
[production] |
00:18 |
<mutante> |
rebooting phabricator server for kernel upgrade |
[production] |
00:15 |
<mutante> |
moving renamed Hiera values to Prefix puppet for planet-* after https://gerrit.wikimedia.org/r/#/c/397729 - fixing puppet run on planet-hotdog |
[planet] |
2018-01-09
|
23:21 |
<yuvipanda> |
paws new cluster master is up, re-adding nodes by executing same sequence of commands for upgrading |
[tools] |
23:08 |
<yuvipanda> |
turns out the version of k8s we had wasn't recent enough to support easy upgrades, so destroy entire cluster again and install 1.9.1 |
[tools] |
23:01 |
<yuvipanda> |
kill paws master and reboot it |
[tools] |
22:57 |
<bd808> |
Deployed be6109b (add s8 slice) |
[tools.replag] |
22:54 |
<yuvipanda> |
kill all kube-system pods in paws cluster |
[tools] |
22:54 |
<yuvipanda> |
kill all PAWS pods |
[tools] |
22:53 |
<yuvipanda> |
redo tools-paws-worker-1006 manually, since clush seems to have missed it for some reason |
[tools] |
22:52 |
<godog> |
ms-be1033 truncate unrotated and big server.log |
[production] |
22:49 |
<yuvipanda> |
run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/init-worker.bash' to bring paws workers back up again, but as 1.8 |
[tools] |
22:48 |
<yuvipanda> |
run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/install-kubeadm.bash' to set up kubeadm on all paws worker nodes |
[tools] |
22:46 |
<yuvipanda> |
reboot all paws-worker nodes |
[tools] |
22:46 |
<yuvipanda> |
run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/remove-worker.bash' to completely destroy the paws k8s cluster |
[tools] |
22:46 |
<madhuvishy> |
run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/remove-worker.bash' to completely destroy the paws k8s cluster |
[tools] |
22:22 |
<aaron@tin> |
Synchronized php-1.31.0-wmf.16/includes/Setup.php: 68b4bbfbc12c626 (duration: 01m 15s) |
[production] |
22:20 |
<mutante> |
netmon2001 - arming keyholder for rancid |
[production] |
21:17 |
<chasemp> |
...rush@tools-clushmaster-01:~$ clush -f 1 -w @k8s-worker "sudo puppet agent --enable && sudo puppet agent --test" |
[tools] |
21:17 |
<chasemp> |
tools-clushmaster-01:~$ clush -f 1 -w @k8s-worker "sudo puppet agent --enable --test" |
[tools] |
21:10 |
<mepps> |
updated SmashPig from 45aa62650c to 778e8f87b4 |
[production] |
21:10 |
<chasemp> |
tools-k8s-master-01:~# for n in `kubectl get nodes | awk '{print $1}' | grep -v -e tools-worker-1001 -e tools-worker-1016 -e tools-worker-1028 -e tools-worker-1029 `; do kubectl uncordon $n; done |
[tools] |
20:57 |
<twentyafterfour@tin> |
Finished scap: Deploy 1.31.0-wmf.16 to test wikis and rebuild l10n. refs T180749 (attempt 2) (duration: 36m 34s) |
[production] |