2018-01-10
08:13 <marostegui> Deploy schema change on s5 dbstore1002 - T174569 [production]
07:50 <legoktm> deployed https://gerrit.wikimedia.org/r/402826 [releng]
07:44 <moritzm> rebooting mw1262-mw1275 for kernel security update (along with update to HHVM 3.18.6) [production]
07:37 <marostegui> Drop external_user from wikidatawiki - T184247 [production]
06:17 <marostegui> Deploy schema change on s5 codfw master (db2052) with replication (this will generate lag on codfw) - T174569 [production]
02:24 <l10nupdate@tin> scap sync-l10n completed (1.31.0-wmf.15) (duration: 06m 02s) [production]
01:39 <mutante> mw1226 - high load - hhvm-dump-debug > /root/hhvm-dump-debug-20170109-1739PST.log ; restart-hhvm [production]
00:43 <mutante> rebooting gerrit server for kernel upgrade [production]
00:18 <mutante> rebooting phabricator server for kernel upgrade [production]
00:15 <mutante> moving renamed Hiera values to Prefix puppet for planet-* after https://gerrit.wikimedia.org/r/#/c/397729 - fixing puppet run on planet-hotdog [planet]
2018-01-09
23:21 <yuvipanda> paws new cluster master is up, re-adding nodes by executing same sequence of commands for upgrading [tools]
23:08 <yuvipanda> turns out the version of k8s we had wasn't recent enough to support easy upgrades, so destroy entire cluster again and install 1.9.1 [tools]
23:01 <yuvipanda> kill paws master and reboot it [tools]
22:57 <bd808> Deployed be6109b (add s8 slice) [tools.replag]
22:54 <yuvipanda> kill all kube-system pods in paws cluster [tools]
22:54 <yuvipanda> kill all PAWS pods [tools]
22:53 <yuvipanda> redo tools-paws-worker-1006 manually, since clush seems to have missed it for some reason [tools]
22:52 <godog> ms-be1033 truncate unrotated and big server.log [production]
22:49 <yuvipanda> run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/init-worker.bash' to bring paws workers back up again, but as 1.8 [tools]
22:48 <yuvipanda> run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/install-kubeadm.bash' to set up kubeadm on all paws worker nodes [tools]
22:46 <yuvipanda> reboot all paws-worker nodes [tools]
22:46 <yuvipanda> run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/remove-worker.bash' to completely destroy the paws k8s cluster [tools]
22:46 <madhuvishy> run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/remove-worker.bash' to completely destroy the paws k8s cluster [tools]
22:22 <aaron@tin> Synchronized php-1.31.0-wmf.16/includes/Setup.php: 68b4bbfbc12c626 (duration: 01m 15s) [production]
22:20 <mutante> netmon2001 - arming keyholder for rancid [production]
21:17 <chasemp> ...rush@tools-clushmaster-01:~$ clush -f 1 -w @k8s-worker "sudo puppet agent --enable && sudo puppet agent --test" [tools]
21:17 <chasemp> tools-clushmaster-01:~$ clush -f 1 -w @k8s-worker "sudo puppet agent --enable --test" [tools]
21:10 <mepps> updated SmashPig from 45aa62650c to 778e8f87b4 [production]
21:10 <chasemp> tools-k8s-master-01:~# for n in `kubectl get nodes | awk '{print $1}' | grep -v -e tools-worker-1001 -e tools-worker-1016 -e tools-worker-1028 -e tools-worker-1029 `; do kubectl uncordon $n; done [tools]
20:57 <twentyafterfour@tin> Finished scap: Deploy 1.31.0-wmf.16 to test wikis and rebuild l10n. refs T180749 (attempt 2) (duration: 36m 34s) [production]
20:55 <chasemp> for n in `kubectl get nodes | awk '{print $1}' | grep -v -e tools-worker-1001 -e tools-worker-1016`; do kubectl cordon $n; done [tools]
20:51 <chasemp> kubectl cordon tools-worker-1001.tools.eqiad.wmflabs [tools]
20:34 <mutante> wikibase-vue can't start Apache because docker-proxy is already using port 80 [wikidata-dev]
20:32 <mutante> fixed puppet runs on wikibase-stretch, wikibase-vue, wikibase with https://gerrit.wikimedia.org/r/#/c/403232/ [wikidata-dev]
20:21 <twentyafterfour@tin> Started scap: Deploy 1.31.0-wmf.16 to test wikis and rebuild l10n. refs T180749 (attempt 2) [production]
20:15 <chasemp> disable puppet on proxies and k8s workers [tools]
20:14 <twentyafterfour@tin> scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="test2wiki" --outdir="/tmp/scap_l10n_3984299293" --threads=10 --lang en --quiet' returned non-zero exit status 1 (duration: 02m 44s) [production]
20:13 <mutante> netmon2001 - rebooting [production]
20:12 <twentyafterfour@tin> Started scap: Deploy 1.31.0-wmf.16 to test wikis and rebuild l10n. refs T180749 [production]
20:04 <mutante> gerrit2001 - rebooting [production]
20:00 <mutante> phab2001 - reboot for upgrade [production]
19:50 <chasemp> clush -w @all 'sudo puppet agent --test' [tools]
19:42 <chasemp> reboot tools-worker-1010 [tools]
19:20 <mepps> rolled back SmashPig from 0c45b1a684 to 45aa62650c [production]
19:07 <mepps> updated SmashPig from 45aa62650c to 0c45b1a684 [production]
18:42 <mutante> ms-fe3002,ms-fe3001 - powering down, removing from puppet and icinga, ms-be* removing from puppet/icinga (T169518) [production]
18:38 <mutante> ms-fe3001 - shutting down for decom, removed from puppet [production]
18:38 <mutante> mw1227 still not showing recovery, using restart-hhvm [production]
18:29 <mutante> mw1227 killed it one more time and also restarted apache.. now load going down [production]
18:26 <mutante> mw1227 hhvm-dump-debug > /root/hhvm-dump-debug-20170109-1024PST.log ; then killed hhvm and restarted it with systemctl [production]