2018-01-09
23:21 <yuvipanda> paws new cluster master is up, re-adding nodes by executing the same sequence of commands used for the upgrade [tools]
23:08 <yuvipanda> turns out the version of k8s we had wasn't recent enough to support easy upgrades, so destroying the entire cluster again and installing 1.9.1 [tools]
23:01 <yuvipanda> kill paws master and reboot it [tools]
22:57 <bd808> Deployed be6109b (add s8 slice) [tools.replag]
22:54 <yuvipanda> kill all kube-system pods in paws cluster [tools]
22:54 <yuvipanda> kill all PAWS pods [tools]
22:53 <yuvipanda> redo tools-paws-worker-1006 manually, since clush seems to have missed it for some reason [tools]
22:52 <godog> ms-be1033 truncate unrotated and big server.log [production]
22:49 <yuvipanda> run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/init-worker.bash' to bring paws workers back up again, but as 1.8 [tools]
22:48 <yuvipanda> run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/install-kubeadm.bash' to set up kubeadm on all paws worker nodes [tools]
22:46 <yuvipanda> reboot all paws-worker nodes [tools]
22:46 <yuvipanda> run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/remove-worker.bash' to completely destroy the paws k8s cluster [tools]
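The three yuvipanda entries from 22:46 to 22:49 (with a reboot of all workers in between) form a teardown-and-rebuild sequence for the paws worker pool. A dry-run sketch of that sequence, oldest step first, follows; the host range and script paths are copied from the log, while the `phase` helper is illustrative — it echoes each clush invocation instead of executing it (drop the `echo` to run for real):

```shell
# Dry-run sketch of the 22:46-22:49 paws worker rebuild sequence.
# Node range and script paths come from the log entries above; clush
# and the kubeadm-bootstrap scripts are assumed to exist on the host.
NODES='tools-paws-worker-10[01-20]'
BOOTSTRAP=/home/yuvipanda/kubeadm-bootstrap

# Print the clush command for one phase instead of running it.
phase() {
  echo clush -w "$NODES" "sudo bash $BOOTSTRAP/$1"
}

phase remove-worker.bash     # 22:46: completely destroy the old cluster
phase install-kubeadm.bash   # 22:48: install kubeadm on every worker
phase init-worker.bash       # 22:49: re-join workers to the master
```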
22:46 <madhuvishy> run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/remove-worker.bash' to completely destroy the paws k8s cluster [tools]
22:22 <aaron@tin> Synchronized php-1.31.0-wmf.16/includes/Setup.php: 68b4bbfbc12c626 (duration: 01m 15s) [production]
22:20 <mutante> netmon2001 - arming keyholder for rancid [production]
21:17 <chasemp> ...rush@tools-clushmaster-01:~$ clush -f 1 -w @k8s-worker "sudo puppet agent --enable && sudo puppet agent --test" [tools]
21:17 <chasemp> tools-clushmaster-01:~$ clush -f 1 -w @k8s-worker "sudo puppet agent --enable --test" [tools]
21:10 <mepps> updated SmashPig from 45aa62650c to 778e8f87b4 [production]
21:10 <chasemp> tools-k8s-master-01:~# for n in `kubectl get nodes | awk '{print $1}' | grep -v -e tools-worker-1001 -e tools-worker-1016 -e tools-worker-1028 -e tools-worker-1029 `; do kubectl uncordon $n; done [tools]
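The uncordon loop above (21:10) and the matching cordon loop logged at 20:55 both filter `kubectl get nodes` output through a chain of `grep -v -e` exclusions. A sketch of that pattern with the exclusion step factored into a reusable helper; kubectl access is assumed, so the live invocation is left commented out, and the node names are illustrative:

```shell
# Filter a list of node names (one per line on stdin), dropping any
# names passed as arguments. Mirrors the `grep -v -e ... -e ...` chain
# used in the logged cordon/uncordon loops.
filter_nodes() {
  out=$(cat)
  for ex in "$@"; do
    out=$(printf '%s\n' "$out" | grep -v -e "$ex")
  done
  printf '%s\n' "$out"
}

# Live usage (commented out; needs cluster access):
# kubectl get nodes --no-headers | awk '{print $1}' \
#   | filter_nodes tools-worker-1001 tools-worker-1016 \
#   | xargs -n1 kubectl uncordon
```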
20:57 <twentyafterfour@tin> Finished scap: Deploy 1.31.0-wmf.16 to test wikis and rebuild l10n. refs T180749 (attempt 2) (duration: 36m 34s) [production]
20:55 <chasemp> for n in `kubectl get nodes | awk '{print $1}' | grep -v -e tools-worker-1001 -e tools-worker-1016`; do kubectl cordon $n; done [tools]
20:51 <chasemp> kubectl cordon tools-worker-1001.tools.eqiad.wmflabs [tools]
20:34 <mutante> wikibase-vue can't start Apache because docker-proxy is already using port 80 [wikidata-dev]
20:32 <mutante> fixed puppet runs on wikibase-stretch, wikibase-vue, wikibase with https://gerrit.wikimedia.org/r/#/c/403232/ [wikidata-dev]
20:21 <twentyafterfour@tin> Started scap: Deploy 1.31.0-wmf.16 to test wikis and rebuild l10n. refs T180749 (attempt 2) [production]
20:15 <chasemp> disable puppet on proxies and k8s workers [tools]
20:14 <twentyafterfour@tin> scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="test2wiki" --outdir="/tmp/scap_l10n_3984299293" --threads=10 --lang en --quiet' returned non-zero exit status 1 (duration: 02m 44s) [production]
20:13 <mutante> netmon2001 - rebooting [production]
20:12 <twentyafterfour@tin> Started scap: Deploy 1.31.0-wmf.16 to test wikis and rebuild l10n. refs T180749 [production]
20:04 <mutante> gerrit2001 - rebooting [production]
20:00 <mutante> phab2001 - reboot for upgrade [production]
19:50 <chasemp> clush -w @all 'sudo puppet agent --test' [tools]
19:42 <chasemp> reboot tools-worker-1010 [tools]
19:20 <mepps> rolled back SmashPig from 0c45b1a684 to 45aa62650c [production]
19:07 <mepps> updated SmashPig from 45aa62650c to 0c45b1a684 [production]
18:42 <mutante> ms-fe3002, ms-fe3001 - powering down, removing from puppet and icinga; ms-be* removing from puppet/icinga (T169518) [production]
18:38 <mutante> ms-fe3001 - shutting down for decom, removed from puppet [production]
18:38 <mutante> mw1227 still not showing recovery, using restart-hhvm [production]
18:29 <mutante> mw1227 killed it one more time and also restarted apache; load now going down [production]
18:26 <mutante> mw1227 hhvm-dump-debug > /root/hhvm-dump-debug-20170109-1024PST.log ; then killed hhvm and restarted it with systemctl [production]
17:56 <twentyafterfour> MediaWiki Train: Branching 1.31.0-wmf.16 [production]
17:41 <moritzm> rebooting image scalers in codfw for kernel security update (along with HHVM update) [production]
17:30 <volans> re-enabled Icinga event handlers on RAID checks for lvs3001 [production]
17:17 <ema> failing traffic back over to lvs3001, RAID rebuilt [production]
17:15 <godog> depooling 2 restbase cassandra nodes - T184100 [production]
16:53 <joal> Rerun pageview-druid-hourly-wf-2018-1-9-13 [analytics]
16:35 <cmjohnson1> disabling puppet for decom on mw1180-1200 [production]
16:28 <volans> disabled Icinga event handlers on RAID checks for lvs3001, WIP on the host [production]
16:18 <gehel> starting cluster reboot for elasticsearch / cirrus codfw [production]
16:09 <bd808> data-services: added s8.{analytics,web}.db.svc.eqiad.wmflabs and aliases (T181643, T184179) [production]