Spark Streaming Tuning

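The sample data used in this document contains roughly 1.0 to 1.2 million valid signaling records per minute, with a total size of about 200 MB per minute. Flume: the Flume channel capacity defaults to 1000 and was raised to 1000000. The channel size does not need to be very large; what needs to be increased is the rate at which the Kafka sink takes messages…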
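To put the quoted figures in perspective, the following is a rough back-of-the-envelope sketch, not part of the original tuning procedure. It assumes one Flume event per signaling record and 1 MB = 10^6 bytes; the function names are made up for illustration.

```python
# Figures quoted in the excerpt above.
RECORDS_PER_MIN = (1_000_000, 1_200_000)  # valid signaling records per minute
MB_PER_MIN = 200                          # total data volume per minute
DEFAULT_CAPACITY = 1_000                  # default Flume channel capacity
TUNED_CAPACITY = 1_000_000                # capacity after adjustment


def per_second(records_per_min):
    """Convert a per-minute record rate to records per second."""
    return records_per_min / 60


def buffer_seconds(capacity, records_per_min):
    """Seconds of incoming traffic a channel can absorb if the sink drains nothing.

    Assumption: one Flume event per record.
    """
    return capacity / per_second(records_per_min)


def avg_record_bytes(records_per_min, mb_per_min=MB_PER_MIN):
    """Average record size in bytes, assuming 1 MB = 10**6 bytes."""
    return mb_per_min * 1_000_000 / records_per_min


if __name__ == "__main__":
    for rpm in RECORDS_PER_MIN:
        print(f"{rpm} records/min:")
        print(f"  rate             : {per_second(rpm):.0f} records/s")
        print(f"  avg record size  : {avg_record_bytes(rpm):.0f} bytes")
        print(f"  default capacity : {buffer_seconds(DEFAULT_CAPACITY, rpm):.3f} s of traffic")
        print(f"  tuned capacity   : {buffer_seconds(TUNED_CAPACITY, rpm):.1f} s of traffic")
```

Under these assumptions the default capacity holds well under a second of traffic, while the tuned value holds roughly a minute. This is consistent with the remark that the channel only needs to be large enough to smooth bursts and that the sink's drain rate is the more important setting.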

Yarn Node Label Configuration

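Open the YARN configuration page in Ambari and, in the YARN Features section, click the Node Labels button so that it switches to enabled. Then configure the labels and nodes from the command line. Add a label in YARN (exclusive defaults to true): rmadmin -addToCluste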

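Deploying the codis cluster relies on the runRemoteCmd script, which dispatches a command to selected machines in the cluster for execution. In this document, "codis" refers to the list of machines on which codis is to be deployed. Start codis-dashboard: --ncpu=4 --config=dashboard.toml --lo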

Get databricks/spark-sql-perf and davies/tpcds-kit from GitHub. davies/tpcds-kit: git clone https://github.com/davies/tpcds-kit.git databricks/s

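Using hadoop-yarn as an example, this article walks through the process from compiling the Hortonworks Hadoop source code to repackaging it as an RPM. Downloading the Hortonworks source code: use git to download the source. Official Hortonworks hadoop GitHub address: https://github.com/horton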