Contents
  1. Downloading the Hortonworks source code
  2. Preparing the build environment
  3. Compiling the source code
  4. Installing the official Hortonworks RPM
  5. Replacing the files installed by the official RPM
  6. Repackaging the updated files into an RPM with rpmrebuild

Taking hadoop-yarn as an example, this article walks through compiling the Hortonworks Hadoop source code and repackaging the result as an RPM.

Downloading the Hortonworks source code

  • Download the source code with git:
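Hortonworks publishes its release sources on GitHub; a plausible clone sequence looks like the sketch below. The repository URL and tag naming scheme are assumptions here, so verify them against the Hortonworks GitHub organization before running the commands for real (the sketch only echoes them, so it is side-effect free).

```shell
# Assumed repository and tag names -- check the Hortonworks GitHub
# organization for the release you actually need.
HDP_VERSION=2.4.0.0-169
REPO_URL=https://github.com/hortonworks/hadoop-release.git
TAG=HDP-$HDP_VERSION-tag

# The commands to run (echoed here rather than executed):
echo "git clone $REPO_URL"
echo "git checkout $TAG"
```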

Preparing the build environment

  • Install all dependencies listed in BUILDING.txt; for example, building hadoop requires the following:

Requirements:

  • Unix System
  • JDK 1.7+
  • Maven 3.0 or later
  • Findbugs 1.3.9 (if running findbugs)
  • ProtocolBuffer 2.5.0
  • CMake 2.6 or newer (if compiling native code), must be 3.0 or newer on Mac
  • Zlib devel (if compiling native code)
  • openssl devel ( if compiling native hadoop-pipes and to get the best HDFS encryption performance )
  • Jansson C XML parsing library ( if compiling libwebhdfs )
  • Linux FUSE (Filesystem in Userspace) version 2.6 or above ( if compiling fuse_dfs )
  • Internet connection for first build (to fetch all Maven and Hadoop dependencies)
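Before kicking off a build it is worth confirming that the core tools from the list above are actually on the PATH. A minimal check (tool presence only, version checks left out) might be:

```shell
# Report any missing build prerequisites from the list above.
missing=""
for tool in java mvn protoc cmake; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -n "$missing" ]; then
    echo "Missing build tools:$missing"
else
    echo "All required build tools found"
fi
```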

Compiling the source code

  • The build is normally done with maven, with the result packed into a tar.gz archive

    • Compile and package with maven, primarily via mvn package

      Maven build goals:

      • Clean : mvn clean
      • Compile : mvn compile [-Pnative]
      • Run tests : mvn test [-Pnative]
      • Create JAR : mvn package
      • Run findbugs : mvn compile findbugs:findbugs
      • Run checkstyle : mvn compile checkstyle:checkstyle
      • Install JAR in M2 cache : mvn install
      • Deploy JAR to Maven repo : mvn deploy
      • Run clover : mvn test -Pclover [-DcloverLicenseLocation=${user.name}/.clover.license]
      • Run Rat : mvn apache-rat:check
      • Build javadocs : mvn javadoc:javadoc
      • Build distribution : mvn package [-Pdist][-Pdocs][-Psrc][-Pnative][-Dtar]
      • Change Hadoop version : mvn versions:set -DnewVersion=NEWVERSION
    • Explanation of the maven build parameters:

      Build options:
      
      • Use -Pnative to compile/bundle native code
      • Use -Pdocs to generate & bundle the documentation in the distribution (using -Pdist)
      • Use -Psrc to create a project source TAR.GZ
      • Use -Dtar to create a TAR with the distribution (using -Pdist)
    • In the hadoop-yarn project, container-executor must be compiled natively, so the native profile has to be enabled when packaging. For hadoop-yarn, the complete packaging command is mvn package -Pdist,native -DskipTests -Dtar
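After a successful build, the exploded distribution and tarball land under target/ inside the module. The sketch below only derives the expected paths rather than invoking maven; the version string mirrors the HDP 2.4.0.0-169 release used later in this article.

```shell
# Derive where `mvn package -Pdist,native -DskipTests -Dtar` leaves its
# output, relative to the source tree root.
HDP_YARN_VERSION=2.7.1.2.4.0.0-169
DIST_DIR=hadoop-yarn-project/target/hadoop-yarn-project-$HDP_YARN_VERSION
DIST_TAR=$DIST_DIR.tar.gz
echo "exploded distribution: $DIST_DIR"
echo "tarball:               $DIST_TAR"
```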

Installing the official Hortonworks RPM

  • Install the service you want to repackage (hadoop-yarn in this example) with Ambari or yum, or install the RPM manually
    • When the dependency chain is complex, rpm -ivh *.rpm --nodeps installs the package on its own, ignoring dependencies; this is enough to inspect the RPM's installation layout
  • After installation, go into the install directory, compare it against the build output from the previous step, and write a script that copies our self-built files into the corresponding locations.
    • For example, in the hadoop-yarn project, the layout of our own build output is:
├── bin
│   ├── container-executor
│   ├── test-container-executor
│   ├── yarn
│   └── yarn.cmd
├── etc
│   └── hadoop
│       ├── capacity-scheduler.xml
│       ├── container-executor.cfg
│       ├── slaves
│       ├── yarn-env.sh
│       └── yarn-site.xml
├── libexec
│   ├── yarn-config.cmd
│   └── yarn-config.sh
├── sbin
│   ├── start-yarn.cmd
│   ├── start-yarn.sh
│   ├── stop-yarn.cmd
│   ├── stop-yarn.sh
│   ├── yarn-daemon.sh
│   └── yarn-daemons.sh
└── share
    ├── doc
    │   └── hadoop
    │       └── yarn
    │           └── CHANGES.txt
    └── hadoop
        └── yarn
            ├── hadoop-yarn-api-2.7.1.2.4.0.0-169.jar
            ├── hadoop-yarn-applications-distributedshell-2.7.1.2.4.0.0-169.jar
            ├── hadoop-yarn-applications-unmanaged-am-launcher-2.7.1.2.4.0.0-169.jar
            ├── hadoop-yarn-client-2.7.1.2.4.0.0-169.jar
            ├── hadoop-yarn-common-2.7.1.2.4.0.0-169.jar
            ├── hadoop-yarn-registry-2.7.1.2.4.0.0-169.jar
            ├── hadoop-yarn-server-applicationhistoryservice-2.7.1.2.4.0.0-169.jar
            ├── hadoop-yarn-server-common-2.7.1.2.4.0.0-169.jar
            ├── hadoop-yarn-server-nodemanager-2.7.1.2.4.0.0-169.jar
            ├── hadoop-yarn-server-resourcemanager-2.7.1.2.4.0.0-169.jar
            ├── hadoop-yarn-server-sharedcachemanager-2.7.1.2.4.0.0-169.jar
            ├── hadoop-yarn-server-tests-2.7.1.2.4.0.0-169.jar
            ├── hadoop-yarn-server-timeline-plugins-2.7.1.2.4.0.0-169.jar
            ├── hadoop-yarn-server-web-proxy-2.7.1.2.4.0.0-169.jar
            ├── lib
            │   ├── activation-1.1.jar
            │   ├── aopalliance-1.0.jar
            │   ├── apacheds-i18n-2.0.0-M15.jar
            │   ├── apacheds-kerberos-codec-2.0.0-M15.jar
            │   ├── api-asn1-api-1.0.0-M20.jar
            │   ├── api-util-1.0.0-M20.jar
            │   ├── asm-3.2.jar
            │   ├── avro-1.7.4.jar
            │   ├── commons-beanutils-1.7.0.jar
            │   ├── commons-beanutils-core-1.8.0.jar
            │   ├── commons-cli-1.2.jar
            │   ├── commons-codec-1.4.jar
            │   ├── commons-collections-3.2.2.jar
            │   ├── commons-compress-1.4.1.jar
            │   ├── commons-configuration-1.6.jar
            │   ├── commons-digester-1.8.jar
            │   ├── commons-httpclient-3.1.jar
            │   ├── commons-io-2.4.jar
            │   ├── commons-lang-2.6.jar
            │   ├── commons-logging-1.1.3.jar
            │   ├── commons-math3-3.1.1.jar
            │   ├── commons-net-3.1.jar
            │   ├── curator-client-2.7.1.jar
            │   ├── curator-framework-2.7.1.jar
            │   ├── curator-recipes-2.7.1.jar
            │   ├── fst-2.24.jar
            │   ├── gson-2.2.4.jar
            │   ├── guava-11.0.2.jar
            │   ├── guice-3.0.jar
            │   ├── guice-servlet-3.0.jar
            │   ├── htrace-core-3.1.0-incubating.jar
            │   ├── httpclient-4.2.5.jar
            │   ├── httpcore-4.2.5.jar
            │   ├── jackson-annotations-2.2.3.jar
            │   ├── jackson-core-2.2.3.jar
            │   ├── jackson-core-asl-1.9.13.jar
            │   ├── jackson-databind-2.2.3.jar
            │   ├── jackson-jaxrs-1.9.13.jar
            │   ├── jackson-mapper-asl-1.9.13.jar
            │   ├── jackson-xc-1.9.13.jar
            │   ├── javassist-3.18.1-GA.jar
            │   ├── javax.inject-1.jar
            │   ├── java-xmlbuilder-0.4.jar
            │   ├── jaxb-api-2.2.2.jar
            │   ├── jaxb-impl-2.2.3-1.jar
            │   ├── jersey-client-1.9.jar
            │   ├── jersey-core-1.9.jar
            │   ├── jersey-guice-1.9.jar
            │   ├── jersey-json-1.9.jar
            │   ├── jersey-server-1.9.jar
            │   ├── jets3t-0.9.0.jar
            │   ├── jettison-1.1.jar
            │   ├── jetty-6.1.26.jar
            │   ├── jetty-util-6.1.26.jar
            │   ├── jsch-0.1.42.jar
            │   ├── jsp-api-2.1.jar
            │   ├── jsr305-3.0.0.jar
            │   ├── leveldbjni-all-1.8.jar
            │   ├── log4j-1.2.17.jar
            │   ├── microsoft-windowsazure-storage-sdk-0.6.0.jar
            │   ├── netty-3.6.2.Final.jar
            │   ├── objenesis-2.1.jar
            │   ├── paranamer-2.3.jar
            │   ├── protobuf-java-2.5.0.jar
            │   ├── servlet-api-2.5.jar
            │   ├── snappy-java-1.0.4.1.jar
            │   ├── stax-api-1.0-2.jar
            │   ├── xmlenc-0.52.jar
            │   ├── xz-1.0.jar
            │   ├── zookeeper-3.4.6.2.4.0.0-169.jar
            │   └── zookeeper-3.4.6.2.4.0.0-169-tests.jar
            ├── sources
            │   ├── hadoop-yarn-api-2.7.1.2.4.0.0-169-sources.jar
            │   ├── hadoop-yarn-applications-distributedshell-2.7.1.2.4.0.0-169-sources.jar
            │   ├── hadoop-yarn-applications-distributedshell-2.7.1.2.4.0.0-169-test-sources.jar
            │   ├── hadoop-yarn-applications-unmanaged-am-launcher-2.7.1.2.4.0.0-169-sources.jar
            │   ├── hadoop-yarn-applications-unmanaged-am-launcher-2.7.1.2.4.0.0-169-test-sources.jar
            │   ├── hadoop-yarn-client-2.7.1.2.4.0.0-169-sources.jar
            │   ├── hadoop-yarn-client-2.7.1.2.4.0.0-169-test-sources.jar
            │   ├── hadoop-yarn-common-2.7.1.2.4.0.0-169-sources.jar
            │   ├── hadoop-yarn-common-2.7.1.2.4.0.0-169-test-sources.jar
            │   ├── hadoop-yarn-server-applicationhistoryservice-2.7.1.2.4.0.0-169-sources.jar
            │   ├── hadoop-yarn-server-applicationhistoryservice-2.7.1.2.4.0.0-169-test-sources.jar
            │   ├── hadoop-yarn-server-common-2.7.1.2.4.0.0-169-sources.jar
            │   ├── hadoop-yarn-server-common-2.7.1.2.4.0.0-169-test-sources.jar
            │   ├── hadoop-yarn-server-nodemanager-2.7.1.2.4.0.0-169-sources.jar
            │   ├── hadoop-yarn-server-nodemanager-2.7.1.2.4.0.0-169-test-sources.jar
            │   ├── hadoop-yarn-server-resourcemanager-2.7.1.2.4.0.0-169-sources.jar
            │   ├── hadoop-yarn-server-resourcemanager-2.7.1.2.4.0.0-169-test-sources.jar
            │   ├── hadoop-yarn-server-tests-2.7.1.2.4.0.0-169-sources.jar
            │   ├── hadoop-yarn-server-tests-2.7.1.2.4.0.0-169-test-sources.jar
            │   ├── hadoop-yarn-server-web-proxy-2.7.1.2.4.0.0-169-sources.jar
            │   └── hadoop-yarn-server-web-proxy-2.7.1.2.4.0.0-169-test-sources.jar
            └── test
                └── hadoop-yarn-server-tests-2.7.1.2.4.0.0-169-tests.jar
  • The layout installed by the official Hortonworks RPM is:
├── bin
│   ├── container-executor
│   ├── mapred
│   ├── mapred.distro
│   ├── yarn
│   └── yarn.distro
├── etc
│   ├── hadoop -> ../../hadoop/conf
│   └── rc.d
│       └── init.d
├── hadoop-yarn-api-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-api.jar -> hadoop-yarn-api-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-applications-distributedshell-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-applications-distributedshell.jar -> hadoop-yarn-applications-distributedshell-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-applications-unmanaged-am-launcher-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-applications-unmanaged-am-launcher.jar -> hadoop-yarn-applications-unmanaged-am-launcher-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-client-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-client.jar -> hadoop-yarn-client-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-common-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-common.jar -> hadoop-yarn-common-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-registry-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-registry.jar -> hadoop-yarn-registry-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-server-applicationhistoryservice-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-server-applicationhistoryservice.jar -> hadoop-yarn-server-applicationhistoryservice-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-server-common-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-server-common.jar -> hadoop-yarn-server-common-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-server-nodemanager-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-server-nodemanager.jar -> hadoop-yarn-server-nodemanager-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-server-resourcemanager-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-server-resourcemanager.jar -> hadoop-yarn-server-resourcemanager-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-server-sharedcachemanager-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-server-sharedcachemanager.jar -> hadoop-yarn-server-sharedcachemanager-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-server-tests-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-server-tests.jar -> hadoop-yarn-server-tests-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-server-timeline-plugins-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-server-timeline-plugins.jar -> hadoop-yarn-server-timeline-plugins-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-server-web-proxy-2.7.1.2.4.0.0-169.jar
├── hadoop-yarn-server-web-proxy.jar -> hadoop-yarn-server-web-proxy-2.7.1.2.4.0.0-169.jar
├── lib
│   ├── activation-1.1.jar
│   ├── aopalliance-1.0.jar
│   ├── apacheds-i18n-2.0.0-M15.jar
│   ├── apacheds-kerberos-codec-2.0.0-M15.jar
│   ├── api-asn1-api-1.0.0-M20.jar
│   ├── api-util-1.0.0-M20.jar
│   ├── asm-3.2.jar
│   ├── avro-1.7.4.jar
│   ├── commons-beanutils-1.7.0.jar
│   ├── commons-beanutils-core-1.8.0.jar
│   ├── commons-cli-1.2.jar
│   ├── commons-codec-1.4.jar
│   ├── commons-collections-3.2.2.jar
│   ├── commons-compress-1.4.1.jar
│   ├── commons-configuration-1.6.jar
│   ├── commons-digester-1.8.jar
│   ├── commons-httpclient-3.1.jar
│   ├── commons-io-2.4.jar
│   ├── commons-lang-2.6.jar
│   ├── commons-logging-1.1.3.jar
│   ├── commons-math3-3.1.1.jar
│   ├── commons-net-3.1.jar
│   ├── curator-client-2.7.1.jar
│   ├── curator-framework-2.7.1.jar
│   ├── curator-recipes-2.7.1.jar
│   ├── fst-2.24.jar
│   ├── gson-2.2.4.jar
│   ├── guava-11.0.2.jar
│   ├── guice-3.0.jar
│   ├── guice-servlet-3.0.jar
│   ├── htrace-core-3.1.0-incubating.jar
│   ├── httpclient-4.2.5.jar
│   ├── httpcore-4.2.5.jar
│   ├── jackson-annotations-2.2.3.jar
│   ├── jackson-core-2.2.3.jar
│   ├── jackson-core-asl-1.9.13.jar
│   ├── jackson-databind-2.2.3.jar
│   ├── jackson-jaxrs-1.9.13.jar
│   ├── jackson-mapper-asl-1.9.13.jar
│   ├── jackson-xc-1.9.13.jar
│   ├── javassist-3.18.1-GA.jar
│   ├── javax.inject-1.jar
│   ├── java-xmlbuilder-0.4.jar
│   ├── jaxb-api-2.2.2.jar
│   ├── jaxb-impl-2.2.3-1.jar
│   ├── jersey-client-1.9.jar
│   ├── jersey-core-1.9.jar
│   ├── jersey-guice-1.9.jar
│   ├── jersey-json-1.9.jar
│   ├── jersey-server-1.9.jar
│   ├── jets3t-0.9.0.jar
│   ├── jettison-1.1.jar
│   ├── jetty-6.1.26.hwx.jar
│   ├── jetty-util-6.1.26.hwx.jar
│   ├── jsch-0.1.42.jar
│   ├── jsp-api-2.1.jar
│   ├── jsr305-3.0.0.jar
│   ├── leveldbjni-all-1.8.jar
│   ├── log4j-1.2.17.jar
│   ├── microsoft-windowsazure-storage-sdk-0.6.0.jar
│   ├── netty-3.6.2.Final.jar
│   ├── objenesis-2.1.jar
│   ├── paranamer-2.3.jar
│   ├── protobuf-java-2.5.0.jar
│   ├── servlet-api-2.5.jar
│   ├── snappy-java-1.0.4.1.jar
│   ├── stax-api-1.0-2.jar
│   ├── xmlenc-0.52.jar
│   ├── xz-1.0.jar
│   ├── zookeeper-3.4.6.2.4.0.0-169.jar
│   └── zookeeper-3.4.6.2.4.0.0-169-tests.jar
└── sbin
    ├── yarn-daemon.sh
    └── yarn-daemons.sh
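A quick way to spot the differences between the two trees is to diff sorted file lists. The snippet below uses two tiny hand-written lists as stand-ins for the real output of `rpm -qlp` (the RPM payload) and `find` over the build directory:

```shell
# Stand-in file lists; in practice generate them with
#   rpm -qlp hadoop_*-yarn-*.rpm | sort > rpm-files.txt
#   (cd $BUILD_TARGET_DIR && find . -type f | sort) > build-files.txt
printf '%s\n' bin/yarn lib/guava-11.0.2.jar > rpm-files.txt
printf '%s\n' bin/yarn lib/guava-11.0.2.jar sbin/start-yarn.sh > build-files.txt

# comm -13 prints lines present only in the second (sorted) file,
# i.e. files that exist in the build output but not in the RPM.
only_in_build=$(comm -13 rpm-files.txt build-files.txt)
echo "only in build output: $only_in_build"
rm -f rpm-files.txt build-files.txt
```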

Replacing the files installed by the official RPM

  • Using the layout installed by the official Hortonworks RPM as the reference, copy our self-built files into the corresponding directories. The script below automates the whole build-and-replace process for the hadoop-yarn project:
#!/bin/bash
# Rebuild the hadoop-yarn rpm package.
# 1. Make sure the HDP rpm package has been installed.
# 2. Prepare the HDP source code. Point BUILD_SOURCE_DIR at your source code folder.
# 3. Prepare the build environment: rpmrebuild maven "Development Tools" autoconf automake libtool cmake zlib1g-dev pkg-config libssl-dev libprotobuf-dev protobuf-compiler
set -e

RPM_NAME=hadoop_2_4_0_0_169-yarn-2.7.1.2.4.0.0-169.el6.x86_64
HDP_VERSION=2.4.0.0-169
HDP_YARN_VERSION=2.7.1.2.4.0.0-169
HADOOP_INSTALL_DIR=/usr/hdp/$HDP_VERSION/
BUILD_SOURCE_DIR=/data/ygmz/hadoop-release
BUILD_TARGET_DIR=$BUILD_SOURCE_DIR/hadoop-yarn-project/target/hadoop-yarn-project-$HDP_YARN_VERSION

# Build the yarn binaries from source; the native profile builds container-executor.
cd $BUILD_SOURCE_DIR/hadoop-yarn-project
mvn clean
mvn package -Pdist,native -DskipTests -Dtar

# Copy the self-built files into the rpm install directories.
# -> etc/hadoop/conf.empty/
rm -rf $BUILD_TARGET_DIR/etc/hadoop/*.cmd
mkdir -p $HADOOP_INSTALL_DIR/etc/hadoop/conf.empty
cp -rf $BUILD_TARGET_DIR/etc/hadoop/* $HADOOP_INSTALL_DIR/etc/hadoop/conf.empty/
# -> etc/security
# -> hadoop
cp -rf $BUILD_TARGET_DIR/libexec/yarn-config.sh $HADOOP_INSTALL_DIR/hadoop/libexec/
# -> hadoop-yarn
cp -rf $BUILD_TARGET_DIR/bin/yarn $HADOOP_INSTALL_DIR/hadoop-yarn/bin/yarn.distro
cp -rf $BUILD_TARGET_DIR/bin/container-executor $HADOOP_INSTALL_DIR/hadoop-yarn/bin/
cp -rf $BUILD_TARGET_DIR/share/hadoop/yarn/*.jar $HADOOP_INSTALL_DIR/hadoop-yarn/
cp -rf $BUILD_TARGET_DIR/share/hadoop/yarn/lib/*.jar $HADOOP_INSTALL_DIR/hadoop-yarn/lib/
cp -rf $BUILD_TARGET_DIR/sbin/yarn-daemon.sh $HADOOP_INSTALL_DIR/hadoop-yarn/sbin/
cp -rf $BUILD_TARGET_DIR/sbin/yarn-daemons.sh $HADOOP_INSTALL_DIR/hadoop-yarn/sbin/

Repackaging the updated files into an RPM with rpmrebuild

  • rpmrebuild is a third-party open-source tool that repackages an installed RPM in the same form as the original, keeping the %pre, %post, %files and other spec sections unchanged. After replacing the officially installed files with our own builds, we can therefore repackage directly with rpmrebuild.
#rebuild rpm
rpmrebuild $RPM_NAME
  • $RPM_NAME can be obtained with rpm -qa
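Put together, locating the package name and invoking rpmrebuild might look like the following. The `rpm -qa` output is mocked with a shell variable here so the filtering can be shown without an HDP host; rpmrebuild itself must of course run on the machine where the package is installed.

```shell
# Mocked `rpm -qa` output; on a real host use: rpm -qa | grep -- '-yarn-'
rpm_qa_output='hadoop_2_4_0_0_169-hdfs-2.7.1.2.4.0.0-169.el6.x86_64
hadoop_2_4_0_0_169-yarn-2.7.1.2.4.0.0-169.el6.x86_64'
RPM_NAME=$(printf '%s\n' "$rpm_qa_output" | grep -- '-yarn-' | head -n 1)

# The command to run for the actual repackaging:
echo "rpmrebuild $RPM_NAME"
```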