Hadoop 2.2.0 Installation and Configuration
#core-site.xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
<property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>file:/home/hadoop/hadoop-2.2.0/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>*</value> <!-- value collapsed in the source; * assumed, matching the hosts entry above -->
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>192.168.0.2:2181,192.168.0.3:2181,192.168.0.4:2181</value>
</property>
<property>
  <name>ha.zookeeper.session-timeout.ms</name>
  <value>1000</value>
</property>
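With several XML files to maintain, it helps to spot-check that a property actually landed with the value you intended before starting any daemon. A minimal sketch, assuming `python3` is available on the node; the helper name `get_prop` and the demo file path are made up for illustration:

```shell
# get_prop FILE NAME -> prints the <value> of the Hadoop property NAME in FILE.
get_prop() {
  python3 - "$1" "$2" <<'EOF'
import sys
import xml.etree.ElementTree as ET

root = ET.parse(sys.argv[1]).getroot()
for prop in root.iter('property'):
    if prop.findtext('name') == sys.argv[2]:
        print(prop.findtext('value'))
EOF
}

# Demo against a throwaway file, not one of the real cluster configs:
cat > /tmp/core-site-demo.xml <<'XML'
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://mycluster</value></property>
</configuration>
XML
get_prop /tmp/core-site-demo.xml fs.defaultFS   # prints hdfs://mycluster
```

Run against the real $HADOOP_CONF_DIR/core-site.xml, the same call would confirm that fs.defaultFS points at the mycluster nameservice.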
#hdfs-site.xml
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/hadoop/hadoop-2.2.0/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/hadoop/hadoop-2.2.0/dfs/data</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value> <!-- value collapsed in the source; false assumed, matching dfs.permissions -->
</property>
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
  <description>Logical name for this new nameservice</description>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>master1,master2</value>
  <description>Unique identifiers for each NameNode in the nameservice</description>
</property>
<!-- some hosts below are collapsed in the source; 192.168.0.2 (master1) and
     192.168.0.3 (master2) are assumed from the ZooKeeper quorum list -->
<property>
  <name>dfs.namenode.rpc-address.mycluster.master1</name>
  <value>192.168.0.2:9000</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.master2</name>
  <value>192.168.0.3:9000</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.mycluster.master1</name>
  <value>192.168.0.2:53310</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.mycluster.master2</name>
  <value>192.168.0.3:53310</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.master1</name>
  <value>192.168.0.2:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.master2</name>
  <value>192.168.0.3:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://192.168.0.2:8485;192.168.0.4:8485/mycluster</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>30000</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/home/hadoop/hadoop-2.2.0/journaldata</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value> <!-- value collapsed in the source; true assumed, since automatic failover is demonstrated below -->
</property>
<property>
  <name>ha.failover-controller.cli-check.rpc-timeout.ms</name>
  <value>60000</value>
</property>
<property>
  <name>ipc.client.connect.timeout</name>
  <value>60000</value> <!-- value collapsed in the source; 60000 assumed, matching the entry above -->
</property>
<property>
  <name>dfs.image.transfer.bandwidthPerSec</name>
  <value>4194304</value>
</property>
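The qjournal URI in dfs.namenode.shared.edits.dir packs the JournalNode endpoints and the nameservice name into one string, and splitting it back apart is handy when checking that every listed JN is reachable. A small sketch; the helper name `jn_hosts` is made up, and the URI echoes the one configured above:

```shell
# jn_hosts URI -> prints one JournalNode host:port per line.
jn_hosts() {
  # strip the qjournal:// scheme, drop the trailing /nameservice,
  # then break the semicolon-separated host list onto separate lines
  echo "$1" | sed -e 's|^qjournal://||' -e 's|/[^/]*$||' | tr ';' '\n'
}

jn_hosts 'qjournal://192.168.0.2:8485;192.168.0.4:8485/mycluster'
# prints:
# 192.168.0.2:8485
# 192.168.0.4:8485
```

Each printed endpoint could then be probed (for example with nc) before formatting the NameNode, since the format step fails if a quorum of JNs is down.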
#yarn-site.xml
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<!-- the ResourceManager host is collapsed in the source; master1 assumed below -->
<property>
  <name>yarn.resourcemanager.address</name>
  <value>master1:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>master1:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master1:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>master1:8033</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>master1:8088</value>
</property>
#mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>master1:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>master1:19888</value>
</property>
7. Startup

(1) Start ZooKeeper

Run on every ZK node:

zkServer.sh start

Check each ZK node's role:

yarn@master:~$ zkServer.sh status
JMX enabled by default
Using config: /home/yarn/Zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower

yarn@slave1:~$ zkServer.sh status
Mode: follower

yarn@slave2:~$ zkServer.sh status
Mode: leader

Note: which ZK node becomes the leader is random; in our first run slave2 became the leader, in the second run slave1 did!

At this point the ZK process is visible on every node:

~$ jps
3084 QuorumPeerMain
3212 Jps
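The `jps` checks that recur in the following steps can be scripted. A sketch of the idea with a tiny helper (the name `has_daemon` is made up); it is fed a captured sample of the listing above so it runs anywhere, but on the cluster you would pipe real `jps` output in instead:

```shell
# has_daemon NAME: succeed if NAME appears as a whole word in jps-style
# output read from stdin.
has_daemon() { grep -qw "$1"; }

# Sample jps listing, as captured after starting ZooKeeper:
jps_out='3084 QuorumPeerMain
3212 Jps'

if echo "$jps_out" | has_daemon QuorumPeerMain; then
  echo "ZooKeeper is running"
fi
```

The same check works for DFSZKFailoverController, JournalNode, and NameNode in the later steps.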
(2) Format ZK (needed only on the very first boot)

Run on any ZK node:

hdfs zkfc -formatZK
(3) Start ZKFC

The ZooKeeper FailoverController monitors NameNode state and coordinates active/standby switching, so it only needs to run on the two NN nodes:

hadoop-daemon.sh start zkfc

After startup the ZKFC process is visible:

~$ jps
3084 QuorumPeerMain
3292 Jps
3247 DFSZKFailoverController
(4) Start the JournalNodes, the shared storage that syncs metadata between the active and standby NN

Per the role-assignment table, start on each JN node:

hadoop-daemon.sh start journalnode

After startup the JournalNode process is visible on each JN node:

~$ jps
3358 Jps
3325 JournalNode
3247 DFSZKFailoverController
(5) Format and start the active NN

Format:

hdfs namenode -format

Formatting is only needed the first time the system boots; never format again!

On the active NN node, start the NN:

hadoop-daemon.sh start namenode

After startup the NN process is visible:

~$ jps
3480 Jps
3325 JournalNode
3411 NameNode
(6) Sync the active NN's metadata to the standby NN

hdfs namenode -bootstrapStandby

The tail of the log on a successful run:

Re-format filesystem in Storage Directory /home/yarn/Hadoop/hdfs2.0/name ? (Y or N) Y
14/06/15 10:09:08 INFO common.Storage: Storage directory /home/yarn/Hadoop/hdfs2.0/name has been successfully formatted.
14/06/15 10:09:09 INFO namenode.TransferFsImage: Opening connection to http://master:50070/getimage?getimage=1&txid=935&storageInfo=-47:564636372:0:CID-d899b10e-10c9-4851-b60d-3e158e322a62
Transfer took 0.11s at 63.64 KB/s
Downloaded file fsimage.ckpt_0000000000000000935 size 7545 bytes.
14/06/15 10:09:09 INFO util.ExitUtil: Exiting with status 0
14/06/15 10:09:09 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at slave1/192.168.66.92
************************************************************/
(7) Start the standby NN

On the standby NN run:

hadoop-daemon.sh start namenode

(8) Set the active NN (this step can be skipped: it belongs to the manual-failover setup, and ZK has already elected an active NN automatically)

Up to this point HDFS does not actually know which NN is active; the monitoring pages show both NameNodes in Standby state. Run the following on the intended active NN to activate it:

hdfs haadmin -transitionToActive nn1

(9) Start the DataNodes from the active NN

On [nn1], start all DataNodes:

hadoop-daemons.sh start datanode
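The first-boot sequence in steps (1)-(9) can be sketched as one script. This is only a sketch: the node assignments in the comments repeat the steps above, each command still has to run on the host indicated, and setting `RUN=echo` turns the whole thing into a dry run that just prints the command order:

```shell
RUN="${RUN:-}"   # set RUN=echo for a dry run; leave empty to really execute

start_all() {
  $RUN zkServer.sh start                    # (1) on every ZK node
  $RUN hdfs zkfc -formatZK                  # (2) once, on any ZK node, first boot only
  $RUN hadoop-daemon.sh start zkfc          # (3) on both NN nodes
  $RUN hadoop-daemon.sh start journalnode   # (4) on every JN node
  $RUN hdfs namenode -format                # (5) first boot only, on the active NN
  $RUN hadoop-daemon.sh start namenode      # (5) on the active NN
  $RUN hdfs namenode -bootstrapStandby      # (6) on the standby NN
  $RUN hadoop-daemon.sh start namenode      # (7) on the standby NN
  $RUN hadoop-daemons.sh start datanode     # (9) from the active NN
}

RUN=echo start_all   # dry run: print the sequence without touching a cluster
```

Step (8) is omitted on purpose, since the document notes it is only needed for manual failover.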
8. Verification 1 -- automatic active/standby failover

The current active NN is 192.168.0.91 and the standby NN is 192.168.0.92.

Kill the NameNode process on the active NN:

~$ jps
5161 NameNode
5085 JournalNode
5438 Jps
4987 DFSZKFailoverController
4904 QuorumPeerMain
~$ kill 5161
~$ jps
5451 Jps

Now the active NN's monitoring page is unreachable, and the standby NN automatically switches to active.
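The kill step above first requires picking the NameNode pid out of the `jps` listing, which is easy to script. A sketch (the helper name `nn_pid` is made up); it parses a captured sample of the listing above so it runs anywhere, while on the real node you would pipe live `jps` output in:

```shell
# nn_pid: read jps-style output on stdin, print the NameNode pid.
nn_pid() { awk '$2 == "NameNode" {print $1}'; }

# Sample jps output from the active NN, as shown above:
sample='5161 NameNode
5085 JournalNode
5438 Jps
4987 DFSZKFailoverController
4904 QuorumPeerMain'

pid=$(echo "$sample" | nn_pid)
echo "$pid"   # prints 5161
```

On the cluster the follow-up would be `kill "$pid"`, then watching the standby NN's web UI flip from standby to active.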
9. Verification 2 -- HA is transparent to the shell

Access the logical name myhadoop and list the directory tree; nothing is affected:

yarn@slave3:~$ hadoop dfs -ls hdfs://myhadoop/
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Found 3 items
drwxr-xr-x   - yarn supergroup          0 2014-03-20 00:10 hdfs://myhadoop/home
drwxrwx---   - yarn supergroup          0 2014-03-17 20:11 hdfs://myhadoop/tmp
drwxr-xr-x   - yarn supergroup          0 2014-03-17 20:15 hdfs://myhadoop/workspace
10. Verification 3 -- HA is transparent to client programs

Tested with a hand-written HdfsDAO.java; the HDFS path in the program is set to:

private static final String HDFS = "hdfs://myhadoop/";

First ping myhadoop to confirm the logical name does not resolve as a host, then run the program; everything works:

~$ ping myhadoop
ping: unknown host myhadoop
~$ hadoop jar Desktop/hatest.jar HdfsDAO
ls: /
==========================================================
name: hdfs://myhadoop/home, folder: true, size: 0
name: hdfs://myhadoop/tmp, folder: true, size: 0
name: hdfs://myhadoop/workspace, folder: true, size: 0