Reposted from: https://www.cnblogs.com/marility/p/9362168.html
1. Test environment
IP | Hostname | Role |
---|---|---|
10.124.147.22 | hadoop1 | namenode |
10.124.147.23 | hadoop2 | namenode |
10.124.147.32 | hadoop3 | resourcemanager |
10.124.147.33 | hadoop4 | resourcemanager |
10.110.92.161 | hadoop5 | datanode/journalnode |
10.110.92.162 | hadoop6 | datanode |
10.122.147.37 | hadoop7 | datanode |
2. Required parameters in the configuration files
2.1 hdfs-site.xml parameters

    [hadoop@10-124-147-22 hadoop]$ grep dfs\.host -A10 /usr/local/hadoop/etc/hadoop/hdfs-site.xml
    <name>dfs.hosts.exclude</name>
    <value>/usr/local/hadoop/etc/hadoop/dfs_exclude</value>
    </property>
    <property>
    <name>dfs.hosts</name>
    <value>/usr/local/hadoop/etc/hadoop/slaves</value>
    </property>

2.2 yarn-site.xml parameters

    [hadoop@10-124-147-22 hadoop]$ grep exclude-path -A10 /usr/local/hadoop/etc/hadoop/yarn-site.xml
    <name>yarn.resourcemanager.nodes.exclude-path</name>
    <value>/usr/local/hadoop/etc/hadoop/dfs_exclude</value>
    </property>
    <property>
    <name>yarn.resourcemanager.nodes.include-path</name>
    <value>/usr/local/hadoop/etc/hadoop/slaves</value>
    </property>

3. Removing an existing host
1. On the namenode host, add the IP of the host to be removed to dfs_exclude, the file named by the dfs.hosts.exclude parameter in hdfs-site.xml

    [hadoop@10-124-147-22 hadoop]$ cat /usr/local/hadoop/etc/hadoop/dfs_exclude
    10.122.147.37

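Editing the exclude file by hand works, but the step is easier to repeat safely if it is scripted. The following is a minimal sketch rather than part of the original procedure: `add_host_to_list` is a hypothetical helper name, and it only assumes that the list file holds one host per line, as the `cat` output above shows.

```shell
# Append a host to an exclude (or include) list only if it is not already listed.
# Usage: add_host_to_list <host> <list-file>
# NOTE: add_host_to_list is a hypothetical helper, not a Hadoop command.
add_host_to_list() {
    local host="$1" list_file="$2"

    # Create the file if it does not exist yet.
    touch "$list_file"

    # If the file is non-empty and its last byte is not a newline,
    # add one so the new entry does not get glued to the previous line.
    if [ -s "$list_file" ] && [ -n "$(tail -c 1 "$list_file")" ]; then
        printf '\n' >> "$list_file"
    fi

    # -x: match the whole line, -F: fixed string, -q: no output.
    if ! grep -qxF -- "$host" "$list_file"; then
        printf '%s\n' "$host" >> "$list_file"
    fi
}

# Example (path taken from the configuration above):
#   add_host_to_list 10.122.147.37 /usr/local/hadoop/etc/hadoop/dfs_exclude
```

Running the helper twice with the same host leaves a single entry, so it can be called again without checking the file first.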
2. Copy the file to the other hadoop hosts

    [hadoop@10-124-147-22 hadoop]$ for i in {2,3,4,5,6,7}; do scp etc/hadoop/dfs_exclude hadoop$i:/usr/local/hadoop/etc/hadoop/; done

3. Refresh the namenode

    [hadoop@10-124-147-22 hadoop]$ hdfs dfsadmin -refreshNodes
    Refresh nodes successful for hadoop1/10.124.147.22:9000
    Refresh nodes successful for hadoop2/10.124.147.23:9000

4. Check the namenode status

    [hadoop@10-124-147-22 hadoop]$ hdfs dfsadmin -report
    Configured Capacity: 1100228980736 (1.00 TB)
    Present Capacity: 1087754866688 (1013.05 GB)
    DFS Remaining: 1087752667136 (1013.05 GB)
    DFS Used: 2199552 (2.10 MB)
    DFS Used%: 0.00%
    Under replicated blocks: 11
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0
    -------------------------------------------------
    Live datanodes (3):
    Name: 10.122.147.37:50010 (hadoop7)
    Hostname: hadoop7
    Decommission Status : Decommission in progress
    Configured Capacity: 250831044608 (233.60 GB)
    DFS Used: 733184 (716 KB)
    Non DFS Used: 1235771392 (1.15 GB)
    DFS Remaining: 249594540032 (232.45 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 99.51%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Tue Jul 24 10:25:17 CST 2018
    Name: 10.110.92.161:50010 (hadoop5)
    Hostname: hadoop5
    Decommission Status : Normal
    (remainder omitted)

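The removed host 10.122.147.37 now shows the status Decommission in progress, which means the cluster is moving the replicas stored on that node to other nodes. Once the status changes to Decommissioned, the transfer has finished and the node is effectively out of the cluster.
The same status can also be checked on the HDFS web page.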
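To watch for the switch from Decommission in progress to Decommissioned without rereading the whole report, the status line of a single node can be pulled out with awk. This is a sketch only: `decommission_status` is a hypothetical helper name, and it assumes the report keeps the `Hostname:` / `Decommission Status :` layout shown in the output of `hdfs dfsadmin -report`.

```shell
# Print the "Decommission Status" value of one datanode.
# Reads `hdfs dfsadmin -report` text on stdin; prints nothing if the host is absent.
# NOTE: decommission_status is a hypothetical helper, not a Hadoop command.
decommission_status() {
    local host="$1"
    awk -v host="$host" '
        # Remember which datanode block we are currently inside.
        /^Hostname:/ { current = $2 }
        # When the status line belongs to the requested host, strip the label.
        /^Decommission Status/ && current == host {
            sub(/^Decommission Status[ \t]*:[ \t]*/, "")
            print
        }
    '
}

# Example polling loop (needs a running cluster, so it is left commented out):
#   until [ "$(hdfs dfsadmin -report | decommission_status hadoop7)" = "Decommissioned" ]; do
#       sleep 30
#   done
```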
5. Refresh the resourcemanager

    [hadoop@10-124-147-32 hadoop]$ yarn rmadmin -refreshNodes

After the refresh, the Active Nodes information on the resourcemanager web page reflects the change. It can also be checked from the command line:

    [hadoop@10-124-147-32 hadoop]$ yarn node -list
    Total Nodes:2
    Node-Id          Node-State   Node-Http-Address   Number-of-Running-Containers
    hadoop5:37438    RUNNING      hadoop5:8042        0
    hadoop6:9001     RUNNING      hadoop6:8042        0

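The same check can be made scriptable by filtering the node list for RUNNING entries. A small sketch, assuming the column order shown above (Node-Id first, Node-State second); `running_nodes` is a hypothetical helper name.

```shell
# Print the Node-Id of every node whose state is RUNNING.
# Reads `yarn node -list` text on stdin; header and log lines are ignored
# because their second field is never exactly "RUNNING".
# NOTE: running_nodes is a hypothetical helper, not a YARN command.
running_nodes() {
    awk '$2 == "RUNNING" { print $1 }'
}

# Example (needs a running cluster, so it is left commented out):
#   yarn node -list | running_nodes
```

If the removed host no longer appears in this list, the resourcemanager has stopped scheduling on it.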
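4. Adding a new host to the cluster
1. Copy the existing hadoop configuration files to the new host and install the Java environment
2. On the namenode, add the new host's IP to the file named by the dfs.hosts parameter

    [hadoop@10-124-147-22 hadoop]$ cat /usr/local/hadoop/etc/hadoop/slaves
    hadoop5
    hadoop6
    10.122.147.37
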
3. Sync the slaves file to the other hosts

    [hadoop@10-124-147-22 hadoop]$ for i in {2,3,4,5,6,7}; do scp etc/hadoop/slaves hadoop$i:/usr/local/hadoop/etc/hadoop/; done

4. Start the datanode and nodemanager processes on the new host

    [hadoop@10-122-147-37 hadoop]$ sbin/hadoop-daemon.sh start datanode
    starting datanode, logging to /letv/hadoop-2.7.6/logs/hadoop-hadoop-datanode-10-122-147-37.out
    [hadoop@10-122-147-37 hadoop]$ jps
    3068 DataNode
    6143 Jps
    [hadoop@10-122-147-37 hadoop]$ sbin/yarn-daemon.sh start nodemanager
    starting nodemanager, logging to /letv/hadoop-2.7.6/logs/yarn-hadoop-nodemanager-10-122-147-37.out
    [hadoop@10-122-147-37 hadoop]$ jps
    6211 NodeManager
    6403 Jps
    3068 DataNode

5. Refresh the namenode

    [hadoop@10-124-147-22 hadoop]$ hdfs dfsadmin -refreshNodes
    Refresh nodes successful for hadoop1/10.124.147.22:9000
    Refresh nodes successful for hadoop2/10.124.147.23:9000

6. Check the HDFS status

    [hadoop@10-124-147-22 hadoop]$ hdfs dfsadmin -report
    Configured Capacity: 1351059292160 (1.23 TB)
    Present Capacity: 1337331367936 (1.22 TB)
    DFS Remaining: 1337329156096 (1.22 TB)
    DFS Used: 2211840 (2.11 MB)
    DFS Used%: 0.00%
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0
    -------------------------------------------------
    Live datanodes (3):
    Name: 10.122.147.37:50010 (hadoop7)
    Hostname: hadoop7
    Decommission Status : Normal
    Configured Capacity: 250831044608 (233.60 GB)
    DFS Used: 737280 (720 KB)
    Non DFS Used: 1240752128 (1.16 GB)
    DFS Remaining: 249589555200 (232.45 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 99.51%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Tue Jul 24 17:15:09 CST 2018
    Name: 10.110.92.161:50010 (hadoop5)
    Hostname: hadoop5
    Decommission Status : Normal
    Configured Capacity: 550114123776 (512.33 GB)
    DFS Used: 737280 (720 KB)
    Non DFS Used: 11195953152 (10.43 GB)
    DFS Remaining: 538917433344 (501.91 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 97.96%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Tue Jul 24 17:15:10 CST 2018
    Name: 10.110.92.162:50010 (hadoop6)
    Hostname: hadoop6
    Decommission Status : Normal
    Configured Capacity: 550114123776 (512.33 GB)
    DFS Used: 737280 (720 KB)
    Non DFS Used: 1291218944 (1.20 GB)
    DFS Remaining: 548822167552 (511.13 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 99.77%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Tue Jul 24 17:15:10 CST 2018

7. Refresh the resourcemanager

    [hadoop@10-124-147-32 hadoop]$ yarn rmadmin -refreshNodes
    [hadoop@10-124-147-32 hadoop]$ yarn node -list
    18/07/24 18:11:23 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
    Total Nodes:3
    Node-Id          Node-State   Node-Http-Address   Number-of-Running-Containers
    hadoop7:3296     RUNNING      hadoop7:8042
    hadoop5:37438    RUNNING      hadoop5:8042        0
    hadoop6:9001     RUNNING      hadoop6:8042        0

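8. How include and exclude affect YARN and HDFS
A nodemanager can connect to the resourcemanager only if it appears in the include file and does not appear in the exclude file.
The HDFS rules differ slightly from YARN's (in HDFS the include file is simply the one named by dfs.hosts); they are listed in the table below.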
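In include | In exclude | Can connect |
---|---|---|
No | No | Cannot connect |
No | Yes | Cannot connect |
Yes | No | Can connect |
Yes | Yes | Can connect, will be decommissioned |

If include is not specified or is empty, every node is treated as being in the include file.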
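The HDFS table can be written out as a small decision function, which makes the empty-include case explicit. This is an illustrative sketch, not how the namenode is implemented: `hdfs_node_admission` is a hypothetical name, and it compares entries as exact lines, whereas a real cluster also resolves hostnames and IPs.

```shell
# Classify a host against HDFS include/exclude lists, following the table above.
# Usage: hdfs_node_admission <host> <include-file> <exclude-file>
# A missing or empty include file means every host counts as included.
# NOTE: hypothetical helper; entries are compared as exact lines only.
hdfs_node_admission() {
    local host="$1" include_file="$2" exclude_file="$3"
    local included=no excluded=no

    # [ ! -s file ] is true when the file is missing or empty.
    if [ ! -s "$include_file" ] || grep -qxF -- "$host" "$include_file"; then
        included=yes
    fi
    if [ -s "$exclude_file" ] && grep -qxF -- "$host" "$exclude_file"; then
        excluded=yes
    fi

    if [ "$included" = no ]; then
        echo "cannot connect"
    elif [ "$excluded" = yes ]; then
        echo "can connect, will be decommissioned"
    else
        echo "can connect"
    fi
}

# Example (paths taken from the configuration above):
#   hdfs_node_admission 10.122.147.37 \
#       /usr/local/hadoop/etc/hadoop/slaves /usr/local/hadoop/etc/hadoop/dfs_exclude
```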
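5. Problems encountered
While removing a datanode, the node being removed can stay in the Decommission in progress state indefinitely. In this test environment no replication factor had been configured, so Hadoop's default of 3 applied. With only 3 datanodes in total, removing one leaves 2 nodes, which cannot hold 3 replicas of each block, so the decommission never completes.
Setting the replication factor to a value smaller than the number of datanodes resolves this: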

    [hadoop@10-124-147-22 hadoop]$ grep dfs\.replication -C3 /usr/local/hadoop/etc/hadoop/hdfs-site.xml
    <property>
    <name>dfs.replication</name>
    <value>1</value>
    </property>

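The condition behind this problem is simple arithmetic, so it can be checked before starting a decommission. The sketch below only encodes the reasoning above (the nodes left after removal must be at least the replication factor); `can_decommission_one` is a hypothetical helper name, and the two commented commands for obtaining the inputs assume the report format shown earlier.

```shell
# Succeed (exit status 0) when removing one datanode still leaves
# enough datanodes to hold every replica of a block.
# Usage: can_decommission_one <replication-factor> <live-datanode-count>
# NOTE: can_decommission_one is a hypothetical helper, not a Hadoop command.
can_decommission_one() {
    local replication="$1" live_datanodes="$2"
    [ $(( live_datanodes - 1 )) -ge "$replication" ]
}

# Possible way to obtain the inputs on a running cluster (left commented out):
#   replication=$(hdfs getconf -confKey dfs.replication)
#   live=$(hdfs dfsadmin -report | sed -n 's/^Live datanodes (\([0-9]*\)):.*/\1/p')
#   can_decommission_one "$replication" "$live" || echo "decommission would stall"
```

With the values from this test environment, a replication factor of 3 and 3 live datanodes fails the check, while a replication factor of 1 passes it.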