Thursday 3 November 2016

Hadoop Multinode Installation in RHEL 6

Step 1

Assign a static IP address to each of the required servers (a sample interface configuration for RHEL 6 follows the list below).

Ex.

192.168.0.90 - master
192.168.0.91 - slave1
192.168.0.92 - slave2
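
On RHEL 6 a static IP is usually set in the network interface configuration file. A minimal sketch for the master is shown below, assuming the interface is eth0 and the gateway is 192.168.0.1 (both are assumptions; adjust them to your network):

vi /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.0.90
NETMASK=255.255.255.0
GATEWAY=192.168.0.1

Restart the network service so the new address takes effect:

service network restart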

Step 2

Make sure every machine has host entries for all of the servers.

vi /etc/hosts

192.168.0.90  master
192.168.0.91  slave1
192.168.0.92  slave2

Step 3


Create a hadoop user on all the servers.

useradd hadoop

passwd hadoop

Step 4

Set up passwordless SSH authentication between all the servers so they can communicate with each other.

su hadoop

ssh-keygen -t rsa

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

chmod 0600 ~/.ssh/authorized_keys

ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave1

ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave2

Repeat the same steps on all the slave servers so that every node can log in to every other node without a password, then verify it (see the check below).
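
As a quick check (a sketch, assuming the hostnames above), each of the following logins should succeed without prompting for a password:

ssh hadoop@slave1 hostname
ssh hadoop@slave2 hostname
ssh hadoop@master hostname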

Step 5

Disable IPv6 on the master and slave servers, since Hadoop does not support it. Append the lines below to the end of /etc/sysctl.conf, then apply them as shown after the list.


net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
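
To apply these settings without a reboot, reload sysctl and confirm that IPv6 is disabled (the second command should print 1):

sysctl -p

cat /proc/sys/net/ipv6/conf/all/disable_ipv6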

Step 6

Set the environment variables used by Hadoop by editing the /etc/profile file.
Append the following values to the end of the file by executing the commands below:

echo "" >> /etc/profile

echo "### HADOOP Variables ###" >> /etc/profile

echo "export HADOOP_HOME=/opt/hadoop" >> /etc/profile

echo "export HADOOP_INSTALL=\$HADOOP_HOME" >> /etc/profile

echo "export HADOOP_MAPRED_HOME=\$HADOOP_HOME" >> /etc/profile

echo "export HADOOP_COMMON_HOME=\$HADOOP_HOME" >> /etc/profile

echo "export HADOOP_HDFS_HOME=\$HADOOP_HOME" >> /etc/profile

echo "export YARN_HOME=\$HADOOP_HOME" >> /etc/profile

echo "export HADOOP_COMMON_LIB_NATIVE_DIR=\$HADOOP_HOME/lib/native" >> /etc/profile

echo "export PATH=\$PATH:\$HADOOP_HOME/sbin:\$HADOOP_HOME/bin" >> /etc/profile

Reload the configuration using the command below.

source /etc/profile
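
To confirm the variables are loaded in the current shell:

echo $HADOOP_HOME

This should print /opt/hadoop, and echo $PATH should now include /opt/hadoop/sbin and /opt/hadoop/bin.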

Step 7

Create the Hadoop data directories on all nodes (only root has permission to create them):

mkdir -p /data/hadoop-data/nn 

mkdir -p /data/hadoop-data/snn

mkdir -p /data/hadoop-data/dn

mkdir -p /data/hadoop-data/mapred/system

mkdir -p /data/hadoop-data/mapred/local

chown -R hadoop:hadoop /data/

Step 8

Download the Hadoop package and extract it into a hadoop folder
(only root has permission to create it).

cd /opt/

wget http://mirror.fibergrid.in/apache/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz

tar -xvf hadoop-2.7.3.tar.gz

mv hadoop-2.7.3 hadoop

Change the directory owner to the hadoop user:

chown -R hadoop:hadoop /opt/hadoop/
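
As a sanity check, the extracted package should contain the bin, etc, lib and sbin directories, and (once Java is installed in the next step) it should report its version:

ls /opt/hadoop

/opt/hadoop/bin/hadoop version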

Step 9

Now check the Java version and set JAVA_HOME for Hadoop with the following steps.

java -version

java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)

hadoop-env.sh

This file contains the environment variables that are used by the scripts that run Hadoop.

Now edit the hadoop-env.sh file:

vi /opt/hadoop/etc/hadoop/hadoop-env.sh

Add JAVA_HOME, for example:

export JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64

save and exit
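
If you are not sure of the exact JDK path on your machine, one way to find it is shown below; the printed path will vary with the installed package, and JAVA_HOME is that path without the trailing /bin/java:

readlink -f $(which java)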

Step 10

hdfs-site.xml 
Configuration settings for the HDFS daemons: the NameNode, the secondary NameNode and the DataNodes.
Edit the hdfs-site.xml file and append the details below.

vi /opt/hadoop/etc/hadoop/hdfs-site.xml

and add the following between the <configuration> tags:

<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///data/hadoop-data/dn</value>
<description>DataNode directory for storing data blocks (the directory created in Step 7).</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///data/hadoop-data/nn</value>
<description>NameNode directory for namespace and transaction log storage (the directory created in Step 7).</description>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
<description>Number of replicas for each block.</description>
</property>

</configuration>

save and exit

Step 11

core-site.xml

Configuration settings for Hadoop core, such as I/O settings that are common to HDFS and MapReduce.

Now edit the core-site.xml file and add the following between the <configuration> tags:

vi /opt/hadoop/etc/hadoop/core-site.xml

<configuration>

<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>

</configuration>

save and exit

Step 12

mapred-site.xml

Configuration settings for MapReduce applications.

Now edit the mapred-site.xml file and add the following between the <configuration> tags (see the note below if the file does not exist):
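
In the Hadoop 2.x binary package, mapred-site.xml is usually not present; only mapred-site.xml.template ships with it. If that is the case on your install, create the file from the template first:

cd /opt/hadoop/etc/hadoop
cp mapred-site.xml.template mapred-site.xml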

vi /opt/hadoop/etc/hadoop/mapred-site.xml

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>Execution framework.</description>
</property>
</configuration>

save and exit

Step 13

yarn-site.xml

Configuration settings for the ResourceManager and NodeManager.

Now edit the yarn-site.xml file and add the following between the <configuration> tags:

vi /opt/hadoop/etc/hadoop/yarn-site.xml

<configuration>

<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
<description>The hostname of the ResourceManager</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>shuffle service for MapReduce</description>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8040</value>
</property>

</configuration>

save and exit

Step 14

Slaves

A list of machines (one per line) that each run a DataNode and a NodeManager. Listing the master here means it will also run a DataNode and NodeManager.

Now edit the slaves file and append the hostname or IP of each of those machines:

vi /opt/hadoop/etc/hadoop/slaves

master
slave1
slave2

save and exit

Step 15

Copy the Hadoop directory to all the slave servers with the following commands:

scp -r /opt/hadoop hadoop@slave1:/opt/
scp -r /opt/hadoop hadoop@slave2:/opt/

On each slave server, change the owner of the copied folder so that the hadoop user can access it (run as root):

chown -R hadoop:hadoop /opt/hadoop
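
A quick check that the copy and ownership are correct on each slave (a sketch, assuming the hostnames above):

ssh hadoop@slave1 'ls -ld /opt/hadoop && ls /opt/hadoop/etc/hadoop/hdfs-site.xml'
ssh hadoop@slave2 'ls -ld /opt/hadoop && ls /opt/hadoop/etc/hadoop/hdfs-site.xml'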

Step 16

Masters

A list of machines (one per line) that each run a secondary NameNode.

On the master server, create a masters file in the Hadoop configuration directory and add the master hostname:

vi /opt/hadoop/etc/hadoop/masters

add

master

save and exit.

Step 17

Next we have to format the NameNode and start all the Hadoop services. The command used to format the NameNode is hdfs namenode -format.

cd /opt/hadoop

hdfs namenode -format

After formatting the NameNode, we can start the Hadoop services.

cd /opt/hadoop/sbin

start-all.sh

This completes the multi-node cluster setup. We can verify it by running jps on each node (see the check below) and by opening the NameNode web UI in a browser at the URLs listed after it.
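
One way to verify the daemons as the hadoop user (jps ships with the JDK, so it may not be available if only a JRE is installed):

jps

ssh hadoop@slave1 jps

hdfs dfsadmin -report

On the master, jps should list NameNode, SecondaryNameNode and ResourceManager (plus DataNode and NodeManager, since the master is also in the slaves file); on each slave it should list DataNode and NodeManager, and the dfsadmin report should show three live datanodes.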





URL List:

Name Node : http://192.168.0.90:50070 or http://master:50070
Yarn Services : http://192.168.0.90:8088
Secondary Name Node : http://192.168.0.90:50090
Data Node : http://192.168.0.90:50075

If the DataNode is not running, do the following steps. Note that this removes all HDFS data and metadata:

Remove the data in the data directory:

rm -rf /data/*

then 

cd /opt/hadoop

bin/hadoop namenode -format

If the DataNode is still not running on a slave, start it manually on that slave:

bin/hadoop datanode

This starts the DataNode in the foreground; if it exits, check the logs as shown below.
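
If the DataNode still fails to start, the log usually shows the reason (most commonly a clusterID mismatch after reformatting the NameNode). By default the logs are written under /opt/hadoop/logs; the file name below follows the usual naming pattern and will include your actual hostname:

tail -n 100 /opt/hadoop/logs/hadoop-hadoop-datanode-*.log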
