Step 1
Assign a static IP address to every server in the cluster.
For example: master, slave1, slave2
Step 2
Make sure every machine has /etc/hosts entries for all the servers.
vi /etc/hosts
Add one line per server mapping its IP address to its hostname (master, slave1, slave2).
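As a minimal sketch, the shell function below prints /etc/hosts lines from hostname=IP pairs, so the entries can be reviewed before they are appended as root. The function name and the 192.168.1.x addresses are hypothetical; use your servers' real static IPs.

```shell
#!/bin/sh
# Print /etc/hosts entries from "hostname=ip" arguments.
# The addresses in the example call below are hypothetical placeholders.
hosts_entries() {
    for pair in "$@"; do
        name=${pair%%=*}   # text before the first "="
        ip=${pair#*=}      # text after the first "="
        printf '%s\t%s\n' "$ip" "$name"
    done
}

# Review the output, then append it as root with: hosts_entries ... >> /etc/hosts
hosts_entries master=192.168.1.10 slave1=192.168.1.11 slave2=192.168.1.12
```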
Step 3
Create a hadoop user on all the servers.
useradd hadoop
passwd hadoop
Step 4
Set up passwordless SSH authentication between all the servers so they can communicate with each other.
su hadoop
ssh-keygen -t rsa
cat ~/.ssh/ >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
ssh-copy-id -i ~/.ssh/ hadoop@slave1
ssh-copy-id -i ~/.ssh/ hadoop@slave2
Repeat the same steps on all the slave servers.
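The key distribution can also be scripted. This sketch assumes the key pair was generated at the default location used by ssh-keygen -t rsa (~/.ssh/id_rsa.pub) and that the hostnames resolve through /etc/hosts. The run helper and the DRY_RUN switch are illustrative, not part of any Hadoop tooling: by default the script only prints the ssh-copy-id commands, and DRY_RUN=0 executes them. Copying to master as well covers the local authorized_keys entry.

```shell
#!/bin/sh
# Distribute the hadoop user's public key to every node (run as hadoop).
# Assumes the default key path written by "ssh-keygen -t rsa".
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them.
PUBKEY="$HOME/.ssh/id_rsa.pub"

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

copy_keys() {
    for host in "$@"; do
        run ssh-copy-id -i "$PUBKEY" "hadoop@$host"
    done
}

copy_keys master slave1 slave2
```

Once the printed commands look right, run it with DRY_RUN=0 on each node; afterwards ssh hadoop@slave1 hostname should not ask for a password.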
Step 5
Disable IPv6 on the master and slave servers, because Hadoop does not support it. Append the following lines to the end of /etc/sysctl.conf:
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
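A sketch that makes this step safe to re-run: it appends each setting only if that exact line is not already in the file. The function name is hypothetical; run it as root and then apply the settings with sysctl -p.

```shell
#!/bin/sh
# Append the IPv6-disabling settings to a sysctl file, skipping lines already present.
# Run as root against /etc/sysctl.conf, then apply with: sysctl -p
disable_ipv6_conf() {
    conf=${1:-/etc/sysctl.conf}
    for key in all default lo; do
        line="net.ipv6.conf.$key.disable_ipv6 = 1"
        # -x: match the whole line, -F: treat the pattern as a fixed string
        grep -qxF "$line" "$conf" 2>/dev/null || echo "$line" >> "$conf"
    done
}
```

For example: disable_ipv6_conf /etc/sysctl.conf && sysctl -p. Afterwards, cat /proc/sys/net/ipv6/conf/all/disable_ipv6 should print 1.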
Step 6
Set the environment variables used by Hadoop by appending the following values to the end of /etc/profile with these commands:
echo "" >> /etc/profile
echo "### HADOOP Variables ###" >> /etc/profile
echo "export HADOOP_HOME=/opt/hadoop" >> /etc/profile
echo "export HADOOP_INSTALL=\$HADOOP_HOME" >> /etc/profile
echo "export HADOOP_MAPRED_HOME=\$HADOOP_HOME" >> /etc/profile
echo "export HADOOP_COMMON_HOME=\$HADOOP_HOME" >> /etc/profile
echo "export HADOOP_HDFS_HOME=\$HADOOP_HOME" >> /etc/profile
echo "export YARN_HOME=\$HADOOP_HOME" >> /etc/profile
echo "export HADOOP_COMMON_LIB_NATIVE_DIR=\$HADOOP_HOME/lib/native" >> /etc/profile
echo "export PATH=\$PATH:\$HADOOP_HOME/sbin:\$HADOOP_HOME/bin" >> /etc/profile
Reload the configuration with the following command:
source /etc/profile
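Running the echo commands twice appends the block twice. The sketch below writes the same lines with a single here-document and skips the write if the ### HADOOP Variables ### marker is already present. The function name and the marker-based guard are our own assumptions, not part of the original steps.

```shell
#!/bin/sh
# Append the Hadoop variables to a profile file only once, using the
# "### HADOOP Variables ###" marker line as the guard.
add_hadoop_profile() {
    profile=${1:-/etc/profile}
    grep -qxF "### HADOOP Variables ###" "$profile" 2>/dev/null && return 0
    # The quoted 'EOF' keeps $HADOOP_HOME and $PATH literal in the file.
    cat >> "$profile" <<'EOF'

### HADOOP Variables ###
export HADOOP_HOME=/opt/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
EOF
}
```

As root: add_hadoop_profile /etc/profile && . /etc/profile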
Step 7
Create the Hadoop data directories on all nodes (this must be done as root):
mkdir -p /data/hadoop-data/nn
mkdir -p /data/hadoop-data/snn
mkdir -p /data/hadoop-data/dn
mkdir -p /data/hadoop-data/mapred/system
mkdir -p /data/hadoop-data/mapred/local
chown -R hadoop:hadoop /data/
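The same directories can be created in a loop. In this hypothetical helper the base directory and the owner are parameters; unlike the chown above, it changes ownership of the base directory only, not all of /data.

```shell
#!/bin/sh
# Create the Hadoop data directories under a base directory (the default
# matches the commands above) and optionally hand them to an owner.
make_hadoop_dirs() {
    base=${1:-/data/hadoop-data}
    owner=$2
    for d in nn snn dn mapred/system mapred/local; do
        mkdir -p "$base/$d" || return 1
    done
    if [ -n "$owner" ]; then
        chown -R "$owner" "$base"
    fi
}
```

As root: make_hadoop_dirs /data/hadoop-data hadoop:hadoop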
Step 8
Download the Hadoop binary package (hadoop-2.7.3.tar.gz; the -src tarball contains only source code and has to be built first) into /opt and extract it into a hadoop folder (this must be done as root):
cd /opt/
tar -xvf hadoop-2.7.3.tar.gz
mv hadoop-2.7.3 hadoop
Change the directory owner to hadoop:
chown -R hadoop:hadoop /opt/hadoop/
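A sketch of the extract-and-rename step as a function, assuming a binary tarball whose top-level folder is hadoop-&lt;version&gt;. The download URL in the comment is the Apache archive location we assume for 2.7.3; verify it (and its checksum) before use. The function refuses to run if the target hadoop folder already exists, because mv would otherwise move the new folder inside it.

```shell
#!/bin/sh
# Extract a Hadoop binary tarball and rename the versioned folder to "hadoop".
# Assumed download location (verify before use):
#   https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
install_hadoop() {
    tarball=$1
    version=$2
    dest=${3:-/opt}
    if [ -e "$dest/hadoop" ]; then
        echo "$dest/hadoop already exists" >&2
        return 1
    fi
    tar -xzf "$tarball" -C "$dest" || return 1
    mv "$dest/hadoop-$version" "$dest/hadoop"
}
```

As root: install_hadoop /opt/hadoop-2.7.3.tar.gz 2.7.3 /opt && chown -R hadoop:hadoop /opt/hadoop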
Step 9
Check the Java version, then set JAVA_HOME for Hadoop as follows:
java -version
java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
Next, set JAVA_HOME in Hadoop's environment file, which holds the environment variables used by the scripts that run Hadoop:
vi /opt/hadoop/etc/hadoop/
Add JAVA_HOME, for example:
export JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64
save and exit
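Rather than typing the JVM path by hand, it can be derived from the java binary on the PATH. This hypothetical helper resolves symlinks with readlink -f (available in GNU coreutils) and strips the trailing bin/java; treat the result as a candidate value to check, not a guarantee.

```shell
#!/bin/sh
# Derive a JAVA_HOME candidate from the java binary on the PATH by resolving
# symlinks (e.g. /usr/bin/java -> /etc/alternatives/java -> .../bin/java)
# and stripping the trailing /bin/java. Requires readlink -f.
find_java_home() {
    java_bin=$(command -v java) || return 1
    java_real=$(readlink -f "$java_bin") || return 1
    dirname "$(dirname "$java_real")"
}
```

To print the line to add: echo "export JAVA_HOME=$(find_java_home)"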
Step 10
hdfs-site.xml holds the configuration settings for the HDFS daemons: the NameNode, the secondary NameNode and the DataNodes.
Edit hdfs-site.xml:
vi /opt/hadoop/etc/hadoop/hdfs-site.xml
and add the following details between the <configuration> tags:
<description>DataNode directory for storing data chunks.</description>
<description>NameNode directory for namespace and transaction logs storage.</description>
<description>Number of replication for each chunk.</description>
save and exit
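A sketch of a complete hdfs-site.xml matching the three descriptions above follows. The property names are standard Hadoop 2.x keys, while the values are assumptions: the directories reuse the Step 7 layout, and the replication factor of 2 assumes one copy per slave. The function name is hypothetical, and it overwrites the file instead of editing it.

```shell
#!/bin/sh
# Write a minimal hdfs-site.xml (overwrites the target file).
# Assumed values: Step 7 directories and a replication factor of 2.
write_hdfs_site() {
    out=${1:-/opt/hadoop/etc/hadoop/hdfs-site.xml}
    cat > "$out" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/hadoop-data/dn</value>
    <description>DataNode directory for storing data chunks.</description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/hadoop-data/nn</value>
    <description>NameNode directory for namespace and transaction logs storage.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Number of replicas for each block.</description>
  </property>
</configuration>
EOF
}
```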
Step 11
Configuration settings for Hadoop core, such as I/O settings that are common to HDFS and MapReduce.
Now edit core-site.xml and add the details below.
vi /opt/hadoop/etc/hadoop/core-site.xml
save and exit
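A minimal core-site.xml mainly needs fs.defaultFS, the URI of the NameNode. In this sketch the NameNode host defaults to master (from /etc/hosts) and port 9000 is an assumption; 8020 is another commonly used port. The function name is hypothetical and it overwrites the target file.

```shell
#!/bin/sh
# Write a minimal core-site.xml pointing clients and daemons at the NameNode.
# Assumption: NameNode RPC port 9000; the host defaults to "master".
write_core_site() {
    out=${1:-/opt/hadoop/etc/hadoop/core-site.xml}
    nn_host=${2:-master}
    # Unquoted EOF so that $nn_host is expanded.
    cat > "$out" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://$nn_host:9000</value>
  </property>
</configuration>
EOF
}
```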
Step 12
Configuration settings for MapReduce applications.
Now edit mapred-site.xml and add the details below.
vi /opt/hadoop/etc/hadoop/mapred-site.xml
<description>Execution framework.</description>
save and exit
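A sketch matching the "Execution framework." description, assuming MapReduce jobs should run on YARN (mapreduce.framework.name set to yarn). Hadoop 2.x distributions ship this file as mapred-site.xml.template, so mapred-site.xml may not exist yet; this hypothetical function simply creates or overwrites it.

```shell
#!/bin/sh
# Write a minimal mapred-site.xml that runs MapReduce on YARN (overwrites the file).
write_mapred_site() {
    out=${1:-/opt/hadoop/etc/hadoop/mapred-site.xml}
    cat > "$out" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>Execution framework.</description>
  </property>
</configuration>
EOF
}
```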
Step 13
Configuration settings for the ResourceManager and NodeManager.
Now edit yarn-site.xml and add the details below.
vi /opt/hadoop/etc/hadoop/yarn-site.xml
<description>The hostname of the ResourceManager</description>
<description>shuffle service for MapReduce</description>
save and exit
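A sketch matching the two descriptions above: yarn.resourcemanager.hostname points at the ResourceManager host (assumed here to be master), and yarn.nodemanager.aux-services enables the MapReduce shuffle service. The function name is hypothetical and it overwrites the file.

```shell
#!/bin/sh
# Write a minimal yarn-site.xml (overwrites the file).
# Assumption: the ResourceManager runs on "master" unless another host is given.
write_yarn_site() {
    out=${1:-/opt/hadoop/etc/hadoop/yarn-site.xml}
    rm_host=${2:-master}
    cat > "$out" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>$rm_host</value>
    <description>The hostname of the ResourceManager</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>shuffle service for MapReduce</description>
  </property>
</configuration>
EOF
}
```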
Step 14
A list of machines (one per line) that each run a DataNode and a NodeManager.
Now edit the slaves file and add each slave's hostname or IP address, one per line.
vi /opt/hadoop/etc/hadoop/slaves
save and exit
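For the three-node example, the slaves file would list slave1 and slave2. This hypothetical helper writes the given hostnames one per line, replacing the file's existing contents (including the default localhost entry).

```shell
#!/bin/sh
# Replace the slaves file with the given hostnames, one per line.
write_slaves() {
    out=$1
    shift
    printf '%s\n' "$@" > "$out"
}
```

For example: write_slaves /opt/hadoop/etc/hadoop/slaves slave1 slave2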
Step 15
Copy the hadoop directory to all the slave servers with the following commands (the remote user needs write access to /opt on each slave):
scp -r /opt/hadoop hadoop@slave1:/opt/
scp -r /opt/hadoop hadoop@slave2:/opt/
On each slave server, change the owner of the copied folder so the hadoop user can access it:
chown -R hadoop:hadoop /opt/hadoop
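The copy and the ownership fix can be combined in one loop. This sketch assumes SSH access as root to each slave (root can write to /opt and run chown); if root keys are not set up, each command asks for the root password. The run helper and DRY_RUN switch are illustrative, and by default the commands are only printed.

```shell
#!/bin/sh
# Copy /opt/hadoop to each slave and give it to the hadoop user there.
# Assumes root SSH access to the slaves. DRY_RUN=1 (default) only prints.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

push_hadoop() {
    for host in "$@"; do
        run scp -r /opt/hadoop "root@$host:/opt/"
        run ssh "root@$host" chown -R hadoop:hadoop /opt/hadoop
    done
}

push_hadoop slave1 slave2
```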
Step 16
A list of machines (one per line) that each run a secondary NameNode.
On the master server, create a masters file in the Hadoop configuration directory and add the master's hostname.
vi /opt/hadoop/etc/hadoop/masters
save and exit.
Step 17
Next, format the NameNode and start all the Hadoop services. The command to format the NameNode is hdfs namenode -format.
cd /opt/hadoop
hdfs namenode -format
After formatting the NameNode, we can start the Hadoop services.
cd /opt/hadoop/sbin
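From this directory, Hadoop 2.x starts HDFS with start-dfs.sh and YARN with start-yarn.sh (start-all.sh still works but is marked deprecated). The sketch below runs both as the hadoop user on the master and then lists the running Java daemons with jps, which ships with the JDK. The run helper and DRY_RUN switch are illustrative; by default the commands are only printed.

```shell
#!/bin/sh
# Start HDFS, then YARN, from the master as the hadoop user.
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 runs them.
HADOOP_HOME=${HADOOP_HOME:-/opt/hadoop}

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

start_cluster() {
    run "$HADOOP_HOME/sbin/start-dfs.sh"
    run "$HADOOP_HOME/sbin/start-yarn.sh"
    # On the master, typically NameNode, SecondaryNameNode and ResourceManager.
    run jps
}

start_cluster
```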
This completes the multi-node cluster setup. To check it, open a browser and enter the address of the NameNode, as listed below.
URL list:
NameNode: http://master:50070 (or the master's IP address on port 50070)
YARN services:
Secondary NameNode:
DataNode:
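For reference, the default web UI ports in Hadoop 2.x are 50070 (NameNode), 8088 (YARN ResourceManager), 50090 (secondary NameNode) and 50075 (DataNode). This hypothetical helper prints the corresponding URLs, assuming those defaults were not overridden in the configuration files.

```shell
#!/bin/sh
# Print the default Hadoop 2.x web UI URLs for a master and its DataNodes.
# Assumes the default ports have not been changed in the *-site.xml files.
print_web_uis() {
    master=${1:-master}
    if [ $# -gt 0 ]; then shift; fi
    echo "NameNode: http://$master:50070"
    echo "YARN ResourceManager: http://$master:8088"
    echo "Secondary NameNode: http://$master:50090"
    for dn in "$@"; do
        echo "DataNode ($dn): http://$dn:50075"
    done
}

print_web_uis master slave1 slave2
```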
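If the DataNode is not running, follow these steps. Warning: this deletes everything under /data, including all data stored in HDFS, so only do it on a new or disposable cluster.
Remove the contents of the data directory (on the master and the slaves):
rm -rf /data/*
cd /opt/hadoop
bin/hdfs namenode -format
If the DataNode is still not running on a slave, start it manually on that slave:
bin/hdfs datanode
This starts the DataNode in the foreground.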
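A common reason for a DataNode failing after the NameNode has been reformatted is a clusterID mismatch: each daemon records the cluster ID in a current/VERSION file inside its storage directory. The sketch below compares the two values. The default paths assume the nn and dn directories from Step 7, and the DataNode's VERSION file has to be copied from the slave (or read over ssh) first. The function names are hypothetical.

```shell
#!/bin/sh
# Print the value of the "clusterID=..." line of a VERSION file.
cluster_id() {
    sed -n 's/^clusterID=//p' "$1"
}

# Compare the NameNode and DataNode cluster IDs; returns 1 on mismatch.
check_cluster_ids() {
    nn_id=$(cluster_id "${1:-/data/hadoop-data/nn/current/VERSION}")
    dn_id=$(cluster_id "${2:-/data/hadoop-data/dn/current/VERSION}")
    if [ -n "$nn_id" ] && [ "$nn_id" = "$dn_id" ]; then
        echo "clusterID match: $nn_id"
    else
        echo "clusterID mismatch: NameNode=$nn_id DataNode=$dn_id" >&2
        return 1
    fi
}
```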