Step 1
Assign a static IP to each of the required servers.
Ex.
192.168.0.90 - master
192.168.0.91 - slave1
192.168.0.92 - slave2
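On a RHEL/CentOS style system (an assumption; the interface name eth0 and the gateway below are placeholders for your own network) the master's static IP can be set in the interface file, for example:
vi /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.0.90
NETMASK=255.255.255.0
GATEWAY=192.168.0.1
Restart the network service (service network restart) and repeat on each slave with its own IP.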
Step 2
Make sure every machine has host entries for all of the servers.
vi /etc/hosts
192.168.0.90 master
192.168.0.91 slave1
192.168.0.92 slave2
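A quick way to confirm that name resolution works on every node (assuming the entries above) is:
getent hosts master slave1 slave2
ping -c 1 slave1
Each hostname should resolve to the IP listed in /etc/hosts.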
Step 3
Create the hadoop user on all the servers.
useradd hadoop
passwd hadoop
Step 4
Set up passwordless SSH authentication on all the servers for inter-node communication.
su hadoop
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave1
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave2
The same steps need to be done on all the slave servers.
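To verify passwordless login (assuming the keys were copied as above), you should be able to run a command on each slave without a password prompt:
ssh hadoop@slave1 hostname
ssh hadoop@slave2 hostname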
Step 5
Disable IPv6 on the master and slave servers, since Hadoop does not support it. Insert the lines below at the end of /etc/sysctl.conf.
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
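The settings take effect only after sysctl reloads the file (or after a reboot); reload and verify with:
sysctl -p
cat /proc/sys/net/ipv6/conf/all/disable_ipv6
The cat should print 1 once IPv6 is disabled.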
Step 6
Set the environment variables used by Hadoop by editing the /etc/profile file,
appending the following values at the end of the file with the commands below.
echo "" >> /etc/profile
echo "### HADOOP Variables ###" >> /etc/profile
echo "export HADOOP_HOME=/opt/hadoop" >> /etc/profile
echo "export HADOOP_INSTALL=\$HADOOP_HOME" >> /etc/profile
echo "export HADOOP_MAPRED_HOME=\$HADOOP_HOME" >> /etc/profile
echo "export HADOOP_COMMON_HOME=\$HADOOP_HOME" >> /etc/profile
echo "export HADOOP_HDFS_HOME=\$HADOOP_HOME" >> /etc/profile
echo "export YARN_HOME=\$HADOOP_HOME" >> /etc/profile
echo "export HADOOP_COMMON_LIB_NATIVE_DIR=\$HADOOP_HOME/lib/native" >> /etc/profile
echo "export PATH=\$PATH:\$HADOOP_HOME/sbin:\$HADOOP_HOME/bin" >> /etc/profile
Reload Configuration using below command.
source /etc/profile
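A quick sanity check that the variables were picked up (assuming the lines above were appended correctly):
echo $HADOOP_HOME
This should print /opt/hadoop.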
Step 7
Create the Hadoop data directories on all nodes (only root has permission to create them).
mkdir -p /data/hadoop-data/nn
mkdir -p /data/hadoop-data/snn
mkdir -p /data/hadoop-data/dn
mkdir -p /data/hadoop-data/mapred/system
mkdir -p /data/hadoop-data/mapred/local
chown -R hadoop:hadoop /data/
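As a side note, on a bash shell the same directory tree can be created in one command with brace expansion, and the result checked with ls:
mkdir -p /data/hadoop-data/{nn,snn,dn,mapred/{system,local}}
ls -lR /data/hadoop-data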
Step 8
Download the Hadoop package and extract it into a hadoop folder
(only root has permission to create it under /opt).
cd /opt/
wget http://mirror.fibergrid.in/apache/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
tar -xvf hadoop-2.7.3.tar.gz
mv hadoop-2.7.3 hadoop
Change the directory owner to hadoop:
chown -R hadoop:hadoop /opt/hadoop/
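To confirm the binaries are in place (assuming the binary tarball above was extracted), check the Hadoop version:
/opt/hadoop/bin/hadoop version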
Step 9
Now we need to check the Java version and set JAVA_HOME for Hadoop with the following steps.
java -version
java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
hadoop-env.sh
Environment variables that are used by the scripts that run Hadoop.
Now edit the hadoop-env.sh file in the Hadoop configuration directory:
vi /opt/hadoop/etc/hadoop/hadoop-env.sh
Add the JAVA_HOME, for example:
export JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64
save and exit
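If you are unsure of the JDK path on your system, one generic way to find it (a sketch, not specific to this guide) is:
readlink -f $(which java)
Strip the trailing /bin/java from the output to get the value for JAVA_HOME.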
Step 10
hdfs-site.xml
Configuration settings for the HDFS daemons: the NameNode, the secondary NameNode and the DataNodes.
Edit the hdfs-site.xml file and append the details below.
vi /opt/hadoop/etc/hadoop/hdfs-site.xml
Add the details below between the <configuration> tags:
<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///opt/hadoop/hdfs/data</value>
<description>DataNode directory for storing data chunks.</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///opt/hadoop/hdfs</value>
<description>NameNode directory for namespace and transaction logs storage.</description>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
<description>Number of replication for each chunk.</description>
</property>
</configuration>
save and exit
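Note that the dfs.namenode.name.dir and dfs.datanode.data.dir values above point under /opt/hadoop/hdfs rather than the /data/hadoop-data directories from Step 7. If you keep these values, create the directory and hand it to the hadoop user (a sketch assuming the paths above):
mkdir -p /opt/hadoop/hdfs/data
chown -R hadoop:hadoop /opt/hadoop/hdfs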
Step 11
core-site.xml
Configuration settings for Hadoop core such as I/O settings that are common to hdfs and Mapreduce.
Now edit the core-site.xml file and append the details below.
vi /opt/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
save and exit
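hdfs getconf only reads the configuration files, so (assuming /opt/hadoop/bin is on the PATH from Step 6) you can confirm the filesystem URI at any time with:
hdfs getconf -confKey fs.defaultFS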
Step 12
mapred-site.xml
Configuration settings for MapReduce Applications.
Now edit the mapred-site.xml file and append the details below.
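In the Hadoop 2.7.x distribution this file may exist only as a template; if mapred-site.xml is missing, copy the template first:
cp /opt/hadoop/etc/hadoop/mapred-site.xml.template /opt/hadoop/etc/hadoop/mapred-site.xml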
vi /opt/hadoop/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>Execution framework.</description>
</property>
</configuration>
save and exit
Step 13
yarn-site.xml
Configuration settings for the ResourceManager and NodeManager.
Now edit the yarn-site.xml file and append the details below.
vi /opt/hadoop/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
<description>The hostname of the ResourceManager</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>shuffle service for MapReduce</description>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8040</value>
</property>
</configuration>
save and exit
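If xmllint happens to be installed (an optional check, not required by Hadoop), it can catch XML typos in any of the edited files before the daemons are started:
xmllint --noout /opt/hadoop/etc/hadoop/yarn-site.xml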
Step 14
Slaves
A list of machines (one per line) that each run a DataNode and a NodeManager.
Now edit the slaves file and add the hostname or IP of every node that should run these daemons (here the master is included as a worker as well):
vi /opt/hadoop/etc/hadoop/slaves
master
slave1
slave2
save and exit
Step 15
Copy the hadoop directory to all the slave servers with the following commands:
scp -r /opt/hadoop hadoop@slave1:/opt/
scp -r /opt/hadoop hadoop@slave2:/opt/
On each slave server, change the owner of that folder so the hadoop user can access it:
chown -R hadoop:hadoop /opt/hadoop
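A quick check that the copy landed and is readable by the hadoop user (assuming passwordless SSH from Step 4):
ssh hadoop@slave1 ls /opt/hadoop/etc/hadoop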
Step 16
Masters
A list of machines (one per line) that each run a secondary NameNode.
On the master server, create a masters file in the Hadoop configuration directory and add the master hostname.
vi /opt/hadoop/etc/hadoop/masters
add
master
save and exit.
Step 17
Next we have to format the NameNode and start all the Hadoop services. The command used to format the NameNode is hdfs namenode -format.
cd /opt/hadoop
hdfs namenode -format
After formatting the NameNode, we can start the Hadoop services.
cd /opt/hadoop/sbin
start-all.sh
This will set up our multi-node cluster. We can check it by opening a browser and
entering the IP of the NameNode, using the URLs shown below:
URL List:
Name Node : http://192.168.0.90:50070 or http://master:50070
Yarn Services : http://192.168.0.90:8088
Secondary Name Node : http://192.168.0.90:50090
Data Node : http://192.168.0.90:50075
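You can also confirm from the command line that all daemons came up. jps (part of the JDK) lists the running Java processes, and hdfs dfsadmin -report shows the live DataNodes:
jps
hdfs dfsadmin -report
On the master, jps should show NameNode, SecondaryNameNode and ResourceManager (plus DataNode and NodeManager, since master is also listed in the slaves file); on the slaves it should show DataNode and NodeManager.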
If the DataNode is not running, do the following steps.
Remove the data in the data directory (this wipes all HDFS data):
rm -rf /data/*
Then reformat the NameNode:
cd /opt/hadoop
bin/hadoop namenode -format
If the DataNode is still not running on a slave, start it manually:
bin/hadoop datanode
This will start the DataNode.