A Pseudo-Distributed Cluster Setup Procedure
Prerequisites
Install JDK 1.7 and set the JAVA_HOME environment variable.
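For example, on a typical Linux box the check and the export might look like the following (the JVM path is only an example; substitute the path of your own JDK install):
java -version
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk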
Create a new user named "hadoop".
useradd hadoop
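If you plan to log in to this account directly rather than switching to it from root with su, you may also want to give it a password:
passwd hadoop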
Set up password-less SSH login to its own account.
su - hadoop
ssh-keygen
<< Press ENTER for all prompts >>
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
Verify by running ssh localhost; it should log you in without prompting for a password.
Disable IPv6 by editing /etc/sysctl.conf and adding the following lines:
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
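These settings only take effect after a reboot unless they are reloaded; as root, the following should apply them immediately:
sysctl -p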
Verify with cat /proc/sys/net/ipv6/conf/all/disable_ipv6 (it should return 1).
Installation and Configuration:
Download the required version of Hadoop from the Apache archives using the wget command (create /opt/hadoop first if it does not already exist).
cd /opt/hadoop/
wget http://addresstoarchive/hadoop-2.x.x/xxxxx.gz
tar -xvf hadoop-2.x.x.gz
mv hadoop-2.x.x hadoop
(or)
ln -s hadoop-2.x.x hadoop
chown -R hadoop:hadoop hadoop
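At this point the install can be sanity-checked by asking the Hadoop binary for its version (assuming the tree now lives at /opt/hadoop/hadoop as above):
/opt/hadoop/hadoop/bin/hadoop version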
Update .bashrc or .kshrc, based on your shell, with the environment variables below:
export HADOOP_PREFIX=/opt/hadoop/hadoop
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export JAVA_HOME=/java/home/path
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin:$JAVA_HOME/bin
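These variables only apply to new shells unless the file is re-read; assuming bash, the following loads them and confirms Hadoop is on the PATH:
source ~/.bashrc
echo $HADOOP_PREFIX
hadoop version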
In the $HADOOP_CONF_DIR directory ($HADOOP_PREFIX/etc/hadoop), edit the files below:
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8020</value>
</property>
</configuration>
mapred-site.xml
Create mapred-site.xml from its template:
cp mapred-site.xml.template mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop/hdfs/datanode</value>
</property>
</configuration>
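Once these files are saved, a quick way to confirm Hadoop is actually reading them is to query a key back with the getconf tool:
hdfs getconf -confKey fs.defaultFS
(should print hdfs://localhost:8020)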
Create the parent folder that will store the Hadoop data:
mkdir -p /home/hadoop/hdfs
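If you are running these steps as root rather than as the hadoop user, you may also want to pre-create the NameNode and DataNode directories named in hdfs-site.xml and hand them to the hadoop account (HDFS will otherwise create them itself on first start):
mkdir -p /home/hadoop/hdfs/namenode /home/hadoop/hdfs/datanode
chown -R hadoop:hadoop /home/hadoop/hdfs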
Format the NameNode (cleans up the directory and creates the necessary metadata files):
hdfs namenode -format
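A quick sanity check after formatting: the NameNode directory configured above should now contain a current/ subdirectory holding a VERSION file and an initial fsimage:
ls /home/hadoop/hdfs/namenode/current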
Start all services:
start-dfs.sh && start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
Alternatively you can use start-all.sh, but it is deprecated.
Check all running java processes
jps
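For a healthy pseudo-distributed setup, jps should list roughly the following processes (each prefixed by its PID; JobHistoryServer appears only if the history server was started):
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
JobHistoryServer
Jps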
NameNode web interface: http://localhost:50070/
ResourceManager web interface: http://localhost:8088/
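As a final smoke test, you can copy a small file into HDFS and list it back (the target path is only an example):
hdfs dfs -mkdir -p /user/hadoop
hdfs dfs -put /etc/hosts /user/hadoop/
hdfs dfs -ls /user/hadoop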
To stop the daemons (services):
stop-dfs.sh && stop-yarn.sh
mr-jobhistory-daemon.sh stop historyserver
Alternatively you can use stop-all.sh, but it is deprecated.