Getting started with hadoop: Installation or Setup on Linux


Example

A Pseudo-Distributed Cluster Setup Procedure

Prerequisites

  • Install JDK 1.7 and set the JAVA_HOME environment variable.
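
    For example, on Ubuntu with OpenJDK 7 (the package name and JAVA_HOME path are distribution-specific, so adjust them for your system):

     sudo apt-get install openjdk-7-jdk
     export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64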

  • Create a new user named "hadoop":

    useradd hadoop

  • Set up password-less SSH login from the hadoop account to itself:

     su - hadoop
     ssh-keygen
     # press ENTER at all prompts
     cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
     chmod 0600 ~/.ssh/authorized_keys
    
  • Verify by running ssh localhost.
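
    The first connection may ask you to accept the host key; after that it should log you in without a password prompt:

     ssh localhost
     exit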

  • Disable IPv6 by adding the following lines to /etc/sysctl.conf:

     net.ipv6.conf.all.disable_ipv6 = 1
     net.ipv6.conf.default.disable_ipv6 = 1
     net.ipv6.conf.lo.disable_ipv6 = 1
    
  • Check it with cat /proc/sys/net/ipv6/conf/all/disable_ipv6

    (should return 1)
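
    If it still returns 0, the new settings have not been loaded yet; /etc/sysctl.conf is only read at boot, so reload it manually (as root or with sudo):

     sudo sysctl -p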

Installation and Configuration:

  • Download the required version of Hadoop from the Apache archives using the wget command:

     cd /opt/hadoop/
     wget http://addresstoarchive/hadoop-2.x.x/xxxxx.gz
     tar -xvf hadoop-2.x.x.gz
     mv hadoop-2.x.x hadoop
    (or)

     ln -s hadoop-2.x.x hadoop
     chown -R hadoop:hadoop hadoop-2.x.x
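
    For instance, with Hadoop 2.7.3 from the Apache archive (the version and URL below are illustrative; substitute the 2.x release you need):

     cd /opt/hadoop/
     wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
     tar -xvf hadoop-2.7.3.tar.gz
     ln -s hadoop-2.7.3 hadoop
     chown -R hadoop:hadoop hadoop-2.7.3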
    
  • Update .bashrc or .kshrc, depending on your shell, with the environment variables below:

      export HADOOP_PREFIX=/opt/hadoop/hadoop
      export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
      export JAVA_HOME=/java/home/path
      export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin:$JAVA_HOME/bin
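
    Reload the profile so the variables take effect in the current shell, and confirm that the hadoop command is found on the PATH:

     source ~/.bashrc
     hadoop version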
    
  • In the $HADOOP_PREFIX/etc/hadoop directory, edit the files below:

    • core-site.xml

      <configuration>
          <property>
              <name>fs.defaultFS</name>
              <value>hdfs://localhost:8020</value>
          </property>
      </configuration> 
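
      fs.defaultFS sets the default filesystem URI: once the daemons are running (started later in this guide), plain HDFS paths resolve against the NameNode at localhost:8020, e.g.:

       hdfs dfs -ls /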
      
    • mapred-site.xml

      Create mapred-site.xml from its template

      cp mapred-site.xml.template mapred-site.xml

        <configuration>
            <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
            </property>
        </configuration> 
      
    • yarn-site.xml

       <configuration>
           <property>
               <name>yarn.resourcemanager.hostname</name>
               <value>localhost</value> 
           </property>
           <property>
               <name>yarn.nodemanager.aux-services</name>
               <value>mapreduce_shuffle</value>
           </property>
       </configuration>
      
    • hdfs-site.xml

       <configuration>
           <property>
               <name>dfs.replication</name>
               <value>1</value>
           </property>
           <property>
               <name>dfs.namenode.name.dir</name>
               <value>file:///home/hadoop/hdfs/namenode</value>
           </property>
           <property>
               <name>dfs.datanode.data.dir</name> 
               <value>file:///home/hadoop/hdfs/datanode</value> 
           </property>
       </configuration>
      

    Create the parent folder to store the Hadoop data:

    mkdir -p /home/hadoop/hdfs
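
    The NameNode and DataNode create their own subdirectories on first use, but you can also create them up front and make sure the hadoop user owns the whole tree:

     mkdir -p /home/hadoop/hdfs/namenode /home/hadoop/hdfs/datanode
     chown -R hadoop:hadoop /home/hadoop/hdfs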
    
  • Format NameNode (cleans up the directory and creates necessary meta files)

    hdfs namenode -format
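
    On success, the output should include a message stating that the storage directory (/home/hadoop/hdfs/namenode here) has been successfully formatted.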
    
  • Start all services:

    start-dfs.sh && start-yarn.sh
    mr-jobhistory-daemon.sh start historyserver
    

Alternatively, you can use start-all.sh instead of the commands above, but it is deprecated.

  • Check all running Java processes:

    jps 
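
    On a healthy pseudo-distributed setup, the list should look roughly like this (the PIDs will differ):

     1234 NameNode
     2345 DataNode
     3456 SecondaryNameNode
     4567 ResourceManager
     5678 NodeManager
     6789 JobHistoryServer
     7890 Jps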
    
  • NameNode Web Interface: http://localhost:50070/

  • ResourceManager Web Interface: http://localhost:8088/
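
  • Optionally, run one of the bundled example jobs as a smoke test (the jar version below is illustrative; match it to your installed release):

     yarn jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 2 5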

  • To stop the daemons (services):

    stop-dfs.sh && stop-yarn.sh
    mr-jobhistory-daemon.sh stop historyserver
    

Alternatively, you can use stop-all.sh instead of the commands above, but it is deprecated.


