Getting started with Hadoop: Installation or Setup on Linux


A Pseudo-Distributed Cluster Setup Procedure


  • Install JDK 1.7 and set the JAVA_HOME environment variable.

  • Create a new user named "hadoop" (run this as root).

    useradd hadoop

  • Set up password-less SSH login to the user's own account:

     su - hadoop
     ssh-keygen -t rsa
     << Press ENTER for all prompts >>
     cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
     chmod 0600 ~/.ssh/authorized_keys
  • Verify by running ssh localhost; it should log in without asking for a password.

  • Disable IPv6 by adding the following lines to /etc/sysctl.conf:

     net.ipv6.conf.all.disable_ipv6 = 1
     net.ipv6.conf.default.disable_ipv6 = 1
     net.ipv6.conf.lo.disable_ipv6 = 1
  • Apply the settings with sysctl -p (as root) or reboot, then check the result using cat /proc/sys/net/ipv6/conf/all/disable_ipv6

    (should return 1)
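
The prerequisites above can be verified with a few small shell functions. This is only a sketch: the function names are hypothetical, and the default paths assume the RSA key and the sysctl file used in the steps above.

```shell
#!/bin/sh
# Hypothetical helpers for checking the prerequisites; not part of Hadoop.

# Returns 0 when the given flag file contains "1" (IPv6 disabled).
ipv6_disabled() {
    flag_file=${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}

# Returns 0 when the public key appears as a full line in authorized_keys.
key_authorized() {
    pub_key=${1:-$HOME/.ssh/id_rsa.pub}
    auth_file=${2:-$HOME/.ssh/authorized_keys}
    [ -s "$pub_key" ] && [ -r "$auth_file" ] &&
        grep -qxF -- "$(cat "$pub_key")" "$auth_file"
}

# Returns 0 when JAVA_HOME is set and contains an executable bin/java.
java_home_ok() {
    [ -n "${JAVA_HOME:-}" ] && [ -x "$JAVA_HOME/bin/java" ]
}
```

Run as the hadoop user, a call such as `java_home_ok && key_authorized && ipv6_disabled && echo "prerequisites look fine"` prints the message only when all three checks pass.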

Installation and Configuration:

  • Download the required version of Hadoop from the Apache archives using wget, extract it, and link it to a version-independent path.

     cd /opt/hadoop/
     wget http://addresstoarchive/hadoop-2.x.x/xxxxx.gz
     tar -xvf hadoop-2.x.x.gz
     ln -s hadoop-2.x.x hadoop
     chown -R hadoop:hadoop hadoop-2.x.x
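  • Add the environment variables below to .bashrc or .kshrc, depending on your shell. The PATH entry makes the hdfs command and the start/stop scripts used later available without full paths.

      export HADOOP_PREFIX=/opt/hadoop/hadoop
      export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
      export JAVA_HOME=/java/home/path
      export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin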
  • In the $HADOOP_CONF_DIR directory ($HADOOP_PREFIX/etc/hadoop), edit the files below:

    • core-site.xml

    • mapred-site.xml

      Create mapred-site.xml from its template

      cp mapred-site.xml.template mapred-site.xml

    • yarn-site.xml

    • hdfs-site.xml


    Create the parent directory that will store the Hadoop data:

    mkdir -p /home/hadoop/hdfs
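
The text does not show what goes into the four configuration files, so the following is a minimal sketch of contents commonly used for a single-node, pseudo-distributed Hadoop 2.x setup. The helper names write_site_xml and write_minimal_conf are hypothetical, and the values (port 9000, replication factor 1, the namenode and datanode subdirectories under /home/hadoop/hdfs) are assumptions to adjust for your environment.

```shell
#!/bin/sh
# Hypothetical helper: write_site_xml <file> <name> <value> [<name> <value> ...]
# Writes a Hadoop *-site.xml file with one <property> per name/value pair.
# Values are written verbatim, so they must not contain XML special characters.
write_site_xml() {
    file=$1
    shift
    {
        echo '<?xml version="1.0"?>'
        echo '<configuration>'
        while [ "$#" -ge 2 ]; do
            printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
            shift 2
        done
        echo '</configuration>'
    } > "$file"
}

# Hypothetical helper: write_minimal_conf <conf_dir> [<data_dir>]
# Writes the four files with assumed single-node values.
write_minimal_conf() {
    conf_dir=$1
    data_dir=${2:-/home/hadoop/hdfs}   # parent data directory created above
    write_site_xml "$conf_dir/core-site.xml" \
        fs.defaultFS hdfs://localhost:9000
    write_site_xml "$conf_dir/hdfs-site.xml" \
        dfs.replication 1 \
        dfs.namenode.name.dir "file://$data_dir/namenode" \
        dfs.datanode.data.dir "file://$data_dir/datanode"
    write_site_xml "$conf_dir/mapred-site.xml" \
        mapreduce.framework.name yarn
    write_site_xml "$conf_dir/yarn-site.xml" \
        yarn.nodemanager.aux-services mapreduce_shuffle
}
```

Calling `write_minimal_conf "$HADOOP_CONF_DIR"` overwrites the four files in the configuration directory, so back up any existing versions first. Pointing it at a scratch directory instead lets you inspect the generated XML before copying it into place.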
  • Format the NameNode (this cleans up the NameNode directory and creates the necessary metadata files):

    hdfs namenode -format
  • Start all services:

    start-dfs.sh && start-yarn.sh
    mr-jobhistory-daemon.sh start historyserver

Alternatively, use start-all.sh (deprecated).

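  • Check all running Java processes with jps. NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, and JobHistoryServer should be listed.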

  • NameNode web interface: http://localhost:50070/

  • ResourceManager web interface: http://localhost:8088/

  • To stop the daemons (services):

    stop-dfs.sh && stop-yarn.sh
    mr-jobhistory-daemon.sh stop historyserver

Alternatively, use stop-all.sh (deprecated).
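
To confirm that the start and stop commands above had the intended effect, the output of jps can be compared against the daemons a pseudo-distributed Hadoop 2.x node normally runs. The sketch below is a hypothetical helper, not part of Hadoop. It reads jps-style lines ("<pid> <name>") on standard input and prints every expected daemon that is absent.

```shell
#!/bin/sh
# Daemons expected on a single-node Hadoop 2.x setup with the history server.
EXPECTED_DAEMONS="NameNode DataNode SecondaryNameNode ResourceManager NodeManager JobHistoryServer"

# Hypothetical helper: reads "<pid> <name>" lines on stdin and prints each
# expected daemon that is not among them, one per line.
missing_daemons() {
    running=$(awk '{print $2}')
    for daemon in $EXPECTED_DAEMONS; do
        # -x matches whole lines, so "NameNode" does not match "SecondaryNameNode".
        printf '%s\n' "$running" | grep -qx "$daemon" || printf '%s\n' "$daemon"
    done
}
```

After starting the services, `jps | missing_daemons` should print nothing. After stopping them, it should list all six names.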