hadoop Getting started with hadoop Hadoop overview and HDFS


enter image description here

    Hadoop is an open-source software framework for storage and large-scale processing of data-sets in a distributed computing environment. It is sponsored by Apache Software Foundation. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

  • Hadoop was created by Doug Cutting and Mike Cafarella in 2005.
  • Cutting, who was working at Yahoo! at the time, named it after his son's toy elephant.
  •  It was originally developed to support distribution for the search engine project.
Major modules of hadoop

    Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data. Hadoop MapReduce: A software framework for distributed processing of large data sets on compute clusters.
Hadoop File System  Basic Features

    Highly fault-tolerant. High throughput. Suitable for applications with large data sets. Can be built out of commodity hardware.
Namenode and Datanodes

    Master/slave architecture. HDFS cluster consists of a single Namenode, a master server that manages the file system namespace and regulates access to files by clients. The DataNodes manage storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files. A file is split into one or more blocks and set of blocks are stored in DataNodes. DataNodes: serves read, write requests, performs block creation, deletion, and replication upon instruction from Namenode.

enter image description here

    HDFS is designed to store very large files across machines in a large cluster. Each file is a sequence of blocks. All blocks in the file except the last are of the same size. Blocks are replicated for fault tolerance. The Namenode receives a Heartbeat and a BlockReport from each DataNode in the cluster. BlockReport contains all the blocks on a Datanode.
Hadoop Shell Commands

    Common commands used:-
      ls Usage: hadoop fs –ls Path(dir/file path to list). Cat Usage: hadoop fs -cat PathOfFileToView

enter image description here

Link for hadoop shell commands:- https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/FileSystemShell.html