Example
Hadoop is an open-source software framework for storage and large-scale processing of data-sets in a distributed computing environment.
It is sponsored by the Apache Software Foundation.
It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
History
- Hadoop was created by Doug Cutting and Mike Cafarella in 2005.
- Cutting, who was working at Yahoo! at the time, named it after his son's toy elephant.
- It was originally developed to support distribution for the Nutch search engine project.
Major modules of Hadoop
Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data.
Hadoop MapReduce: A software framework for distributed processing of large data sets on compute clusters.
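The MapReduce idea can be illustrated with a small word count. This is only a single-process sketch in plain Python, not the Hadoop MapReduce API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory input."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big cluster", "data node"])
print(counts)
```

In a real cluster the map and reduce steps run in parallel on many machines; here they run one after another so the data flow is easy to follow.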
Hadoop Distributed File System (HDFS)
Basic Features
Highly fault-tolerant.
High throughput.
Suitable for applications with large data sets.
Can be built out of commodity hardware.
NameNode and DataNodes
Master/slave architecture.
An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients.
The DataNodes manage storage attached to the nodes that they run on.
HDFS exposes a file system namespace and allows user data to be stored in files.
A file is split into one or more blocks, and these blocks are stored on a set of DataNodes.
DataNodes serve read and write requests and perform block creation, deletion, and replication upon instruction from the NameNode.
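The division of work described above (metadata on the NameNode, block contents on the DataNodes) can be sketched as a toy in-memory model. Everything here is an assumption made for illustration: the classes ToyNameNode and ToyDataNode, the round-robin placement, and the fact that data passes through the NameNode object. In real HDFS, clients exchange block data directly with DataNodes.

```python
class ToyDataNode:
    """Stores raw block contents, keyed by block id."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]


class ToyNameNode:
    """Holds only metadata: which blocks make up a file and where they live."""

    def __init__(self, datanodes):
        self.datanodes = {dn.name: dn for dn in datanodes}
        self.namespace = {}  # path -> ordered list of (block_id, DataNode name)
        self.next_block_id = 0

    def create_file(self, path, data, block_size):
        names = list(self.datanodes)
        locations = []
        for offset in range(0, len(data), block_size):
            block_id = self.next_block_id
            self.next_block_id += 1
            # Round-robin placement is an assumption of this sketch.
            target = names[block_id % len(names)]
            chunk = data[offset:offset + block_size]
            self.datanodes[target].write_block(block_id, chunk)
            locations.append((block_id, target))
        self.namespace[path] = locations

    def read_file(self, path):
        # Look up the block list, then fetch each block from its DataNode.
        return b"".join(
            self.datanodes[dn].read_block(block_id)
            for block_id, dn in self.namespace[path]
        )


nodes = [ToyDataNode("dn1"), ToyDataNode("dn2")]
namenode = ToyNameNode(nodes)
namenode.create_file("/logs/app.log", b"abcdefghij", block_size=4)
print(namenode.namespace["/logs/app.log"])
print(namenode.read_file("/logs/app.log"))
```

The namespace dictionary never holds file contents, only block ids and locations, which mirrors the master/slave split between the NameNode and the DataNodes.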
HDFS is designed to store very large files across machines in a large cluster.
Each file is a sequence of blocks.
All blocks in the file except the last are of the same size.
Blocks are replicated for fault tolerance.
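The block layout and replication described above can be worked through with a little arithmetic. The sketch below assumes a 128 MB block size and a replication factor of 3 as example values, and uses simple round-robin placement as a stand-in for HDFS's actual replica placement policy. The helper names block_sizes and place_replicas are hypothetical.

```python
def block_sizes(file_size, block_size):
    """Return the size in bytes of each block of a file of file_size bytes."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # Only the last block may be smaller than block_size.
        sizes.append(remainder)
    return sizes


def place_replicas(num_blocks, datanodes, replication):
    """Assign each block to `replication` distinct DataNodes, round-robin."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            datanodes[(block + i) % len(datanodes)] for i in range(replication)
        ]
    return placement


MB = 1024 * 1024
sizes = block_sizes(300 * MB, 128 * MB)  # example values, not read from a cluster
placement = place_replicas(len(sizes), ["dn1", "dn2", "dn3", "dn4"], replication=3)
print([s // MB for s in sizes])
print(placement)
```

With these example values a 300 MB file occupies two full 128 MB blocks and one 44 MB block, and losing any single DataNode still leaves at least two copies of every block.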
The NameNode receives a Heartbeat and a BlockReport from each DataNode in the cluster.
A BlockReport contains a list of all the blocks on a DataNode.
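How a NameNode might use Heartbeats and BlockReports can be sketched as follows. This is a toy model, not Hadoop code: the class ToyNameNodeMonitor, its method names, and the 30-second timeout are assumptions made for illustration. Timestamps are passed in explicitly so the example is deterministic.

```python
class ToyNameNodeMonitor:
    """Tracks DataNode liveness and block copies from heartbeats and reports."""

    def __init__(self, timeout_seconds, replication):
        self.timeout = timeout_seconds
        self.replication = replication
        self.last_heartbeat = {}   # DataNode name -> time of last heartbeat
        self.reported_blocks = {}  # DataNode name -> set of block ids

    def heartbeat(self, datanode, now):
        self.last_heartbeat[datanode] = now

    def block_report(self, datanode, block_ids):
        # A report replaces whatever the DataNode reported previously.
        self.reported_blocks[datanode] = set(block_ids)

    def live_datanodes(self, now):
        return {
            dn for dn, seen in self.last_heartbeat.items()
            if now - seen <= self.timeout
        }

    def under_replicated(self, now):
        """Return block ids with fewer live copies than the replication factor."""
        live = self.live_datanodes(now)
        copies = {}
        for dn, blocks in self.reported_blocks.items():
            for block in blocks:
                copies.setdefault(block, 0)
                if dn in live:
                    copies[block] += 1
        return sorted(b for b, n in copies.items() if n < self.replication)


monitor = ToyNameNodeMonitor(timeout_seconds=30, replication=2)
for dn in ("dn1", "dn2"):
    monitor.heartbeat(dn, now=0)
    monitor.block_report(dn, [1, 2])
monitor.heartbeat("dn1", now=40)  # dn2 stays silent after time 0
print(monitor.live_datanodes(now=50))
print(monitor.under_replicated(now=50))
```

Once a DataNode stops sending heartbeats, its copies no longer count, and the affected blocks become candidates for re-replication on the remaining nodes.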
Hadoop Shell Commands
Commonly used commands:
ls Usage: hadoop fs -ls Path (directory or file path to list).
cat Usage: hadoop fs -cat PathOfFileToView
Hadoop shell command reference: https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/FileSystemShell.html
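The two commands above can also be called from a script. The sketch below is one possible wrapper, assuming the hadoop executable is installed and on PATH; the helper names hadoop_fs_command and run_hadoop_fs and the path /user/data are hypothetical.

```python
import shutil
import subprocess


def hadoop_fs_command(action, path):
    """Build the argument list for `hadoop fs -<action> <path>`."""
    if action not in ("ls", "cat"):
        raise ValueError(f"unsupported action: {action}")
    return ["hadoop", "fs", f"-{action}", path]


def run_hadoop_fs(action, path):
    """Run the command and return its standard output as text."""
    if shutil.which("hadoop") is None:
        raise RuntimeError("the 'hadoop' executable was not found on PATH")
    result = subprocess.run(
        hadoop_fs_command(action, path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Hypothetical path, used only to show the command that would be executed.
print(" ".join(hadoop_fs_command("ls", "/user/data")))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when a path contains spaces.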