Tutorial by Examples | RIP Tutorial

HDFS - Hadoop Distributed File System

Hadoop Distributed File System (HDFS) is a Java-based file system that provides scalable and reliable data storage that is designed to span large clusters of commodity servers. HDFS, MapReduce, and YARN form the core of Apache™ Hadoop®. HDFS is designed to be highly fault-tolerant, which is achieve...

hadoop • What is HDFS?

Finding files in HDFS

To find a file in the Hadoop Distributed file system: hdfs dfs -ls -R / | grep [search_term] In the above command, -ls is for listing files -R is for recursive(iterate through sub directories) / means from the root directory | to pipe the output of first command to the second grep command t...

hadoop • What is HDFS?

Blocks and Splits HDFS

Block Size and Blocks in HDFS : HDFS has the concept of storing data in blocks whenever a file is loaded. Blocks are the physical partitions of data in HDFS ( or in any other filesystem, for that matter ). Whenever a file is loaded onto the HDFS, it is splitted physically (yes, the file is divi...

hadoop • What is HDFS?