Finding files in HDFS


To find a file in the Hadoop Distributed File System (HDFS):

hdfs dfs -ls -R / | grep [search_term]

In the above command:

-ls lists files

-R makes the listing recursive (it descends into subdirectories)

/ starts the listing at the root directory

| pipes the output of the first command into the second

grep extracts the lines that match

[search_term] is the file name to search for in the listing of all files in the Hadoop file system.
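One thing to watch with the pipeline above is that grep matches each whole line of the listing, so a search term can also hit an owner or group name rather than a file name. The following is a minimal sketch of a stricter filter, not part of Hadoop itself: hdfs_filter_basename is a hypothetical helper name, and it assumes the usual eight-column listing with the path in the last column and no spaces in paths.

```shell
# Hypothetical helper: reads `hdfs dfs -ls -R` output on stdin and prints
# only the paths whose base name contains the search term.
# Assumption: the path is column 8 of the listing and contains no spaces.
hdfs_filter_basename() {
  term="$1"
  awk -v term="$term" '
    NF >= 8 {
      path = $8
      # Split the path on "/" and keep the last component (the base name).
      n = split(path, parts, "/")
      if (index(parts[n], term) > 0) print path
    }'
}

# Usage (requires a configured Hadoop client):
#   hdfs dfs -ls -R / | hdfs_filter_basename test
```

Because the helper does a plain substring match on the base name, it behaves like grep for simple terms but ignores the permission, owner, group and date columns.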

Alternatively, the command below can be used to find files and apply expressions to them:

hadoop fs -find / -name test -print

Finds all files that match the specified expression and applies selected actions to them. If no path is specified, the search defaults to the current working directory. If no expression is specified, it defaults to -print.

The following primary expressions are recognised:

  • -name pattern
  • -iname pattern

Evaluates as true if the basename of the file matches the pattern using standard file system globbing. If -iname is used then the match is case insensitive.
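As a sketch of how the two name expressions might be used with glob patterns, the wrappers below build on the -find command shown earlier. The function names find_by_name and find_by_iname are hypothetical, chosen only for illustration. The pattern is quoted so that the local shell passes it to Hadoop unexpanded.

```shell
# Hypothetical wrappers around `hadoop fs -find`.
# $1 = directory to start from, $2 = glob pattern for the file's base name.

# Case-sensitive match on the base name.
find_by_name() {
  hadoop fs -find "$1" -name "$2" -print
}

# Case-insensitive match on the base name.
find_by_iname() {
  hadoop fs -find "$1" -iname "$2" -print
}

# Usage (requires a configured Hadoop client; quote the pattern):
#   find_by_name /user '*.csv'
#   find_by_iname /user 'readme*'
```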

  • -print
  • -print0

Always evaluates to true. Causes the current pathname to be written to standard output. If the -print0 expression is used, an ASCII NULL character is appended.
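NULL-terminated output is mainly useful when another program consumes the paths, since it stays unambiguous even when a path contains spaces. A minimal sketch, assuming xargs with the -0 option is available on the client machine; for_each_path0 is a hypothetical helper name:

```shell
# Hypothetical helper: reads NULL-delimited paths on stdin and runs the
# given command once per path, passing the path as the last argument.
for_each_path0() {
  xargs -0 -n 1 "$@"
}

# Usage (requires a configured Hadoop client):
#   hadoop fs -find /tmp -name '*.tmp' -print0 | for_each_path0 hadoop fs -du -s
```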

The following operators are recognised:

expression -a expression
expression -and expression
expression expression