Tutorial by Examples | RIP Tutorial

SEQUENCEFILE

Store data as SEQUENCEFILE if the data needs to be compressed. You can import text files compressed with Gzip or Bzip2 directly into a table stored as TextFile. The compression is detected automatically and the file is decompressed on the fly during query execution. CREATE TABLE raw_seque...
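
The pattern described above can be sketched in HiveQL as follows. This is a minimal illustration, not the original example: the table names `staging_text` and `events_seq` and the file path are hypothetical placeholders.

```sql
-- Hypothetical staging table stored as TextFile.
-- Hive decompresses Gzip/Bzip2 text files transparently when reading.
CREATE TABLE staging_text (line STRING)
STORED AS TEXTFILE;

-- Placeholder path, assumed for illustration only.
LOAD DATA LOCAL INPATH '/tmp/example.gz' INTO TABLE staging_text;

-- Hypothetical target table stored as SequenceFile.
CREATE TABLE events_seq (line STRING)
STORED AS SEQUENCEFILE;

-- Enable compressed output and block-level SequenceFile compression.
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;

-- Rewrite the text data into the compressed SequenceFile table.
INSERT OVERWRITE TABLE events_seq
SELECT * FROM staging_text;
```

Copying through a staging table is one common approach because a single Gzip text file cannot be split across mappers, whereas a block-compressed SequenceFile can.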

hive • File formats in HIVE

ORC

The Optimized Row Columnar (ORC) file format provides a highly efficient way to store Hive data. It was designed to overcome limitations of the other Hive file formats. Using ORC files improves performance when Hive is reading, writing, and processing data. An ORC file can contain lightweight indexes a...
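
As a minimal sketch of how an ORC-backed table might be declared, assuming a hypothetical `employees_orc` table whose column names and types are chosen only for illustration:

```sql
-- Hypothetical table; columns are illustrative assumptions.
CREATE TABLE employees_orc (
  id INT,
  name STRING,
  salary DOUBLE
)
STORED AS ORC
-- orc.compress accepts NONE, ZLIB, or SNAPPY.
TBLPROPERTIES ("orc.compress" = "ZLIB");
```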

PARQUET

The Parquet columnar storage format is available in Hive 0.13.0 and later. Parquet is built from the ground up with complex nested data structures in mind, and uses the record shredding and assembly algorithm described in the Dremel paper. We believe this approach is superior to simple flattening of nested name spac...
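
To illustrate the nested-data point, here is a small HiveQL sketch. The table `orders_parquet` and its columns are hypothetical; it only shows that nested types can be stored as they are rather than flattened.

```sql
-- Hypothetical table with nested columns, stored as Parquet (Hive 0.13.0+).
CREATE TABLE orders_parquet (
  order_id BIGINT,
  customer STRUCT<name:STRING, city:STRING>,
  items ARRAY<STRUCT<sku:STRING, qty:INT>>
)
STORED AS PARQUET;

-- Nested fields are addressed with dot notation;
-- arrays can be expanded into rows with LATERAL VIEW explode().
SELECT order_id, customer.city, item.sku, item.qty
FROM orders_parquet
LATERAL VIEW explode(items) exploded AS item;
```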

AVRO

Avro files have been supported in Hive 0.14.0 and later. Avro is a remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format. Its primary use is in Apache Hadoop,...
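
A minimal sketch of two ways an Avro-backed table might be declared in Hive 0.14.0 or later. Both table names, the column set, and the schema contents are hypothetical and chosen only for illustration.

```sql
-- Variant 1: let Hive derive the Avro schema from the column list.
CREATE TABLE employees_avro (
  id INT,
  name STRING,
  salary DOUBLE
)
STORED AS AVRO;

-- Variant 2: supply the Avro schema as JSON through a table property.
CREATE TABLE employees_avro_json
STORED AS AVRO
TBLPROPERTIES ('avro.schema.literal'='{
  "namespace": "example",
  "name": "employee",
  "type": "record",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "name", "type": "string"},
    {"name": "salary", "type": "double"}
  ]
}');
```

The second variant makes the JSON-defined schema explicit, which can be useful when the same schema is shared with other Avro producers or consumers.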

Text File

TextFile is the default file format, unless the configuration parameter hive.default.fileformat has a different setting. We can create a table in Hive using the field names in our delimited text file. Let's say, for example, that our CSV file contains three fields (id, name, salary) and we want to create ...
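
A minimal sketch of such a table for a comma-delimited file with the three fields named above. The table name `employees_text`, the column types, and the file path are assumptions made for illustration.

```sql
-- Column names follow the CSV fields (id, name, salary); types are assumed.
CREATE TABLE employees_text (
  id INT,
  name STRING,
  salary DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Placeholder path, assumed for illustration only.
LOAD DATA LOCAL INPATH '/tmp/employees.csv' INTO TABLE employees_text;
```

`STORED AS TEXTFILE` is written out here for clarity; it can be omitted when hive.default.fileformat is left at its default.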
