The Optimized Row Columnar (ORC) file format provides a highly efficient way to store Hive data. It was designed to overcome limitations of the other Hive file formats. Using ORC files improves performance when Hive is reading, writing, and processing data. ORC files can contain lightweight indexes and bloom filters, which let readers skip stripes and row groups that cannot match a query's predicates.
See: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC
ORC is a recommended format for storing data in the Hortonworks distribution.
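The following example creates an ORC table that compresses its data with Snappy, builds a bloom filter on col1, and writes row indexes:
CREATE TABLE tab_orc (col1 STRING,
    col2 STRING,
    col3 STRING)
STORED AS ORC
TBLPROPERTIES (
    "orc.compress"="SNAPPY",
    "orc.bloom.filter.columns"="col1",
    "orc.create.index"="true"
);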
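As a rough sketch of how the indexes and bloom filter get used, the statements below load tab_orc from a hypothetical text-format table `tab_text` and then filter on col1. Setting `hive.optimize.index.filter` asks Hive to push the predicate down into the ORC reader, so stripes and row groups whose statistics or bloom filter rule out the value can be skipped. The table `tab_text` and the literal 'some_value' are placeholders, not part of the original example.

```sql
-- Hypothetical source table tab_text, assumed to have the same three STRING columns.
INSERT INTO TABLE tab_orc
SELECT col1, col2, col3 FROM tab_text;

-- Enable predicate pushdown into the ORC reader so the row indexes
-- and the bloom filter on col1 can be used to skip data.
SET hive.optimize.index.filter=true;

-- Equality predicate on the bloom-filtered column ('some_value' is a placeholder).
SELECT col2, col3
FROM tab_orc
WHERE col1 = 'some_value';
```

Bloom filters help most with equality predicates like this one; range predicates rely mainly on the min/max statistics in the row indexes.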
To modify a table so that new partitions of the table are stored as ORC files (this changes only the table metadata; existing data files are not converted):
ALTER TABLE T SET FILEFORMAT ORC;
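To get existing rows into ORC, one option is to copy them into a new ORC table with CREATE TABLE AS SELECT. This is a minimal sketch that assumes T is unpartitioned; the target name `T_orc` is hypothetical.

```sql
-- Rewrites every row of T into a new table stored as ORC (T_orc is a hypothetical name).
CREATE TABLE T_orc
STORED AS ORC
AS SELECT * FROM T;
```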
As of Hive 0.14, users can efficiently merge small ORC files by issuing a CONCATENATE command on a table or partition. The files are merged at the stripe level, without reserialization.
ALTER TABLE T [PARTITION partition_spec] CONCATENATE;
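For instance, assuming a hypothetical ORC table `logs` partitioned by a `dt` column, the first statement below merges the small files of a single partition and the second merges the files of the unpartitioned tab_orc table:

```sql
-- Merge the small ORC files of one partition (table name and partition value are hypothetical).
ALTER TABLE logs PARTITION (dt='2015-01-01') CONCATENATE;

-- Merge the small ORC files of an unpartitioned table.
ALTER TABLE tab_orc CONCATENATE;
```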