hive Tutorial => AVRO

Example

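Avro-backed tables have been supported in Hive since 0.9.1 through the AvroSerDe; the STORED AS AVRO shorthand used below is available in Hive 0.14.0 and later.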

Avro is a remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format. Its primary use is in Apache Hadoop, where it can provide both a serialization format for persistent data, and a wire format for communication between Hadoop nodes, and from client programs to the Hadoop services.

Specification of AVRO format: https://avro.apache.org/docs/1.7.7/spec.html

CREATE TABLE kst
PARTITIONED BY (ds string)
STORED AS AVRO
TBLPROPERTIES (
  'avro.schema.url'='http://schema_provider/kst.avsc');

We can also use the syntax below, which declares the columns directly and does not need a schema file.

CREATE TABLE kst (field1 string, field2 int)
PARTITIONED BY (ds string)
STORED AS AVRO;
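
Another way to avoid a separate schema file is to embed the Avro schema in the table definition through the 'avro.schema.literal' table property. The following is a minimal sketch, not a definitive layout: the record name, the namespace example.hive and the nullable field definitions are assumptions chosen to mirror the field1/field2 columns above.

```sql
-- Sketch: inline Avro schema instead of a schema file.
-- The namespace and the nullable unions are illustrative assumptions.
CREATE TABLE kst
PARTITIONED BY (ds string)
STORED AS AVRO
TBLPROPERTIES (
  'avro.schema.literal'='{
    "namespace": "example.hive",
    "name": "kst",
    "type": "record",
    "fields": [
      {"name": "field1", "type": ["null", "string"], "default": null},
      {"name": "field2", "type": ["null", "int"], "default": null}
    ]
  }');
```

The partition column ds is not part of the Avro schema, because Hive stores partition values in the directory layout rather than in the data files.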

In the examples above, the STORED AS AVRO clause is equivalent to:

ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
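
To show where these clauses sit in a complete statement, here is the first example written out in the long form. It is a sketch that reuses the same placeholder schema URL as above, so it should be read as an illustration rather than a statement to run as-is.

```sql
-- Long form of the first example: explicit SerDe and container formats
-- in place of the STORED AS AVRO shorthand.
CREATE TABLE kst
PARTITIONED BY (ds string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
  'avro.schema.url'='http://schema_provider/kst.avsc');
```

This long form is the one to use on Hive versions earlier than 0.14.0, where the STORED AS AVRO shorthand is not available.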
