hiveCreate Database and Table Statement


Syntax

  • CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name

    [(col_name data_type [COMMENT col_comment], ...)] [COMMENT table_comment] [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] [SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)] ON ((col_value, col_value, ...), (col_value, col_value, ...), ...) [STORED AS DIRECTORIES] [ [ROW FORMAT row_format] [STORED AS file_format] | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] ] [LOCATION hdfs_path] [TBLPROPERTIES (property_name=property_value, ...)]
    [AS select_statement];

  • CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name LIKE existing_table_or_view_name [LOCATION hdfs_path];

  • data_type : primitive_type,array_type,map_type,struct_type,union_type

  • primitive_type: TINYINT, SMALLINT, INT , BIGINT, BOOLEAN, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP, DECIMAL, DECIMAL(precision, scale), DATE, VARCHAR, CHAR

  • array_type: ARRAY < data_type >

  • map_type: MAP < primitive_type, data_type >

  • struct_type: STRUCT < col_name : data_type [COMMENT col_comment], ...>

  • union_type: UNIONTYPE < data_type, data_type, ... >

  • row_format: DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char] [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char] [NULL DEFINED AS char]
    , SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

  • file_format: : SEQUENCEFILE , TEXTFILE , RCFILE , ORC , PARQUET , AVRO , INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname

  • CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name [COMMENT database_comment] [LOCATION hdfs_path] [WITH DBPROPERTIES (property_name=property_value, ...)];

Remarks

When working with tables and databases in HIVE. Below points can be usefull.

  • We can switch database using use database; command
  • To know the current working database we can get using SELECT current_database()
  • To see the DDL used for create table statement we can use SHOW CREATE TABLE tablename
  • To see all columns of table use DESCRIBE tablename to show extended details like location serde used and others DESCRIBE FORMATTED tablename. DESCRIBE can also be abbrevated as DESC.