This is the documentation for CDH 5.1.x. Documentation for other versions is available at Cloudera Documentation.

CREATE DATABASE Statement

In Impala, a database is both:

  • A logical construct for grouping together related tables within their own namespace. You might use a separate database for each application, set of related tables, or round of experimentation.
  • A physical construct represented by a directory tree in HDFS. Tables (internal tables), partitions, and data files are all located under this directory. You can back it up, measure space usage, or remove it (if it is empty) with a DROP DATABASE statement.

Syntax:

CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name[COMMENT 'database_comment']
  [LOCATION hdfs_path];

Statement type: DDL

Databases, DROP DATABASE Statement, USE Statement

Usage notes:

A database is physically represented as a directory in HDFS, with a filename extension .db, under the main Impala data directory. If the associated HDFS directory does not exist, it is created for you. All databases and their associated directories are top-level objects, with no physical or logical nesting.

After creating a database, to make it the current database within an impala-shell session, use the USE statement. You can refer to tables in the current database without prepending any qualifier to their names.

When you first connect to Impala through impala-shell, the database you start in (before issuing any CREATE DATABASE or USE statements) is named default.

Impala includes another predefined database, _impala_builtins, that serves as the location for the built-in functions. To see the built-in functions, use a statement like the following:
show functions in _impala_builtins;
show functions in _impala_builtins like '*substring*';

After creating a database, your impala-shell session or another impala-shell connected to the same node can immediately access that database. To access the database through the Impala daemon on a different node, issue the INVALIDATE METADATA statement first while connected to that other node.

If you connect to different Impala nodes within an impala-shell session for load-balancing purposes, you can enable the SYNC_DDL query option to make each DDL statement wait before returning, until the new or changed metadata has been received by all the Impala nodes. See SYNC_DDL for details.

Examples:

create database first;
use first;
create table t1 (x int);

create database second;
use second;
-- Each database has its own namespace for tables.
-- You can reuse the same table names in each database.
create table t1 (s string);

create database temp;
-- You do not have to USE a database after creating it.
-- Just qualify the table name with the name of the database.
create table temp.t2 (x int, y int);
use database temp;
create table t3 (s string);
-- You cannot drop a database while it is selected by the USE statement.
drop database temp;
ERROR: AnalysisException: Cannot drop current default database: temp
-- The always-available database 'default' is a convenient one to USE.
use default;
-- Dropping the database is a fast way to drop all the tables within it.
drop database temp;

Cancellation: Cannot be cancelled.

Page generated September 3, 2015.