Installing and Configuring Databases
Cloudera Manager uses databases to store information about the Cloudera Manager configuration, as well as information such as the health of the system or task progress. Cloudera Manager supports using a variety of databases to store required information. To facilitate rapid completion of simple installations, the Cloudera Manager can install and configure a PostgreSQL database as part of the broader Cloudera Manager installation process. This automatically installed database is sometimes referred to as an embedded PostgreSQL database. While the embedded database is a useful option for getting started quickly, Cloudera Manager also allows you to use other databases. You can opt to use your own PostgreSQL database or MySQL or Oracle databases.
If you plan to use the embedded database provided during the Cloudera Manager installation for all databases, you can skip ahead to Installation Path A - Automated Installation by Cloudera Manager. To learn more about database options or if you are unsure whether or not using the embedded database is right for your environment, continue with the following sections:
What Databases Must Be Installed
The Cloudera Manager Server and the server's Activity Monitor, Service Monitor, Report Manager, and Host Monitor all require databases, as does Cloudera Navigator, Hive Metastore, and Impala Catalog Daemon. Cloudera Manager does support deploying different types of databases in a single environment, but doing so may create unexpected complications. Cloudera recommends choosing one of the three database providers to use for all of the Cloudera Manager databases.
Cloudera provides two install paths:
- Path A automatically installs embedded PostgreSQL databases to meet the requirements of the services. This path reduces the number of installation tasks you must complete, as well as the number of choices to make.
- Path B requires you have databases in your environment for use by Cloudera Manager, Cloudera Management Services, and Hive metastore. This path requires more input and intervention as you either install databases or gather information about existing databases. This path also provides greater flexibility in choosing database types and configurations.
A service works with a database. In most cases, you should install databases and services on the same host. For example, if you create the database for Activity Monitor on myhost1, then you should typically assign the Activity Monitor role to myhost1. You will assign the Service Monitor, Activity Monitor, Report Manager, and Host Monitor roles in the Cloudera Manager wizard during the install or upgrade process. After completing the install or upgrade process, you can also modify role assignments in the Management services pages of Cloudera Manager. While it is true that database location is changeable, before beginning an installation or upgrade, you should decide which hosts you will use. Note that the JDBC connector for your database must be installed on the hosts where you assign the Service Monitor, Activity Monitor, Report Manager, and Host Monitor roles. Installing JDBC connectors is described later in this guide.
It is possible to install the database and services on different hosts. Separating databases from services is more likely to occur in larger deployments and in cases where more sophisticated database administrators actively choose to establish such a configuration. For example, databases and services might be separated if your environment includes Oracle databases that will be separately managed by Oracle database administrators (DBAs).
The table that follows provides a summary; details are in the sections that follow.
Install or Upgrade Path
Install Supported Database For
Typically Install Databases on Systems That Will Host
No installations required. Automated installation automatically creates embedded PostgreSQL databases for all Cloudera Manager and all services.
No manual installation required.
The Cloudera Manager Server configuration and for Activity Monitor, Service Monitor, Report Manager, Host Monitor, Hive Metastore, and Cloudera Navigator.
The Cloudera Manager Server, Activity Monitor, Service Monitor, Report Manager, Host Monitor, and Cloudera Navigator roles, and the Hive Metastore. Alternately, you may install these databases on other systems, assuming those systems are accessible to the Cloudera Manager Server.
Host Monitor and Cloudera Navigator roles, and the Hive Metastore.
Host Monitor and Cloudera Navigator roles, and the Hive Metastore.
About the Cloudera Manager Server Database
This database, which is used for storing information about services' configurations, is independent of the databases used by the Activity Monitor, Service Monitor, Report Manager, Host Monitor, Cloudera Navigator, or the Hive Metastore.
|Automatic installation: Installation Path A||The wizard automatically installs, configures, and uses embedded PostgreSQL databases to store information about service configuration, as well as the Activity Monitor, Service Monitor, Report Manager, Host Monitor, Cloudera Navigator, and the Hive Metastore. This functionality is provided by the cloudera-scm-server-db package, and you can start and stop these databases using the service cloudera-scm-server-db [start|stop] command. If you are using Installation Path A, you can proceed directly to Installation Path A.|
|Manual installation: Installation Path B||You must install a supported database. This database can be installed on the machine where you install the Cloudera Manager Server or on a machine accessible to the Cloudera Manager Server. You will need to configure the connection between Cloudera Manager and the database, as is described in the documentation for Path B.|
The installation instructions for the database containing the Cloudera Manager Server's configuration are also included under Installation Path B - Installation Using Your Own Method.
- To prevent deadlocks, Cloudera Manager requires the isolation level to be set to read committed. This is the default setting for PostgreSQL and Oracle. The instructions for using a MySQL database with Cloudera Manager describe how to configure this setting. Contact your DBA to find out what your isolation level currently is set to.
- If you use a MySQL database to store information about service configuration, make sure that the InnoDB engine is configured, not the MyISAM engine. Cloudera Manager will not start if its tables are configured with the MyISAM engine. (Typically, tables revert to MyISAM if the InnoDB engine is misconfigured.) To check what engine your tables are using, run the following command from the MySQL shell:
mysql> show table status;
Using an External Database for Oozie, Hue, Hive Metastore, and Impala Catalog
Configuring an External Database for Oozie
By default, Cloudera Manager uses Derby for Oozie's database. If you want to use an external database for Oozie, you would do the configuration after Cloudera Manager is installed. For more information, see Using an External Database for Oozie in the Managing Clusters with Cloudera Manager.
Configuring an External Database for Hue
By default, Cloudera Manager uses SQLite for Hue's database. If necessary, you can configure Cloudera Manager to use an external database such as MySQL as the database for Hue - do this after Cloudera Manager is installed. For more information, see Using an External Database for Hue .
Configuring an External Database for Hive Metastore
By default, Cloudera Manager uses the embedded PostgreSQL database for the Hive Metastore. If necessary, you can configure Cloudera Manager to use an external database for the Hive metastore — do this before you install Cloudera Manager. For more information, see the instructions in Installing and Configuring an External PostgreSQL Database , Installing and Configuring a MySQL Database , or Using an Oracle Database .
Configuring an External Database for Impala Catalog Daemon
By default, Cloudera Manager uses the embedded PostgreSQL database for the Impala Catalog Daemon. If necessary, you can configure Cloudera Manager to use an external database such as MySQL or Oracle as the database for the Impala Catalog Daemon — do this before you install Cloudera Manager. For more information, see the instructions in Installing and Configuring an External PostgreSQL Database , Installing and Configuring a MySQL Database , or Using an Oracle Database .
Backing up the Databases
It's important that you periodically back up the databases that Cloudera Manager uses to store configuration, monitoring, and reporting data. Be sure to back up all of the databases you are using with Cloudera Manager:
- Cloudera Manager database: This is the most important database to back up. This database contains all the information about what services you have configured, their role assignments, all configuration history, commands, users, and running processes. This is a relatively small database, typically smaller than 100MB.
- Activity Monitor database: Contains information about past activities. In large clusters, this database can become very large.
- Service Monitor database: Contains monitoring information about daemons. In large clusters, this database can become very large.
- Report Manager database: Keeps track of disk utilization over time. This database is typically medium-sized.
- Host Manager database: Contains information about host status. The number of hosts in the cluster affects this database's size, so the database size varies, but the database is typically large in deployments with many hosts.