Cloudera Manager and Managed Service Data Stores
Cloudera Manager uses databases to store information about the Cloudera Manager configuration, as well as information such as the health of the system or task progress. For quick, simple installations, Cloudera Manager can install and configure an embedded PostgreSQL database as part of the Cloudera Manager installation process. In addition, some CDH services use databases and are automatically configured to use a default database. If you plan to use the embedded and default databases provided during the Cloudera Manager installation, see Installation Path A - Automated Installation by Cloudera Manager.
Although the embedded database is useful for getting started quickly, you can also use your own PostgreSQL, MySQL, or Oracle database for the Cloudera Manager Server and services that use databases.
- Required Databases
- Setting up the Cloudera Manager Server Database
- External Databases for Activity Monitor, Reports Manager, Hive Metastore Server, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server
- External Databases for Hue and Oozie
For information about planning, managing, and backing up Cloudera Manager data stores, see Storage Space Planning for Cloudera Manager.
- Cloudera Manager - Contains all the information about services you have configured and their role assignments, all configuration history, commands, users, and running processes. This relatively small database (<100 MB) is the most important to back up.Important
:When processes restart, the configuration for each of the services is redeployed using information that is saved in the Cloudera Manager database. If this information is not available, your cluster will not start or function correctly. You must therefore schedule and maintain regular backups of the Cloudera Manager database in order to recover the cluster in the event of the loss of this database.
- Activity Monitor - Contains information about past activities. In large clusters, this database can grow large. Configuring an Activity Monitor database is only necessary if a MapReduce service is deployed.
- Reports Manager - Tracks disk utilization and processing activities over time. Medium-sized.
- Hive Metastore Server - Contains Hive metadata. Relatively small.
- Sentry Server - Contains authorization metadata. Relatively small.
- Cloudera Navigator Audit Server - Contains auditing information. In large clusters, this database can grow large.
- Cloudera Navigator Metadata Server - Contains authorization, policies, and audit report metadata. Relatively small.
- Path A automatically installs an embedded PostgreSQL database to meet the requirements of the services. This path reduces the number of installation tasks to complete and choices to make. In Path A you can optionally choose to create external databases for Activity Monitor, Reports Manager, Hive Metastore Server, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server.
- Path B and Path C require you to create databases for the Cloudera Manager Server,Activity Monitor, Reports Manager, Hive Metastore Server, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server.
Using an external database requires more input and intervention as you install databases or gather information about existing ones. These paths also provide greater flexibility in choosing database types and configurations.
Cloudera Manager supports deploying different types of databases in a single environment, but doing so can create unexpected complications. Cloudera recommends choosing one supported database provider for all of the Cloudera databases.
In most cases, you should install databases and services on the same host. For example, if you create the database for Activity Monitor on myhost1, then you should typically assign the Activity Monitor role to myhost1. You assign the Activity Monitor and Reports Manager roles in the Cloudera Manager wizard during the installation or upgrade process. After completing the installation or upgrade process, you can also modify role assignments in the Management services pages of Cloudera Manager. Although the database location is changeable, before beginning an installation or upgrade, you should decide which hosts to use. The JDBC connector for your database must be installed on the hosts where you assign the Activity Monitor and Reports Manager roles.
You can install the database and services on different hosts. Separating databases from services is more likely in larger deployments and in cases where more sophisticated database administrators choose such a configuration. For example, databases and services might be separated if your environment includes Oracle databases that are managed separately by Oracle database administrators.
Setting up the Cloudera Manager Server Database
The Cloudera Manager Server database stores information about service and host configurations. You can use an embedded PostgreSQL database or an external database.
Installing and Starting the Cloudera Manager Server Embedded Database
- Install the embedded PostgreSQL database packages:
OS Command Red Hat-compatible, if you have a yum repo configured
$ sudo yum install cloudera-manager-server-db-2
Red Hat-compatible, if you're transferring RPMs manually
sudo yum --nogpgcheck localinstall cloudera-manager-server-db-2.noarch.rpm
$ sudo zypper install cloudera-manager-server-db-2
Ubuntu or Debian
$ sudo apt-get install cloudera-manager-server-db-2
- Start the PostgreSQL database:
$ sudo service cloudera-scm-server-db start
Preparing a Cloudera Manager Server External Database
- Run the
script on the host where the Cloudera Manager Server package is
- Installer or package install
- Tarball install
- Creating the Cloudera Manager Server database configuration file.
- Creating a database for the Cloudera Manager Server to use. This is optional and is only completed if options are specified.
- Setting up a user account for the Cloudera Manager Server. This is optional and is only completed if options are specified.
- Installer or package install
- Remove the embedded PostgreSQL properties file if it exists:
- Installer or package install
- Tarball install
- Installer or package install
scm_prepare_database.sh database-type [options] database-name username password
|database-type||One of the supported database types:
|database-name||The name of the Cloudera Manager Server database to create or use.|
|username||The username for the Cloudera Manager Server database to create or use.|
|password||The password for the Cloudera Manager Server database to create or use. If you do not specify the password on the command line, the script prompts you to enter it.|
|-h or --host||The IP address or hostname of the host where the database is installed. The default is to use the local host.|
|-P or --port||The port number to use to connect to the database. The default port is 3306 for MySQL, 5432 for PostgreSQL, and 1521 for Oracle. This option is used for a remote connection only.|
|-u or --user||The admin username for the database application. For -u, no space occurs between the option and the provided value. If this option is supplied, the script creates a user and database for the Cloudera Manager Server; otherwise, it uses the user and database you created previously.|
|-p or --password||The admin password for the database application. The default is no password. For -p, no space occurs between the option and the provided value.|
|--scm-host||The hostname where the Cloudera Manager Server is installed. Omit if the Cloudera Manager server and the database are installed on the same host.|
|--config-path||The path to the Cloudera Manager Server configuration files. The default is /etc/cloudera-scm-server.|
|--schema-path||The path to the Cloudera Manager schema files. The default is /usr/share/cmf/schema (the location of the script).|
|-f||The script does not stop if an error occurs.|
|-? or --help||Display help.|
Example 1: Running the script when MySQL is installed on another host
- At the myhost1 MySQL prompt, create a temporary user who can connect from
mysql> grant all on *.* to 'temp'@'%' identified by 'temp' with grant option; Query OK, 0 rows affected (0.00 sec)
- On the Cloudera Manager Server host (myhost2), run the script:
$ sudo /usr/share/cmf/schema/scm_prepare_database.sh mysql -h myhost1.sf.cloudera.com -utemp -ptemp --scm-host myhost2.sf.cloudera.com scm scm scm Looking for MySQL binary Looking for schema files in /usr/share/cmf/schema Verifying that we can write to /etc/cloudera-scm-server Creating SCM configuration file in /etc/cloudera-scm-server Executing: /usr/java/jdk1.6.0_31/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/cmf/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db. [ main] DbCommandExecutor INFO Successfully connected to database. All done, your SCM database is configured correctly!
- On myhost1, delete the temporary user:
mysql> drop user 'temp'@'%'; Query OK, 0 rows affected (0.00 sec)
Example 2: Running the script to configure Oracle
[root@rhel55-6 ~]# /usr/share/cmf/schema/scm_prepare_database.sh -h cm-oracle.example.com oracle orcl sample_user sample_pass Verifying that we can write to /etc/cloudera-scm-server Creating SCM configuration file in /etc/cloudera-scm-server Executing: /usr/java/jdk1.6.0_31/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/cmf/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db. [ main] DbCommandExecutor INFO Successfully connected to database. All done, your SCM database is configured correctly!
Example 3: Running the script when PostgreSQL is co-located with the Cloudera Manager ServerThis example assumes that you have already created the Cloudera Management Server database and database user, naming both scm.
$ /usr/share/cmf/schema/scm_prepare_database.sh postgresql scm scm scm
External Databases for Activity Monitor, Reports Manager, Hive Metastore Server, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server
External Databases for Hue and Oozie
Hue and Oozie are automatically configured with databases, but you can configure these services to use external databases after Cloudera Manager is installed.
Configuring an External Database for HueBy default Hue is configured to use the SQLite database. To use an external database for Hue, see Using an External Database for Hue.
Configuring an External Database for OozieBy default Oozie is configured to use the Derby database. To use an external database for Oozie, see Using an External Database for Oozie.
|<< Installing Cloudera Manager, CDH, and Managed Services||Embedded PostgreSQL Database >>|