Installing and Configuring an External PostgreSQL Database
Use the following instructions to install PostgreSQL and set up databases on appropriate hosts in your cluster. You should create databases for the monitoring services that are features of Cloudera Manager, as well as a database for the Hive Metastore Server.
It's useful to set a password for the root user of PostgreSQL. Note the host name and port number where you install PostgreSQL because you will need to specify them when you install the JDBC connector to PostgreSQL in a later step. Note that PostgreSQL does not have an accepted default port. You must determine the port used in your environment. You will also need to create a PostgreSQL database and user account for Cloudera Manager to use to store data. See your PostgreSQL documentation for more information about installation and configuration.
$ sudo yum install postgresql-server
$ sudo zypper install postgresql-server
$ sudo apt-get install postgresql
Configuring Your Systems to Support PostgreSQL
You must configure the PostgreSQL database to run as expected. This involves:
- Configuring PostgreSQL to accept network connections.
- Initializing the database to work with Cloudera Manager.
- Configuring the operating system to start PostgreSQL.
Configuring PostgreSQL to accept network connections
By default, PostgreSQL only accepts connections on the loopback interface. Remember to reconfigure PostgreSQL to accept connections from the Fully Qualified Domain Name (FQDN) of the machines hosting the management roles. If you do not make these changes, the management processes will not be able to connect to and use the database on which they depend.
Initializing and configuring the external PostgreSQL database
- Prepare the external PostgreSQL database for use with the Cloudera Manager Server. This step may not be required for latter versions of PostgreSQL (8.4 and later) where the install command also initializes the database. See the PostgreSQL documentation for more details on this.
$ sudo service postgresql initdb
- Enable MD5 authentication. Edit pg_hba.conf, which is usually found in /var/lib/pgsql/data or /etc/postgresql/8.4/main. Add the following line:
host all all 127.0.0.1/32 md5
Add this line before the second occurrence of another line in the configuration file that references ident authentication. You can modify the contents of the previous line to support different configurations. For example, if you want to access PostgreSQL from a different host, replace 127.0.0.1 with your IP address and update postgresql.conf, which is typically found in the same place as pg_hba.conf to include:
listen_addresses = '*'
- Start the PostgreSQL database:
$ sudo service postgresql start
- Configure the PostgreSQL server to start at boot.
OS Command RHEL
$ sudo /sbin/chkconfig postgresql on $ sudo /sbin/chkconfig --list postgresql postgresql 0:off 1:off 2:on 3:on 4:on 5:on 6:off
$ sudo chkconfig --add postgresql
Ubuntu or Debian
$ sudo chkconfig postgresql on
Creating the PostgreSQL Databases for Cloudera Manager
The next step involves creating databases and user accounts for all database-backed services in Cloudera Manager. Create databases for each of the following features that are part of the Cloudera Management Services:
- Activity Monitor
- Service Monitor
- Report Manager
- Host Monitor
- Cloudera Navigator (optional -- Cloudera Navigator is a separately-purchased add-on to Cloudera Enterprise)
In addition, add a database for each Hive Metastore server.
You can create these databases on the host where the Cloudera Manager Server will run, or on any other nodes in the cluster. For performance reasons, you should typically install each database on the host on which the service runs, as determined by the roles you will assign during installation or upgrade. In larger deployments or in cases where database administrators (DBAs) are managing the databases the services will use, databases may be separated from services, but do not undertake such an implementation lightly.
Note the values you enter for database names, user names, and passwords. The Cloudera Manager installation wizard requires this information to correctly connect to these databases.
The database must be configured to support UTF-8 character set encoding. The sample commands below include the required options to enable UTF-8 support.
Creating the PostgreSQL Databases for Cloudera Manager
- Connect to PostgreSQL.
$ sudo -u postgres psql
- Create a database for the Activity Monitor feature and assign permissions to a database user. The database name, user name, and password can be anything you want. The examples shown here match the default names provided in the Cloudera Manager Hive configuration settings.
postgres=# CREATE ROLE amon LOGIN PASSWORD 'amon_password'; postgres=# CREATE DATABASE amon OWNER amon ENCODING 'UTF8';
- Create a database for the Service Monitor feature and assign permissions to a database user. The database name, user name, and password can be anything you want.
postgres=# CREATE ROLE smon LOGIN PASSWORD 'smon_password'; postgres=# CREATE DATABASE smon OWNER smon ENCODING 'UTF8';
- Create a database for the Report Manager feature and assign permissions to a database user. The database name, user name, and password can be anything you want.
postgres=# CREATE ROLE rman LOGIN PASSWORD 'rman_password'; postgres=# CREATE DATABASE rman OWNER rman ENCODING 'UTF8';
- Create a database for the Host Monitor feature and assign permissions to a database user. The database name, user name, and password can be anything you want.
postgres=# CREATE ROLE hmon LOGIN PASSWORD 'hmon_password'; postgres=# CREATE DATABASE hmon OWNER hmon ENCODING 'UTF8';
- (Optional) Create a database for the Cloudera Navigator feature and assign permissions to a database user. For example:
postgres=# CREATE ROLE nav LOGIN PASSWORD 'nav_password'; postgres=# CREATE DATABASE nav OWNER nav ENCODING 'UTF8';
Creating the database for the Hive metastore
Create a separate metastore for each Hive service, if you have more than one.
- Create a database for the Hive Metastore Server and assign permissions to a database user. For example.
postgres=# CREATE ROLE hive LOGIN PASSWORD 'hive_password'; postgres=# CREATE DATABASE metastore OWNER hive ENCODING 'UTF8';
For PostgreSQL 8.2.23 or later, also do:
postgres=# ALTER DATABASE metastore SET standard_conforming_strings = off;
Configuring PostgreSQL Settings
There are several settings you should update to ensure your system performs as expected. Update these settings in the /var/lib/pgsql/data/postgresql.conf or /var/lib/postgresql/data/postgresql.conf file. Settings vary based on cluster size and resources.
Large clusters may contain up to 1000 hosts. For large clusters consider the following suggestions as a starting point for settings.
- max_connection: For large clusters, each database is typically hosted on a different machine. The general rule is to allow each database on a host 100 maximum connections and then add 50 extra connections. As a result, in the normal case for large clusters, configure each of the five machines that hosts a single database for 150 connections. You may have to increase the system resources available to PostgreSQL, as described at Connection Settings.
- shared_buffers: 1024MB. Note that this requires that the operating system can allocate sufficient shared memory. See PostgreSQL information on Managing Kernel Resources for more information on setting kernel resources.
- wal_buffers: 16MB. This value is derived from the shared_buffers value. Setting wal_buffers to be approximately 3% of shared_buffers up to a maximum of approximately 16MB works well in most case.
- checkpoint_segments: 128. The PostgreSQL Tuning Guide recommends values between 32 and 256 for write-intensive systems, such as this one.
- checkpoint_completion_target: 0.9. This setting is only available in PostgreSQL 8.3 and later. These versions are highly recommended.
Small to Mid-sized Clusters
For small to mid-sized clusters, consider the following suggestions as a starting point for settings. If resources are especially limited, consider reducing the buffer sizes and checkpoint segments further. Ongoing tuning may be required based on each machine's resource utilization. For example, if Cloudera Manager is running on the same machine as other roles, the following values may be acceptable:
- shared_buffers: 256MB
- wal_buffers: 8MB
- checkpoint_segments: 16
- checkpoint_completion_target: 0.9
Configuration Settings for PostgreSQL 8.1
Cloudera recommends using PostgreSQL 8.4 or later. While more recent versions provide better results, earlier versions may be used. For example, Cloudera supports PostgreSQL 8.1, which is bundled with some older Linux distributions. If you use PostgreSQL 8.1, settings such as checkpoint_completion_target are not available. Consequently, consider using the following recommended settings:
- shared_buffers: 131072
- wal_buffers: 4096
- checkpoint_segments: 256
Note that because PostgreSQL 8.1 does not support entering parameters in MB, the preceding values are provided in buffers or segments. For example, each buffer is 8KB, so 131072 is equivalent to 1024 MB.
After updating database settings, you must restart PostgreSQL for the new settings to take effect.
$ sudo service postgresql restart
Backing up the Databases
- Cloudera Manager database: Contains all the information about what services you have configured, their role assignments, all configuration history, commands, users, and running processes. This is a relatively small database (<100MB), and is the most important to back up.
- Activity Monitor database: Contains information about past activities. In large clusters, this database can grow large.
- Service Monitor database: Contains monitoring information about daemons. In large clusters, this database can grow large.
- Report Manager database: Keeps track of disk utilization over time. Medium-sized.
- Host Monitor database: Contains information about host status. In large clusters, this database can grow large.
- Cloudera Navigator database: Contains auditing information. In large clusters, this database can grow large.
- Hive Metastore database: Contains Hive metadata. Relatively small.
Backing Up the PostgreSQL Database
It's important that you periodically back up the external or embedded PostgreSQL database that Cloudera Manager uses to store configuration information.
To back up the embedded PostgreSQL database, you can simply backup the /var/lib/cloudera-scm-server-db directory.
To back up the external PostgreSQL database, you can use the pg_dump utility.
- Log in to the host where the Cloudera Manager Server is installed.
- Run the following command as root:
cat /etc/cloudera-scm-server/db.properties. The db.properties file contains: # Auto-generated by scm_prepare_database.sh # Mon Jul 27 22:36:36 PDT 2011 com.cloudera.cmf.db.type=postgresql com.cloudera.cmf.db.host=localhost:7432 com.cloudera.cmf.db.name=scm com.cloudera.cmf.db.user=scm com.cloudera.cmf.db.password=NnYfWIjlbk
- Run the following command as root using the parameters from the preceding step:
# pg_dump -h localhost -p 7432 -U scm > /tmp/scm_server_db_backup.$(date +%Y%m%d)
- Enter the password specified for the com.cloudera.cmf.db.password property on the last line of the db.properties file. If you are using the embedded database, Cloudera Manager generated the password for you during installation. If you are using an external database, enter the appropriate information for your database.
For more information about using the pg_dump utility, see the pg_dump reference.