Managing the Sqoop 1 Client

To add Sqoop 1 to your cluster, add the Sqoop 1 Client service and a Sqoop 1 gateway and deploy the client configuration:

Adding the Sqoop 1 Client

Minimum Required Role: Full Administrator

The Sqoop 1 client packages are installed by the Installation wizard. However, the client configuration is not deployed. To create a Sqoop 1 gateway and deploy the client configuration:
  1. On the Home > Status tab, click to the right of the cluster name and select Add a Service. A list of service types display. You can add one type of service at a time.
  2. Select the Sqoop 1 Client service and click Continue.
  3. Select the services on which the new service should depend. All services must depend on the same ZooKeeper service. Click Continue.
  4. Customize the assignment of role instances to hosts. The wizard evaluates the hardware configurations of the hosts to determine the best hosts for each role. The wizard assigns all worker roles to the same set of hosts to which the HDFS DataNode role is assigned. You can reassign role instances.

    Click a field below a role to display a dialog box containing a list of hosts. If you click a field containing multiple hosts, you can also select All Hosts to assign the role to all hosts, or Custom to display the hosts dialog box.

    The following shortcuts for specifying hostname patterns are supported:
    • Range of hostnames (without the domain portion)
      Range Definition Matching Hosts
      10.1.1.[1-4] 10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4
      host[1-3].company.com host1.company.com, host2.company.com, host3.company.com
      host[07-10].company.com host07.company.com, host08.company.com, host09.company.com, host10.company.com
    • IP addresses
    • Rack name

    Click the View By Host button for an overview of the role assignment by hostname ranges.

  5. Click Continue. The client configuration deployment command runs.
  6. Click Continue and click Finish.

Installing the JDBC Drivers for Sqoop 1

Sqoop 1 does not ship with third party JDBC drivers. You must download them separately and save them to the /var/lib/sqoop/ directory on the server. Ensure that you do not save JARs in the CDH parcel directory /opt/cloudera/parcels/CDH, because this directory is overwritten when you upgrade CDH.

The following sections show how to install the most common JDBC Drivers.

Before you begin:

Make sure the /var/lib/sqoop directory exists and has the correct ownership and permissions:
mkdir -p /var/lib/sqoop
chown sqoop:sqoop /var/lib/sqoop
chmod 755 /var/lib/sqoop

This sets permissions to drwxr-xr-x.

For JDBC drivers for Hive, Impala, Teradata, or Netezza, see the Connectors documentation.

Installing the MySQL JDBC Driver

Download the MySQL JDBC driver from http://www.mysql.com/downloads/connector/j/5.1.html. You will need to sign up for an account if you do not already have one, and log in, before you can download it. Then copy it to the /var/lib/sqoop/ directory. For example:

$ sudo cp mysql-connector-java-version/mysql-connector-java-version-bin.jar /var/lib/sqoop/

Installing the Oracle JDBC Driver

You can download the JDBC Driver from the Oracle website, for example http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-112010-090769.html. You must accept the license agreement before you can download the driver. Download the ojdbc6.jar file and copy it to the /var/lib/sqoop/ directory:

$ sudo cp ojdbc6.jar /var/lib/sqoop/

Installing the Microsoft SQL Server JDBC Driver

Download the Microsoft SQL Server JDBC driver from http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=11774 and copy it to the /var/lib/sqoop/ directory. For example:

$ curl -L 'http://download.microsoft.com/download/0/2/A/02AAE597-3865-456C-AE7F-613F99F850A8/sqljdbc_4.0.2206.100_enu.tar.gz' | tar xz
$ sudo cp sqljdbc_4.0/enu/sqljdbc4.jar /var/lib/sqoop/

Installing the PostgreSQL JDBC Driver

Download the PostgreSQL JDBC driver from http://jdbc.postgresql.org/download.html and copy it to the /var/lib/sqoop/ directory. For example:

$ curl -L 'http://jdbc.postgresql.org/download/postgresql-9.2-1002.jdbc4.jar' -o postgresql-9.2-1002.jdbc4.jar
$ sudo cp postgresql-9.2-1002.jdbc4.jar /var/lib/sqoop/

Syntax for Configuring JDBC Connection Strings

These are the JDBC connection strings for supported databases.

MySql Connection String

Syntax:

jdbc:mysql://<HOST>:<PORT>/<DATABASE_NAME>

Example:

jdbc:mysql://my_mysql_server_hostname:3306/my_database_name

Oracle Connection String

Syntax:

jdbc:oracle:thin:@<HOST>:<PORT>:<DATABASE_NAME>

Example:

jdbc:oracle:thin:@my_oracle_server_hostname:1521:my_database_name

PostgreSQL Connection String

Syntax:

jdbc:postgresql://<HOST>:<PORT>/<DATABASE_NAME>

Example:

jdbc:postgresql://my_postgres_server_hostname:5432/my_database_name

Netezza Connection String

Syntax:

jdbc:netezza://<HOST>:<PORT>/<DATABASE_NAME>
Example:
jdbc:netezza://my_netezza_server_hostname:5480/my_database_name

Teradata Connection String

Syntax:

jdbc:teradata://<HOST>/DBS_PORT=1025/DATABASE=<DATABASE_NAME>
Example:
jdbc:teradata://my_teradata_server_hostname/DBS_PORT=1025/DATABASE=my_database_name

Setting HADOOP_MAPRED_HOME for Sqoop 1

  • For each user who will be submitting MapReduce jobs using MapReduce v2 (YARN), or running Pig, Hive, or Sqoop 1 in a YARN installation, make sure that the HADOOP_MAPRED_HOME environment variable is set correctly, as follows:
    $ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce