Apache Sqoop Known Issues

MySQL JDBC driver shipped with CentOS 6 systems does not work with Sqoop

CentOS 6 systems currently ship with version 5.1.17 of the MySQL JDBC driver. This version does not work correctly with Sqoop.

Bug: None

Resolution: Use workaround.

Workaround: Install version 5.1.31 of the JDBC driver, following the directions in Installing the JDBC Drivers for Sqoop 1 (Sqoop 1) or Configuring Sqoop 2 (Sqoop 2).
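
As a rough sketch for a Sqoop 1 installation, the newer driver can be downloaded and copied into the directory Sqoop scans for JDBC drivers. The download URL and the /var/lib/sqoop driver directory below are assumptions; verify both against the linked documentation for your environment.

# Download MySQL Connector/J 5.1.31 (URL is an assumption; confirm the current
# location on dev.mysql.com before using).
wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.31.tar.gz
tar -xzf mysql-connector-java-5.1.31.tar.gz
# Copy the driver JAR into the directory Sqoop 1 scans for JDBC drivers
# (assumed here to be /var/lib/sqoop; adjust for your installation).
sudo cp mysql-connector-java-5.1.31/mysql-connector-java-5.1.31-bin.jar /var/lib/sqoop/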

MS SQL Server "integratedSecurity" option unavailable in Sqoop

The integratedSecurity option is not available in the Sqoop CLI.

Bug: None

Resolution: None

Workaround: None

Sqoop 1

Importing data as Parquet files may result in out-of-memory errors

Out-of-memory (OOM) errors can occur in the following two cases:
  • When there are many very large rows (multiple megabytes per row) before the initial-page-size check (ColumnWriter)
  • When row sizes vary significantly, so that the next-page-size check is based on small rows and is set very high, and is then followed by many large rows

Bug: PARQUET-99

Workaround: None, other than restructuring the data.

Hive, Pig, and Sqoop 1 fail in MRv1 tarball installation because /usr/bin/hbase sets HADOOP_MAPRED_HOME to MR2

This problem affects tarball installations only.

Bug: None

Resolution: Use workaround.

Workaround: If you are using MRv1, edit /etc/default/hadoop and change the line
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
to
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-0.20-mapreduce

In addition, /usr/lib/hadoop-mapreduce must not appear in HADOOP_CLASSPATH.
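
The following is a minimal sketch of how the change can be verified; it assumes the default tarball paths shown above.

# Pick up the edited settings and confirm HADOOP_MAPRED_HOME points at MRv1.
source /etc/default/hadoop
echo $HADOOP_MAPRED_HOME        # expected: /usr/lib/hadoop-0.20-mapreduce
# Check that the MR2 directory is not on the Hadoop classpath;
# no output from grep means it is absent.
hadoop classpath | tr ':' '\n' | grep /usr/lib/hadoop-mapreduce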

Sqoop import into Hive causes a NullPointerException (NPE)

Bug: None

Workaround: Import the data into HDFS via Sqoop first and then import it into Hive from HDFS.
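
For example, the two-step import might look like the following; the connection string, credentials, table name, and HDFS staging path are placeholders.

# Step 1: import from the source database into an HDFS staging directory
# (connection string, username, table, and target directory are placeholders).
sqoop import \
  --connect jdbc:mysql://mysql.server/database \
  --username nvaidya -P \
  --table tab1 \
  --target-dir /user/hive/staging/tab1

# Step 2: load the staged files into an existing Hive table.
hive -e "LOAD DATA INPATH '/user/hive/staging/tab1' INTO TABLE tab1;"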

Sqoop 2

Sqoop 2 client cannot be used with a different version of the Sqoop 2 server

The Sqoop 2 client and server must be running the same CDH version.

Bug: None

Workaround: Make sure all Sqoop 2 components are running the same version of CDH.

Sqoop 2 upgrade may fail if any job's source and destination links point to the same connector

For example, the links for the job shown in the following output both point to generic-jdbc-connector:

sqoop:000> show job --all
1 job(s) to show:
Job with id 1 and name job1 (Enabled: true, Created by null at 5/13/15 3:05 PM, Updated by null at 5/13/15 6:04 PM)
  Throttling resources
    Extractors:
    Loaders:
From link: 1
  From database configuration
    Schema name: schema1
    Table name: tab1
    Table SQL statement:
    Table column names: col1
    Partition column name:
    Null value allowed for the partition column: false
    Boundary query:
  Incremental read
    Check column:
    Last value:
To link: 2
  To database configuration
    Schema name: schema2
    Table name: tab2
    Table SQL statement:
    Table column names: col2
    Stage table name:
    Should clear stage table:

sqoop:000> show link --all
2 link(s) to show:
link with id 1 and name try1 (Enabled: true, Created by null at 5/13/15 2:59 PM, Updated by null at 5/13/15 5:47 PM)
Using Connector generic-jdbc-connector with id 2
  Link configuration
    JDBC Driver Class: com.mysql.jdbc.Driver
    JDBC Connection String: jdbc:mysql://mysql.server/database
    Username: nvaidya
    Password:
    JDBC Connection Properties:
link with id 2 and name try2 (Enabled: true, Created by null at 5/13/15 3:01 PM, Updated by null at 5/13/15 5:47 PM)
Using Connector generic-jdbc-connector with id 2
  Link configuration
    JDBC Driver Class: com.mysql.jdbc.Driver
    JDBC Connection String: jdbc:mysql://mysql.server/database
    Username: nvaidya
    Password:
    JDBC Connection Properties:

Bug: None

Workaround: Before upgrading, make sure no jobs have source and destination links that point to the same connector.