Apache Sqoop Known Issues

MySQL JDBC driver shipped with CentOS 6 systems does not work with Sqoop

CentOS 6 systems currently ship with version 5.1.17 of the MySQL JDBC driver. This version does not work correctly with Sqoop.

Bug: None

Affected Versions: MySQL JDBC 5.1.17, 5.1.4, 5.3.0

Resolution: Use workaround.

Workaround: Install version 5.1.31 of the JDBC driver as detailed in Installing the JDBC Drivers for Sqoop 1 (Sqoop 1) or Configuring Sqoop 2 (Sqoop 2).
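
For Sqoop 1, a minimal sketch of the workaround, assuming a package-based installation where JDBC driver JARs are read from /var/lib/sqoop/ (the Connector/J archive must be downloaded from MySQL separately; the file names below match the 5.1.31 release):
# Unpack MySQL Connector/J 5.1.31 and copy the driver JAR into Sqoop's lib directory
tar -xzf mysql-connector-java-5.1.31.tar.gz
sudo cp mysql-connector-java-5.1.31/mysql-connector-java-5.1.31-bin.jar /var/lib/sqoop/
sudo chmod 644 /var/lib/sqoop/mysql-connector-java-5.1.31-bin.jar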

MS SQL Server "integratedSecurity" option unavailable in Sqoop

The integratedSecurity option is not available in the Sqoop CLI.
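
For reference, a connection string of the following form (server, port, and database name are placeholders) relies on integratedSecurity and therefore cannot be used from the Sqoop CLI:
sqoop import --connect "jdbc:sqlserver://sqlserver.example.com:1433;databaseName=db1;integratedSecurity=true" \
  --table t1 --target-dir /user/etl/t1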

Bug: None

Resolution: None

Workaround: None

Sqoop 1

Sqoop 1 Metastore job tool ignores overrides

Sqoop job command option overrides are being ignored due to SQOOP-2896. The exec action should let you override arguments in a saved job by passing options after --, as in:
sqoop job --exec job -- options_to_override
However, the values in options_to_override are not passed to the saved job, so Sqoop jobs that use overrides can break or execute incorrectly, as shown in the example below. The following command creates a saved eval job:
sqoop job --create my_eval_query -- eval --connect jdbc:mysql://fqdn.example.com:3306/db --username user --password password --query "select * from t1"
This command executes the job defined in my_eval_query and returns the rows in the table:
sqoop job --exec my_eval_query
c1  c2           c3
1   2017-02-17   row data 1
2   2017-02-17   row data 2
3   2017-02-17   row data 3
In the next run, the override query is ignored. The job uses the parameters originally stored with it and returns the same rows as before, rather than the single row where column c1 = 2:
sqoop job --exec my_eval_query --query "select * from t1 where c1 = 2"
c1  c2           c3
1   2017-02-17   row data 1
2   2017-02-17   row data 2 <--- Only this row should be returned
3   2017-02-17   row data 3

Affected Versions: CDH 5.8.0, 5.8.1, 5.8.3, 5.8.4; CDH 5.9.0, 5.9.1; CDH 5.10.0

Bug: SQOOP-2896

Resolution: This issue is resolved in CDH 5.8.5, CDH 5.9.2, CDH 5.10.1, and CDH 5.11.0 and higher.

Sqoop 1 (sqoop import + --as-parquetfile) limitation with KMS/KTS Encryption at Rest

Due to a limitation in the Kite SDK, sqoop import --as-parquetfile cannot be used with KMS/KTS encryption zones. See the following example.
sqoop import --connect jdbc:db2://djaxludb1001:61035/DDBAT003 --username dh810202 -P \
  --target-dir /data/hive_scratch/ASDISBURSEMENT --delete-target-dir -m 1 \
  --query "select disbursementnumber,disbursementdate,xmldata FROM DB2dba.ASDISBURSEMENT where DISBURSEMENTNUMBER = 2011113210000115311 AND \$CONDITIONS" \
  --hive-import --hive-database adminserver --hive-table asdisbursement_dave \
  --map-column-java XMLDATA=String --as-parquetfile

16/12/05 12:23:46 INFO mapreduce.Job: map 100% reduce 0%
16/12/05 12:23:46 INFO mapreduce.Job: Job job_1480530522947_0096 failed with state FAILED due to: Job commit failed: org.kitesdk.data.DatasetIOException: Could not move contents of hdfs://AJAX01-ns/tmp/adminserver/.temp/job_1480530522947_0096/mr/job_1480530522947_0096 to hdfs://AJAX01-ns/data/RetiredApps/INS/AdminServer/asdisbursement_dave
<SNIP>
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): /tmp/adminserver/.temp/job_1480530522947_0096/mr/job_1480530522947_0096/5ddcac42-5d69-4e46-88c2-17bbedac4858.parquet can't be moved into an encryption zone.

Workaround: Use an alternate data file type, for example text or Avro.
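
For example, replacing --as-parquetfile with --as-textfile (text is also the default when no file-format option is given) keeps the Hive import out of the Kite-based Parquet code path; this sketch reuses the connection details and query from the failing example above:
sqoop import --connect jdbc:db2://djaxludb1001:61035/DDBAT003 --username dh810202 -P \
  --target-dir /data/hive_scratch/ASDISBURSEMENT --delete-target-dir -m 1 \
  --query "select disbursementnumber,disbursementdate,xmldata FROM DB2dba.ASDISBURSEMENT where DISBURSEMENTNUMBER = 2011113210000115311 AND \$CONDITIONS" \
  --hive-import --hive-database adminserver --hive-table asdisbursement_dave \
  --map-column-java XMLDATA=String --as-textfile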

Sqoop import as Parquet files may result in out-of-memory errors

Out-of-memory (OOM) errors can occur in the following two cases:
  • Many very large rows (multiple megabytes per row) are written before the initial page-size check in ColumnWriter runs.
  • Row sizes vary widely, so the next page-size check is calibrated on small rows and set very high, and is then followed by many large rows.

Bug: PARQUET-99

Workaround: None, other than restructuring the data.

Hive, Pig, and Sqoop 1 fail in MRv1 tarball due to incorrect HADOOP_MAPRED_HOME setting

This issue affects tarball installations only.

Bug: None

Resolution: Use workaround.

Workaround: For MRv1 only, manually edit as follows (a consolidated sketch of the resulting settings follows this list):
  • Change the path for the Hadoop MapReduce home setting in /etc/default/hadoop from:
    export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

    to:

    export HADOOP_MAPRED_HOME=/usr/lib/hadoop-0.20-mapreduce
  • Remove /usr/lib/hadoop-mapreduce from the HADOOP_CLASSPATH.
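
A minimal sketch of the resulting lines in /etc/default/hadoop after both edits (any remaining HADOOP_CLASSPATH entries depend on your installation):
# MRv1 tarball installs only: use the MRv1 MapReduce home
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-0.20-mapreduce
# HADOOP_CLASSPATH must no longer contain /usr/lib/hadoop-mapreduce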

Sqoop import into Hive causes a NullPointerException (NPE)

Bug: None

Workaround: Import the data into HDFS via Sqoop first and then import it into Hive from HDFS.
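
A minimal sketch of the two-step workaround (connection string, table, and directory names are placeholders):
# Step 1: import into HDFS only, without --hive-import
sqoop import --connect jdbc:mysql://db.example.com:3306/db1 --username user -P \
  --table t1 --target-dir /user/etl/staging/t1
# Step 2: load the staged files into an existing Hive table
hive -e "LOAD DATA INPATH '/user/etl/staging/t1' INTO TABLE t1"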

Sqoop 2

Sqoop 2 client cannot be used with a different version of the Sqoop 2 server

The Sqoop 2 client and server must be running the same CDH version.

Bug: None

Workaround: Make sure all Sqoop 2 components are running the same version of CDH.

Sqoop 2 upgrade may fail if any job's source and destination links point to the same connector

For example, the links for the job shown in the following output both point to generic-jdbc-connector:

sqoop:000> show job --all
1 job(s) to show:
Job with id 1 and name job1 (Enabled: true, Created by null at 5/13/15 3:05 PM, Updated by null at 5/13/15 6:04 PM)
  Throttling resources
    Extractors:
    Loaders:
From link: 1
  From database configuration
    Schema name: schema1
    Table name: tab1
    Table SQL statement:
    Table column names: col1
    Partition column name:
    Null value allowed for the partition column: false
    Boundary query:
  Incremental read
    Check column:
    Last value:
To link: 2
  To database configuration
    Schema name: schema2
    Table name: tab2
    Table SQL statement:
    Table column names: col2
    Stage table name:
    Should clear stage table:

sqoop:000> show link --all
2 link(s) to show:
link with id 1 and name try1 (Enabled: true, Created by null at 5/13/15 2:59 PM, Updated by null at 5/13/15 5:47 PM)
Using Connector generic-jdbc-connector with id 2
  Link configuration
    JDBC Driver Class: com.mysql.jdbc.Driver
    JDBC Connection String: jdbc:mysql://mysql.server/database
    Username: nvaidya
    Password:
    JDBC Connection Properties:
link with id 2 and name try2 (Enabled: true, Created by null at 5/13/15 3:01 PM, Updated by null at 5/13/15 5:47 PM)
Using Connector generic-jdbc-connector with id 2
  Link configuration
    JDBC Driver Class: com.mysql.jdbc.Driver
    JDBC Connection String: jdbc:mysql://mysql.server/database
    Username: nvaidya
    Password:
    JDBC Connection Properties:

Bug: None

Workaround: Before upgrading, make sure no jobs have source and destination links that point to the same connector.