Apache Sqoop Known Issues
MySQL JDBC driver shipped with CentOS 6 systems does not work with Sqoop
CentOS 6 systems currently ship with version 5.1.17 of the MySQL JDBC driver. This version does not work correctly with Sqoop.
Affected Versions: MySQL JDBC 5.1.17, 5.1.4, 5.3.0
Resolution: Use workaround.
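One common workaround is to replace the bundled driver with a newer MySQL Connector/J release. A minimal sketch, assuming a package-based Sqoop installation under /usr/lib/sqoop, with Connector/J 5.1.31 used purely as an illustrative replacement version:

# Check which Connector/J version the system currently provides
rpm -q mysql-connector-java

# Place a newer driver jar on Sqoop's classpath
cp mysql-connector-java-5.1.31-bin.jar /usr/lib/sqoop/lib/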
MS SQL Server "integratedSecurity" option unavailable in Sqoop
The integratedSecurity option is not available in the Sqoop CLI.
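For reference, this is the shape of connection string that cannot be used from the Sqoop CLI; the server and database names here are hypothetical:

sqoop import \
  --connect "jdbc:sqlserver://sqlserver.example.com:1433;databaseName=sales;integratedSecurity=true" \
  --table orders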
Sqoop1 Metastore job tool ignores overrides
Command-line options passed after the -- token are ignored when a saved job is executed with:

sqoop job --exec <job> -- <options_to_override>

For example, create a saved eval job:

sqoop job --create my_eval_query -- eval --connect jdbc:mysql://fqdn.example.com:3306/db --username user --password password --query "select * from t1"

Executing the job without overrides returns the full table:

sqoop job --exec my_eval_query
c1    c2           c3
1     2017-02-17   row data 1
2     2017-02-17   row data 2
3     2017-02-17   row data 3

Executing it with an overriding --query returns the same unfiltered result because the override is ignored:

sqoop job --exec my_eval_query -- --query "select * from t1 where c1 = 2"
c1    c2           c3
1     2017-02-17   row data 1
2     2017-02-17   row data 2   <--- Only this row should be returned
3     2017-02-17   row data 3
Affected releases: CDH 5.8.0, 5.8.1, 5.8.3, 5.8.4; CDH 5.9.0, 5.9.1; CDH 5.10.0
This issue has been resolved in CDH 5.8.5, CDH 5.9.2, CDH 5.10.1, CDH 5.11.0 (and higher) releases.
Sqoop1 (import + --as-parquetfile) limitation with KMS/KTS Encryption at Rest
Importing with --as-parquetfile into a Hive table whose location is in an HDFS encryption zone fails: the job stages temporary Parquet files under /tmp, and HDFS refuses to move them into the encryption zone at commit time:

sqoop import --connect jdbc:db2://djaxludb1001:61035/DDBAT003 --username=dh810202 --P \
  --target-dir /data/hive_scratch/ASDISBURSEMENT --delete-target-dir -m1 \
  --query "select disbursementnumber,disbursementdate,xmldata FROM DB2dba.ASDISBURSEMENT where DISBURSEMENTNUMBER = 2011113210000115311 AND \$CONDITIONS" \
  --hive-import --hive-database adminserver --hive-table asdisbursement_dave \
  --map-column-java XMLDATA=String --as-parquetfile

16/12/05 12:23:46 INFO mapreduce.Job:  map 100% reduce 0%
16/12/05 12:23:46 INFO mapreduce.Job: Job job_1480530522947_0096 failed with state FAILED due to: Job commit failed: org.kitesdk.data.DatasetIOException: Could not move contents of hdfs://AJAX01-ns/tmp/adminserver/.temp/job_1480530522947_0096/mr/job_1480530522947_0096 to hdfs://AJAX01-ns/data/RetiredApps/INS/AdminServer/asdisbursement_dave
<SNIP>
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): /tmp/adminserver/.temp/job_1480530522947_0096/mr/job_1480530522947_0096/5ddcac42-5d69-4e46-88c2-17bbedac4858.parquet can't be moved into an encryption zone.
Workaround: Use an alternate data file type, for example, text or Avro.
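For example, the failing import above can be rerun as a plain-text import; this is a sketch in which dropping --as-parquetfile falls back to Sqoop's default text format, avoiding the temporary Parquet files:

sqoop import --connect jdbc:db2://djaxludb1001:61035/DDBAT003 --username=dh810202 --P \
  --target-dir /data/hive_scratch/ASDISBURSEMENT --delete-target-dir -m1 \
  --query "select disbursementnumber,disbursementdate,xmldata FROM DB2dba.ASDISBURSEMENT where DISBURSEMENTNUMBER = 2011113210000115311 AND \$CONDITIONS" \
  --hive-import --hive-database adminserver --hive-table asdisbursement_dave \
  --map-column-java XMLDATA=String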
Sqoop import as Parquet files may result in out-of-memory errors
Out-of-memory errors can occur in the following cases:
- When many very large rows (multiple megabytes per row) arrive before the initial page-run check in ColumnWriter.
- When row sizes vary significantly, so that the next-page-size check is calibrated on small rows, set very high, and then followed by many large rows.
Workaround: None, other than restructuring the data.
Hive, Pig, and Sqoop 1 fail in MRv1 tarball due to incorrect HADOOP_MAPRED_HOME setting
This issue affects tarball installations only.
Resolution: Use workaround.
- Change the HADOOP_MAPRED_HOME setting in /etc/default/hadoop from the MRv2 path (/usr/lib/hadoop-mapreduce) to the MRv1 path (/usr/lib/hadoop-0.20-mapreduce), as sketched after this list.
- Remove /usr/lib/hadoop-mapreduce from the HADOOP_CLASSPATH.
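A minimal sketch of the edit, assuming the standard CDH paths (adjust to your tarball layout):

# /etc/default/hadoop
# Before:
#   export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
# After:
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-0.20-mapreduce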
Sqoop import into Hive causes a NullPointerException (NPE)
Workaround: Import the data into HDFS via Sqoop first and then import it into Hive from HDFS.
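A sketch of the two-step workaround; the connection string, table, and HDFS path here are hypothetical:

# Step 1: land the data in HDFS with Sqoop
sqoop import --connect jdbc:mysql://fqdn.example.com:3306/db --username user --P \
  --table t1 --target-dir /user/etl/t1

# Step 2: load it into an existing Hive table from HDFS
hive -e "LOAD DATA INPATH '/user/etl/t1' INTO TABLE t1;"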
Sqoop 2 client cannot be used with a different version of the Sqoop 2 server
The Sqoop 2 client and server must be running the same CDH version.
Workaround: Make sure all Sqoop 2 components are running the same version of CDH.
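To confirm that the client and server builds match, use the Sqoop 2 shell's show version command:

sqoop:000> show version --all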
Sqoop 2 upgrade may fail if any job's source and destination links point to the same connector
For example, the links for the job shown in the following output both point to generic-jdbc-connector:
sqoop:000> show job --all
1 job(s) to show:
Job with id 1 and name job1 (Enabled: true, Created by null at 5/13/15 3:05 PM, Updated by null at 5/13/15 6:04 PM)
  Throttling resources
    Extractors:
    Loaders:
  From link: 1
  From database configuration
    Schema name: schema1
    Table name: tab1
    Table SQL statement:
    Table column names: col1
    Partition column name:
    Null value allowed for the partition column: false
    Boundary query:
  Incremental read
    Check column:
    Last value:
  To link: 2
  To database configuration
    Schema name: schema2
    Table name: tab2
    Table SQL statement:
    Table column names: col2
    Stage table name:
    Should clear stage table:
sqoop:000> show link --all
2 link(s) to show:
link with id 1 and name try1 (Enabled: true, Created by null at 5/13/15 2:59 PM, Updated by null at 5/13/15 5:47 PM)
  Using Connector generic-jdbc-connector with id 2
  Link configuration
    JDBC Driver Class: com.mysql.jdbc.Driver
    JDBC Connection String: jdbc:mysql://mysql.server/database
    Username: nvaidya
    Password:
    JDBC Connection Properties:
link with id 2 and name try2 (Enabled: true, Created by null at 5/13/15 3:01 PM, Updated by null at 5/13/15 5:47 PM)
  Using Connector generic-jdbc-connector with id 2
  Link configuration
    JDBC Driver Class: com.mysql.jdbc.Driver
    JDBC Connection String: jdbc:mysql://mysql.server/database
    Username: nvaidya
    Password:
    JDBC Connection Properties:
Workaround: Before upgrading, make sure no jobs have source and destination links that point to the same connector.
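A sketch of the pre-upgrade check from the Sqoop 2 shell, using the id-based commands that match the output above (verify the exact option names against your client's help output):

sqoop:000> show job --all
sqoop:000> delete job --jid 1

Inspect each job's "From link" and "To link" values in the show job --all output, and delete (and recreate after the upgrade) any job, such as job 1 above, whose links resolve to the same connector.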