Cloudera Connector Powered by Teradata Release Notes

This section summarizes the high-level changes and most important new features in the Cloudera Connector Powered by Teradata.

New Features in Cloudera Connector Powered by Teradata

The following new features are included in Cloudera Connector Powered by Teradata.

CDP-compatible version of Cloudera Connector Powered by Teradata version 1.8.5.1c7, with the TDCH library upgraded to version 1.8.5.1

Cloudera Connector Powered by Teradata includes ORC support in the Sqoop-Connector-Teradata component. In this release, you can use TeradataManager to import data from the Teradata server to Hive in ORC format.

To use this feature, you must have a compatible version of Sqoop. Ensure that the Sqoop installation meets the following minimum requirement:
  • CDP Private Cloud Base 7.1.9 and later

Extended Cloudera Data Platform (CDP) runtime compatibility also includes:

  • CDP Public Cloud 7.2.9 and later
  • CDP Private Cloud Base 7.1.7 and later

Upgraded TDCH version and Teradata driver

The connector has been upgraded to use TDCH version 1.8.5.1 and Teradata driver version 20.00.00.10. This update delivers better performance and compatibility, and includes bug fixes in the connector.

Supported commands for ORC imports

Here are some examples of supported commands that use the Cloudera Connector Powered by Teradata to import data to Hive in ORC format:
Example 1: Import without specifying the Teradata driver or the connection manager
/opt/cloudera/parcels/CDH/bin/sqoop import \
--connect ... \
--username ... \
--password ... \
--table employees \
--warehouse-dir "..." \
--hive-import \
--delete-target-dir \
--hive-overwrite \
--as-orcfile \
--external-table-dir "..." \
--hs2-url "..." \
-m 1
Example 2: Import without specifying the Teradata driver, but with the TeradataManager connection manager provided
/opt/cloudera/parcels/CDH/bin/sqoop import \
--connect ... \
--username ... \
--password ... \
--table employees \
--warehouse-dir "..." \
--hive-import \
--delete-target-dir \
--hive-overwrite \
--as-orcfile \
--external-table-dir "..." \
--hs2-url "..." \
--connection-manager com.cloudera.connector.teradata.TeradataManager \
-m 1

Supported combinations of --driver and --connection-manager parameters

The compatibility matrix for the --driver and --connection-manager parameters is unchanged from the Sqoop Teradata Connector 1.8.3.2c7p1 release.

Other changes introduced through the TDCH upgrade to 1.8.5.1

The following features have been added since TDCH 1.8.3.2:

Features included in the 1.8.5.1 release
  • TDCH-2005: Update Teradata JDBC driver to 20.00.00.10
  • TDCH-2004: Certify TDCH 1.8.x on CDP 7.1.7 SP2
  • TDCH-1994: Add support for using both -targetpaths and -targettable for Hive import jobs
  • TDCH-2020: Discontinue using internal undocumented interfaces in TDJDBC
Features included in the 1.8.4.1 release
  • TDCH-1989: Certify TDCH on CDP 7.1.8
  • TDCH-1976: Fix Black Duck Security Issues
  • TDCH-1993: Add support for custom staging directory instead of the default /user/<username>/<temp_directory> location
  • TDCH-1962: Handling HASHAMP range of SQL query when AMP goes down in DBS
  • TDCH-1998: Update Teradata JDBC driver to 17.20.00.12
  • TDCH-1997: Include OSS License file (.pdf) in the rpm installation

CDP-compatible version of Cloudera Connector Powered by Teradata version 1.8.3.2c7p1

This release does not contain a new version of the TDCH library or the Teradata driver, and it does not require any additional changes in Sqoop; therefore, CDP compatibility is the same as in Cloudera Connector Powered by Teradata version 1.8.3.2c7.

Extended Cloudera Data Platform (CDP) runtime compatibility also includes:
  • CDP Public Cloud 7.2.9 and later
  • CDP Private Cloud Base 7.1.7 and later

Parquet support on CDP versions for HDFS or Hive imports

Changes made between TDCH 1.8.3.1 and 1.8.3.2 to address a Hive API breakage made the Sqoop Teradata Connector 1.8.3.2c7 incompatible with TDCH 1.8.3.2, resulting in a bug in the Hive import process for the Parquet file format.

This release resolves that incompatibility so that TeradataManager can successfully run Hive imports in Parquet file format, as intended in the Sqoop Teradata Connector 1.8.3.2c7 release.

Additional step required for importing Teradata to Hive in Parquet format

The --hs2-url argument must be provided explicitly as a Sqoop argument to support the Hive JDBC connection to HiveServer2 (HS2) through TDCH.

Configuring user/password-based authentication

From this release onwards, you can configure user/password-based authentication (such as LDAP) when connecting to Hive using TeradataManager. To do so, provide the required credentials either in the --hs2-url argument or explicitly through the --hs2-user and --hive-password Sqoop arguments.

Supported commands for Parquet imports

You can use the Parquet feature as follows:

  • You can import from Teradata to Hive in Parquet format using one of the following commands:
    sqoop import --connect "jdbc:teradata://host/database" --connection-manager org.apache.sqoop.manager.GenericJdbcManager --driver com.teradata.jdbc.TeraDriver --table table1 --target-dir hdfs://ns1/tmp/table1 --hive-import --as-parquetfile
    sqoop import --connect "jdbc:teradata://host/database" --connection-manager com.cloudera.connector.teradata.TeradataManager --table table1 --target-dir hdfs://nsq/tmp/table1 --hive-import --as-parquetfile --hs2-url "jdbc:hive2://…"
    sqoop import --connect "jdbc:teradata://host/database" --connection-manager com.cloudera.connector.teradata.TeradataManager --table table1 --target-dir hdfs://nsq/tmp/table1 --hive-import --as-parquetfile --hs2-url "jdbc:hive2://…;user=foo;password=bar"
    sqoop import --connect "jdbc:teradata://host/database" --connection-manager com.cloudera.connector.teradata.TeradataManager --table table1 --target-dir hdfs://nsq/tmp/table1 --hive-import --as-parquetfile --hs2-url "jdbc:hive2://…" --hs2-user foo --hive-password bar
  • You can import from Teradata to HDFS in Parquet format using only the GenericJdbcManager connection manager, with the following options:
    sqoop import --connect "jdbc:teradata://host/database" --connection-manager org.apache.sqoop.manager.GenericJdbcManager --driver com.teradata.jdbc.TeraDriver --table table1 --target-dir hdfs://ns1/tmp/table1 --as-parquetfile
    Any version of the Sqoop Teradata connector supports this command.

Supported combinations of --driver and --connection-manager parameters

The driver and connection manager compatibility matrix has not changed since the previous release and is the same as in the Sqoop Teradata Connector 1.8.3.2c7 version.

CDP-compatible version of Cloudera Connector Powered by Teradata version 1.8.3.2c7, with the TDCH library upgraded to version 1.8.3.2

Cloudera Connector Powered by Teradata implements Parquet support in the Sqoop-Connector-Teradata component. In this release, you can use TeradataManager to import Parquet files.

  • CDP Private Cloud Base 7.1.8 and later compatibility

    If you install this connector version on CDP Private Cloud Base 7.1.8, you can import data from the Teradata server to Hive in Parquet format using TeradataManager.

  • CDP Public Cloud 7.2.13 and later compatibility

    If you install this connector version on CDP Public Cloud 7.2.13, you can import data from the Teradata server to Hive in Parquet format using TeradataManager.

Extended Cloudera Data Platform (CDP) runtime compatibility also includes:
  • CDP Public Cloud 7.2.9 and later
  • CDP Private Cloud Base 7.1.7 and later

Parquet support on CDP versions for HDFS or Hive imports

You can use the Parquet feature to import data from the Teradata server to HDFS or Hive in Parquet format using GenericJdbcManager with the Teradata JDBC driver under the following conditions:
  • Sqoop Teradata Connector 1.8.1c7 (earlier connector) or 1.8.3.2c7 (latest connector) is installed on one of the following CDP versions:
    • CDP Public Cloud 7.2.9 - 7.2.12 (earlier CDP Public Cloud versions)
    • CDP Private Cloud Base 7.1.7 (earlier CDP Private Cloud Base version)
  • Sqoop Teradata Connector 1.8.1c7 (earlier connector) is installed on one of the following CDP versions:
    • CDP Public Cloud 7.2.13 (latest CDP Public Cloud version) and later
    • CDP Private Cloud Base 7.1.8 (latest CDP Private Cloud Base version) and later

As shown above, the latest connector is backward compatible for use on earlier CDP versions, and the earlier connector is forward compatible for use on later CDP versions.

You must use the supported combinations of --driver and --connection-manager parameters shown below in "Supported combinations of --driver and --connection-manager parameters".

Supported commands for Parquet imports

You can use the Parquet feature as follows:
  • You can import from Teradata to Hive in Parquet format using one of the following commands:
    sqoop import --connect "jdbc:teradata://host/database" --connection-manager org.apache.sqoop.manager.GenericJdbcManager --driver com.teradata.jdbc.TeraDriver --table table1 --target-dir hdfs://ns1/tmp/table1 --hive-import --as-parquetfile
    sqoop import --connect "jdbc:teradata://host/database" --connection-manager com.cloudera.connector.teradata.TeradataManager --table table1 --target-dir hdfs://nsq/tmp/table1 --hive-import --as-parquetfile
  • You can import from Teradata to HDFS in Parquet format using only the following options:
    sqoop import --connect "jdbc:teradata://host/database" --connection-manager org.apache.sqoop.manager.GenericJdbcManager --driver com.teradata.jdbc.TeraDriver --table table1 --target-dir hdfs://ns1/tmp/table1 --as-parquetfile
    Any version of the Sqoop Teradata connector supports this command.

Supported combinations of --driver and --connection-manager parameters

The following table describes the supported combinations of the --driver and --connection-manager parameters when importing Parquet data from Teradata to HDFS (a dash indicates that the parameter is omitted):

--driver                       --connection-manager
-                              -
com.teradata.jdbc.TeraDriver   -
com.teradata.jdbc.TeraDriver   org.apache.sqoop.manager.GenericJdbcManager

The following table describes the supported combinations when importing Parquet data from Teradata to Hive:

--driver                       --connection-manager
-                              -
com.teradata.jdbc.TeraDriver   -
com.teradata.jdbc.TeraDriver   org.apache.sqoop.manager.GenericJdbcManager
-                              com.cloudera.connector.teradata.TeradataManager

Other features

The following features have been added to the 1.8.3.2 version:

  • TDCH-1972: Certify TDCH on CDP 7.1.7 SP1 and add support for Hive JDBC with HiveServer2.

The following features are included in the 1.8.3.1 version:

  • TDCH-1919: TDCH support for Kerberos-enabled Advanced SQL Engine (TDBMS)
  • TDCH-1921: Add more debug statements for "split.by.hash"
  • TDCH-1922: Add more debug statements for "split.by.value"
  • TDCH-1923: Add more debug statements for "split.by.partition"
  • TDCH-1924: Add more debug statements for "split.by.amp"
  • TDCH-1925: Add more debug statements for "batch.insert"
  • TDCH-1950: Certify TDCH with TDJDBC 17.10

The following features are included in the 1.8.2 release:

  • TDCH-1571: Add Timestamp support for Parquet in TDCH
  • TDCH-1858: Certify TDCH with Advanced SQL Engine (TDBMS) 17.10
  • TDCH-1892: Add more debug statements for fastload and fastexport methods for better debugging
  • TDCH-1897: Display the error at the exact CLI option instead of a generic message

The following changes in this version are related to the new Teradata JDBC connector incorporated into the Sqoop Teradata Connector 1.8.3.2c7:
  • Supported Teradata Database versions
    • Teradata Database 16.00
    • Teradata Database 16.10
    • Teradata Database 16.20
    • Teradata Database 17.00
    • Teradata Database 17.05
    • Teradata Database 17.10
  • Supported Hadoop versions
    • Hadoop 3.1.1
  • Supported Hive versions
    • Hive 3.1.1
    • Hive 3.1.3
  • Certified Hadoop distributions
    • Cloudera Data Platform (CDP) Private Cloud Base (CDP Datacenter) 7.1.7
  • Supported Teradata Wallet versions
    • Teradata Wallet 16.20: because TD Wallet supports multiple versions installed on the same system, TD Wallet 16.20 must be installed to use the TD Wallet functionality.

CDP-compatible version of Cloudera Connector Powered by Teradata version 1.8.1c7, with the TDCH library upgraded to version 1.8.1

Cloudera Connector Powered by Teradata includes the following new features:
  • Extended Cloudera Data Platform (CDP) compatibility
    • CDP Public Cloud 7.2.9 and later
    • CDP Private Cloud Base 7.1.7 and later

CDP-compatible version of Cloudera Connector Powered by Teradata version 1.8c7, with the TDCH library upgraded to version 1.8.0

Cloudera Connector Powered by Teradata includes the following new features:
  • Extended Cloudera Data Platform (CDP) compatibility
    • CDP Public Cloud 7.2.0 - 7.2.8
    • CDP Private Cloud Base 7.1.0 - 7.1.6
  • Support for the Sqoop import options --incremental lastmodified and --last-value (see the example below)
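
A minimal sketch of an incremental import using these options follows; the connection string, table name, and column values are placeholders rather than values from this release, and --check-column is the standard Sqoop argument naming the column that is compared against --last-value:

sqoop import \
--connect jdbc:teradata://<td-host>/DATABASE=<database> \
--username ... \
--password ... \
--table orders \
--incremental lastmodified \
--check-column last_updated \
--last-value "2020-01-01 00:00:00" \
--target-dir /tmp/orders \
-m 1

On subsequent runs, Sqoop imports only the rows whose last_updated value is newer than the supplied --last-value.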

CDH 6-compatible version of Cloudera Connector Powered by Teradata 1.7.1c6 Available

Cloudera Connector Powered by Teradata 1.7.1c6 is compatible with CDH 6. It does not contain new features or changes.

CDH 6-compatible version of Cloudera Connector Powered by Teradata 1.7c6 Available

Cloudera Connector Powered by Teradata 1.7c6 is compatible with CDH 6. It does not contain new features or changes.

New Features in Cloudera Connector Powered by Teradata Version 1.7c5

Cloudera Connector Powered by Teradata now supports Teradata 16.x. This release upgrades the JDBC driver to version 16.10.00.05 and the TDCH library to version 1.5.4.

Cloudera Connector Powered by Teradata now supports importing tables without a split-by column specified when the number of mappers is set to 1.

Cloudera Connector Powered by Teradata now supports the internal.fastexport input method. For table import, the following values for the --input-method option are valid:
  • split.by.partition
  • split.by.hash
  • split.by.value
  • split.by.amp
  • internal.fastexport

Note that query import still supports only the split.by.partition input method.

The internal.fastexport method implements coordination between the mappers and a coordinator process (running on the edge node where the job was submitted). The host name and the port of this process are resolved automatically, but new options are available for manual configuration:
  • --fastexport-socket-hostname: Configures the host of the coordinator process. It sets the tdch.input.teradata.fastexport.coordinator.socket.host Java property exposed by the underlying Teradata Connector for Hadoop (TDCH) library.
  • --fastexport-socket-port: Configures the port of the coordinator process. It sets the tdch.input.teradata.fastexport.coordinator.socket.port Java property exposed by the underlying Teradata Connector for Hadoop (TDCH) library.

For more information on these properties, see the Teradata Connector for Hadoop tutorial provided by Teradata.
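
For example, here is a sketch of a table import that uses internal.fastexport and configures the coordinator socket manually; the host name, port, and connection details are placeholder values:

sqoop import \
--connect jdbc:teradata://<td-host>/DATABASE=<database> \
--username ... \
--password ... \
--table employees \
--input-method internal.fastexport \
--fastexport-socket-hostname edge-node.example.com \
--fastexport-socket-port 9000 \
--target-dir /tmp/employees

In most environments, both socket options can be omitted because the host name and port are resolved automatically, as described above.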

New Features in Cloudera Connector Powered by Teradata Version 1.6.1c5

  • Adds support for SLES 12.

New Features in Cloudera Connector Powered by Teradata Version 1.6c5

  • Upgrades the JDBC driver to version 15.10.00.22 and the TDCH library to version 1.5.0. These libraries contain several bug fixes and improvements.
  • Adds the --schema argument, used to override the <td-instance> value in the connection string of the Sqoop command. For example, if the connection string in the Sqoop command is jdbc:teradata://<td-host>/DATABASE=database1 but you specify --schema database2, your data is imported from database2 and not database1. If the connection string does not contain the DATABASE parameter (for example, jdbc:teradata://<td-host>/CHARSET=UTF8), you can also use the --schema database argument to have Sqoop behave as if you had specified the jdbc:teradata://<td-host>/DATABASE=databasename,CHARSET=UTF8 connection string. See the sketch below.
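
A minimal sketch of the override described above; the host, credentials, and table name are placeholders:

sqoop import \
--connect jdbc:teradata://<td-host>/DATABASE=database1 \
--username ... \
--password ... \
--schema database2 \
--table employees \
--target-dir /tmp/employees \
-m 1

Although the connection string names database1, the employees table is imported from database2.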

New Features in Cloudera Connector Powered by Teradata Version 1.5c5

New features:
  • Fixed compatibility issue with CDH 5.5.0 and higher.

New Features in Cloudera Connector Powered by Teradata Version 1.4c5

New features:
  • Added support for JDK 8.
  • Added the --error-database option.
  • Added the ability to specify the format of date, time, and timestamp types when importing into CSV.
  • Import method split.by.amp now supports views.
  • Upgraded the Teradata Connector for Hadoop to version 1.3.4.

New Features and Changes in Cloudera Connector Powered by Teradata 1.3c5

New features:
  • Upgraded Teradata Connector for Hadoop to version 1.3.3.
  • The parcel distribution now contains the Teradata JDBC driver; a manual download is no longer required.
  • Added support for query import into Avro file format.

Changes:

  • Export method multiple.fastload has been removed.

New Features in Cloudera Connector Powered by Teradata Version 1.2c5

New features:
  • Upgraded Teradata Connector for Hadoop to version 1.2.1.
  • Added support for Avro.
  • Added support for incremental imports.
  • Added support for the --where argument.
  • Added support for Hive import.
  • Added support for importing all tables using import-all-tables.
  • Added support for Query Bands.
  • Added a new import method, split.by.amp (supported only on Teradata 14.10 and higher; see the sketch below).
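
As a brief illustration, here is a sketch of an import using the new method; the connection details and table name are placeholders, and --input-method is the connector option described in the version 1.7c5 notes above:

sqoop import \
--connect jdbc:teradata://<td-host>/DATABASE=<database> \
--username ... \
--password ... \
--table employees \
--input-method split.by.amp \
--target-dir /tmp/employees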

New Features in Cloudera Connector Powered by Teradata Version 1.0.0

This is the first release of this connector. It features:
  • Support for secondary indexes.
  • Fast performance in most cases.

Limitations for Cloudera Connector Powered by Teradata

Cloudera Connector Powered by Teradata has the following functional limitations.

  • Does not support HCatalog.
  • Does not support import into HBase.
  • Does not support upsert functionality (parameter --update-mode allowinsert).
  • Does not support the --boundary-query option.
  • Does not support Parquet file format.
  • Does not support export to Teradata VIEWs.
  • Does not support Kerberos authentication.
  • By default, speculative execution is disabled for the Teradata Connector to avoid placing redundant load on the Teradata database.

Known Issues and Workarounds

There are no known issues for customers using the following releases:
  • CDP Private Cloud Base 7.1.5 or later
  • CDP Public Cloud 7.2.6 or later

For customers using earlier CDP releases, Hive imports and exports using Sqoop fail. To work around this issue, add the Hive common JAR to the Sqoop library directory as follows:

Workaround

Copy hive-common-<version>.jar from /opt/cloudera/parcels/CDH/jars to /opt/cloudera/parcels/CDH/lib/sqoop/lib.
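
For example, assuming the default parcel location (substitute the actual Hive version in the JAR name):

cp /opt/cloudera/parcels/CDH/jars/hive-common-<version>.jar /opt/cloudera/parcels/CDH/lib/sqoop/lib/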

Getting Support

Support for the Cloudera Connector Powered by Teradata is available through Cloudera Enterprise Support. Refer to Cloudera Support for more details.