
Hadoop World: Sqoop – Database Import for Hadoop

At Cloudera, we’re always working to make it easier for you to work with Hadoop and to integrate Hadoop-based systems with your existing data sources. One example is Sqoop, a database import tool developed at Cloudera that lets you copy data between databases and HDFS. We originally announced this tool in June, and we’ve been steadily improving it since then. It can now talk to several more databases than before, and performance has improved considerably. Sqoop has proven its usefulness quickly; several open source projects and many of our clients use Sqoop as part of their data pipeline. Last summer our friend Pete Skomoroch demonstrated how to integrate it into his Wikipedia Trending Topics project (blog tutorial).

This talk at Hadoop World NYC by Cloudera engineer Aaron Kimball introduces Sqoop, describes its use cases, and gives some technical details of how it works.

7 Responses
  • Jing / December 10, 2009 / 12:39 PM

    Does Sqoop work with Teradata?

  • aaron / December 10, 2009 / 12:53 PM

    I’m not certain. Sqoop uses JDBC and generates standard SQL, so it can connect to many database vendors out of the box. Some vendors require nonstandard SQL, or implement JDBC drivers that behave slightly differently from the norm, and for those we have to write special-case code.

    We haven’t explicitly tested it with Teradata. I’d love to hear feedback as to how that experiment works out.

  • Chetan Conikee / December 10, 2009 / 5:08 PM

    Typically Netezza and Teradata are bundled with their own proprietary ODBC bindings [and] ship with a JAR file which is essentially an ODBC-JDBC bridge.
    So if you have the specified bridge JAR, you should be able to Sqoop from/to Netezza or Teradata.

  • Oded / December 14, 2009 / 3:17 AM

    Any updates re unsqoop?

  • Arushi / August 11, 2010 / 2:38 AM

    Any updates on unsqoop?

    I want to get data using Hadoop, modify it using MapReduce, and put it back into MySQL.

    Is this possible with the latest Sqoop, or should we wait for unsqoop to send data back to MySQL?
