<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Cloudera &#187; dbinputformat</title>
	<atom:link href="http://www.cloudera.com/blog/tag/dbinputformat/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.cloudera.com</link>
	<description>Hadoop and Cloudera&#039;s Products and Services</description>
	<lastBuildDate>Thu, 24 May 2012 17:53:29 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Introducing Sqoop</title>
		<link>http://www.cloudera.com/blog/2009/06/introducing-sqoop/</link>
		<comments>http://www.cloudera.com/blog/2009/06/introducing-sqoop/#comments</comments>
		<pubDate>Mon, 01 Jun 2009 17:00:37 +0000</pubDate>
		<dc:creator>Aaron Kimball</dc:creator>
				<category><![CDATA[data collection]]></category>
		<category><![CDATA[distribution]]></category>
		<category><![CDATA[general]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[hive]]></category>
		<category><![CDATA[dbinputformat]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[sqoop]]></category>

		<guid isPermaLink="false">http://www.cloudera.com/blog/?p=759</guid>
		<description><![CDATA[In addition to providing you with a dependable release of Hadoop that is easy to configure, at Cloudera we also focus on developing tools to extend Hadoop&#8217;s usability, and make Hadoop a more central component of your data infrastructure. In this vein, we&#8217;re proud to announce the availability of Sqoop, a tool designed to easily [...]]]></description>
			<content:encoded><![CDATA[<p>In addition to providing you with a <a href="http://www.cloudera.com/hadoop">dependable release of Hadoop</a> that is <a href="http://my.cloudera.com">easy to configure</a>, at Cloudera we also focus on developing tools that extend Hadoop&#8217;s usability and make Hadoop a more central component of your data infrastructure. In this vein, we&#8217;re proud to announce the availability of Sqoop, a tool designed to easily import information from SQL databases into your Hadoop cluster.</p>
<p>Sqoop (&#8220;SQL-to-Hadoop&#8221;) is a straightforward command-line tool with the following capabilities:</p>
<ul>
<li>Imports individual tables or entire databases to files in HDFS</li>
<li>Generates Java classes to allow you to interact with your imported data</li>
<li>Provides the ability to import from SQL databases straight into your <a href="http://hadoop.apache.org/hive">Hive</a> data warehouse</li>
</ul>
<p>After setting up an import job in Sqoop, you can get started working with SQL database-backed data from your Hadoop MapReduce cluster in minutes.</p>
<h2>Motivation</h2>
<p>Hadoop MapReduce is a powerful tool; its flexibility in parsing unstructured or semi-structured data means that there is a lot of potential for creative applications. But your analyses are only as useful as the data they process. In many organizations, large volumes of useful information are locked away in disparate databases across the enterprise. HDFS, Hadoop&#8217;s distributed file system, is a great place to bring this data together, but actually doing so is a cumbersome task.</p>
<p>Consider the task of processing access logs and analyzing user behavior on your web site. Users may present your site with a cookie that identifies who they are, and you can log those cookies along with the pages the users visit. This lets you associate users with their actions. But matching their behavior against their profiles or their previously recorded history requires looking up information in a database. If several MapReduce programs performed similar joins by querying the database directly, the database server would face very high load and a large number of concurrent connections while those programs ran, possibly degrading the performance of your interactive web site.</p>
<p>The solution: periodically dump the contents of the users database and the action history database to HDFS, and let your MapReduce programs join against the data stored there. Going one step further, you could take the in-HDFS copy of the users database and import it into <a href="http://hadoop.apache.org/hive/">Hive</a>, allowing you to perform ad-hoc SQL queries against the entire database without working on the production database.</p>
<p>Sqoop makes all of the above possible with a single command.</p>
<h2>Example Usage</h2>
<p>Continuing the example above, let&#8217;s say that our front-end servers connect to a MySQL database named <tt>website</tt>, stored on <tt>db.example.com</tt>. The <tt>website</tt> database has several tables, but the one we are most interested in is named <tt>USERS</tt>.</p>
<p>This table has several columns; it might have been created from a SQL statement like:</p>
<pre><tt>CREATE TABLE USERS (
  user_id INTEGER NOT NULL PRIMARY KEY,
  first_name VARCHAR(32) NOT NULL,
  last_name VARCHAR(32) NOT NULL,
  join_date DATE NOT NULL,
  zip INTEGER,
  state CHAR(2),
  email VARCHAR(128),
  password_hash CHAR(64));
</tt></pre>
<p>Importing this table into HDFS could be done with the command:</p>
<pre><tt>you@db$ sqoop --connect jdbc:mysql://db.example.com/website --table USERS \
    --local --hive-import
</tt></pre>
<p>This would connect to the MySQL database on this server and import the <tt>USERS</tt> table into HDFS. The <tt>--local</tt> option instructs Sqoop to take advantage of a local MySQL connection, which performs very well. The <tt>--hive-import</tt> option means that after reading the data into HDFS, Sqoop will connect to the Hive metastore, create a table named <tt>USERS</tt> with the same columns and types (translated into their closest analogues in Hive), and load the data into the Hive warehouse directory on HDFS (instead of a subdirectory of your HDFS home directory).</p>
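<p>To make &#8220;closest analogues&#8221; concrete, the sketch below maps JDBC type codes (the constants in <tt>java.sql.Types</tt>) to Hive column types and assembles a Hive <tt>CREATE TABLE</tt> statement. It is a hypothetical illustration, not Sqoop&#8217;s actual translation table: the class name <tt>HiveTypeSketch</tt> is made up, and storing decimals as <tt>DOUBLE</tt> and dates and strings as <tt>STRING</tt> are assumptions.</p>
<pre><tt>import java.sql.Types;

// Hypothetical sketch: translate JDBC column types into Hive column types.
// This is NOT Sqoop's real mapping; it only illustrates the idea of
// choosing the "closest analogue" for each SQL type.
class HiveTypeSketch {
    static String toHiveType(int jdbcType) {
        switch (jdbcType) {
            case Types.TINYINT:
            case Types.SMALLINT:
            case Types.INTEGER:
                return "INT";
            case Types.BIGINT:
                return "BIGINT";
            case Types.REAL:
            case Types.FLOAT:
            case Types.DOUBLE:
            case Types.NUMERIC:
            case Types.DECIMAL:
                return "DOUBLE";   // assumption: exact decimals are approximated
            case Types.BIT:
            case Types.BOOLEAN:
                return "BOOLEAN";
            default:
                return "STRING";   // CHAR, VARCHAR, DATE, etc. kept as text (assumption)
        }
    }

    // Build a Hive CREATE TABLE statement from parallel arrays of
    // column names and JDBC type codes.
    static String createTable(String table, String[] names, int[] types) {
        StringBuilder ddl = new StringBuilder("CREATE TABLE " + table + " (");
        for (int i = 0; i != names.length; i++) {
            if (i != 0) {
                ddl.append(", ");
            }
            ddl.append(names[i]).append(' ').append(toHiveType(types[i]));
        }
        return ddl.append(")").toString();
    }
}
</tt></pre>
<p>Under this mapping, the <tt>user_id INTEGER</tt> column of <tt>USERS</tt> would become a Hive <tt>INT</tt>, and <tt>state CHAR(2)</tt> would become a <tt>STRING</tt>.</p>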
<p>Suppose you wanted to work with this data in MapReduce and weren&#8217;t concerned with Hive. When storing this table in HDFS, you might want to take advantage of compression, so you&#8217;d like to be able to store the data in <a href="http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/SequenceFile.html">SequenceFiles</a>. In this case, you might want to import the data with the command:</p>
<pre><tt>you@db$ sqoop --connect jdbc:mysql://db.example.com/website --table USERS \
    --as-sequencefile</tt></pre>
<p>Sqoop will also emit a Java class named <tt>USERS</tt> with a getter method for each column of the table.</p>
<p>These getters support most SQL types, including nullable values. The data will be loaded into HDFS as a set of SequenceFiles; you can use the generated <tt>USERS</tt> class (written to <tt>USERS.java</tt>) to work with the data in your MapReduce analyses.</p>
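<p>Sqoop&#8217;s generated code is not reproduced in this post, so the class below is only a hypothetical sketch of how a record class for a few of the <tt>USERS</tt> columns could represent nullable values: <tt>NOT NULL</tt> columns use primitives, nullable columns use boxed types, and JDBC&#8217;s <tt>wasNull()</tt> tells a stored <tt>0</tt> apart from SQL <tt>NULL</tt>. The class and method names are illustrative.</p>
<pre><tt>import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical sketch of a record class for part of the USERS table.
// Column names come from the CREATE TABLE statement above; the class
// shape is an illustration, not Sqoop's actual generated code.
class UsersRecordSketch {
    private final int userId;        // user_id: NOT NULL, so a primitive is safe
    private final String firstName;  // first_name: NOT NULL
    private final Integer zip;       // zip: nullable, so a boxed Integer can hold null
    private final String state;      // state: nullable; a null reference means SQL NULL

    UsersRecordSketch(int userId, String firstName, Integer zip, String state) {
        this.userId = userId;
        this.firstName = firstName;
        this.zip = zip;
        this.state = state;
    }

    // Build a record from the current row of a JDBC ResultSet.
    static UsersRecordSketch fromResultSet(ResultSet rs) throws SQLException {
        int userId = rs.getInt("user_id");
        String firstName = rs.getString("first_name");
        int rawZip = rs.getInt("zip");                // getInt returns 0 for SQL NULL...
        Integer zip = rs.wasNull() ? null : rawZip;   // ...so check wasNull() right away
        String state = rs.getString("state");         // getString returns null for SQL NULL
        return new UsersRecordSketch(userId, firstName, zip, state);
    }

    int getUserId() { return userId; }
    String getFirstName() { return firstName; }
    Integer getZip() { return zip; }
    String getState() { return state; }
}
</tt></pre>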
<p>Sqoop can also connect to databases other than MySQL; anything with a JDBC driver should work. If you are running locally on a MySQL server, the import will be especially fast, but a MapReduce-based import mechanism allows remote database connections as well. Authenticated connections with usernames and passwords are also supported. Several other options let you control which columns of a table are imported, as well as other aspects of the import process. The full reference manual is available at <a href="http://www.cloudera.com/hadoop-sqoop">www.cloudera.com/hadoop-sqoop</a>.</p>
<h2>A Closer Look</h2>
<p>In this section I&#8217;ll briefly outline how Sqoop works under the hood.</p>
<p>In an <a href="http://www.cloudera.com/blog/2009/03/06/database-access-with-hadoop/">earlier blog post</a>, I described the <a href="http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/lib/db/DBInputFormat.html">DBInputFormat</a>, a connector that allows Hadoop MapReduce programs to read rows from SQL databases. DBInputFormat reads its input through <a href="http://en.wikipedia.org/wiki/Jdbc">JDBC</a>, a Java interface to databases that most popular database vendors (Oracle, MySQL, PostgreSQL, etc.) implement.</p>
<p>In order to use DBInputFormat, you need to write a class that deserializes the columns of each database record into individual data fields. This is tedious, and entirely mechanical, so Sqoop auto-generates class definitions to deserialize the data from the database. These classes can also be used to store the results in Hadoop&#8217;s SequenceFile format, which lets you take advantage of SequenceFile&#8217;s built-in compression too. The classes are written out as <tt>.java</tt> files that you can incorporate into your own data processing pipeline later. Sqoop creates each class definition by using JDBC&#8217;s ability to read metadata about databases and tables.</p>
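<p>The metadata-driven step can be sketched with plain JDBC: <tt>ResultSetMetaData</tt> reports each column&#8217;s name and <tt>java.sql.Types</tt> code, and a generator turns those into field declarations. This is a hypothetical and heavily simplified illustration (the name <tt>ClassGenSketch</tt> and the Java type choices are assumptions), not Sqoop&#8217;s generator.</p>
<pre><tt>import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Types;

// Hypothetical sketch of metadata-driven code generation: turn column
// names and JDBC type codes into Java source for a record class.
class ClassGenSketch {
    // Boxed Java types are used so that SQL NULL can always be represented.
    static String javaType(int jdbcType) {
        switch (jdbcType) {
            case Types.TINYINT: case Types.SMALLINT: case Types.INTEGER: return "Integer";
            case Types.BIGINT: return "Long";
            case Types.REAL: case Types.FLOAT: case Types.DOUBLE: return "Double";
            case Types.DATE: return "java.sql.Date";
            default: return "String";
        }
    }

    // Emit a class with one private field per column.
    static String generate(String className, String[] names, int[] types) {
        StringBuilder src = new StringBuilder("public class " + className + " {\n");
        for (int i = 0; i != names.length; i++) {
            src.append("  private ").append(javaType(types[i]))
               .append(' ').append(names[i]).append(";\n");
        }
        return src.append("}\n").toString();
    }

    // With a live connection, the same inputs come from JDBC metadata, e.g.
    // stmt.executeQuery("SELECT * FROM USERS WHERE 1 = 0").getMetaData().
    static String generate(String className, ResultSetMetaData md) throws SQLException {
        int n = md.getColumnCount();
        String[] names = new String[n];
        int[] types = new int[n];
        for (int i = 0; i != n; i++) {
            names[i] = md.getColumnName(i + 1);   // JDBC column indexes are 1-based
            types[i] = md.getColumnType(i + 1);
        }
        return generate(className, names, types);
    }
}
</tt></pre>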
<p>When Sqoop is invoked, it retrieves the table&#8217;s metadata, writes out the class definition for the columns you want to import, and launches a MapReduce job to import the table body proper.</p>
<p>Hadoop users know that moving large volumes of data can be time-intensive. Although a MapReduce job reading over JDBC provides a reliable, implementation-independent way to read database tables, it is often an inefficient way to import data from a remote database. Database vendors usually provide an export tool that exports data with higher performance, and Sqoop can use these alternate import strategies as well: by examining the <em>connect string URL</em> that tells Sqoop which database to connect to, Sqoop chooses an import strategy appropriate to that database. We&#8217;ve already implemented support for MySQL&#8217;s export tool, <tt>mysqldump</tt>. We&#8217;ll add support for other systems as soon as we can.</p>
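<p>A minimal sketch of that kind of dispatch, assuming the choice depends only on the URL prefix and on the <tt>--local</tt> flag from the earlier example. The enum and method names are hypothetical, and Sqoop&#8217;s own selection logic may take more into account.</p>
<pre><tt>// Hypothetical sketch: pick an import strategy from the connect string.
// The names are illustrative, not Sqoop's actual classes.
class ImportStrategySketch {
    enum Strategy { MYSQLDUMP, GENERIC_JDBC }

    static Strategy choose(String connectString, boolean local) {
        // A MySQL URL plus a local connection can use the vendor's dump tool.
        if (local) {
            if (connectString.startsWith("jdbc:mysql:")) {
                return Strategy.MYSQLDUMP;
            }
        }
        // Everything else falls back to a MapReduce job reading over JDBC.
        return Strategy.GENERIC_JDBC;
    }
}
</tt></pre>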
<h2>Getting Sqoop</h2>
<p>The first beta release of Sqoop is available today as part of <a href="http://www.cloudera.com/hadoop">Cloudera&#8217;s Distribution for Hadoop</a>. It installs as part of the same RPM (or Debian package) that contains Hadoop itself.</p>
<p>Hadoop users who aren&#8217;t using our distribution can apply the patch contributed to Apache Hadoop as issue <a href="http://issues.apache.org/jira/browse/HADOOP-5815">HADOOP-5815</a> and compile it themselves, but Sqoop won&#8217;t be part of a standard Hadoop release for some time (at least until version 0.21.0). <tt>mysqldump</tt> support is added in <a href="http://issues.apache.org/jira/browse/HADOOP-5844">HADOOP-5844</a>, and Hive integration in <a href="http://issues.apache.org/jira/browse/HADOOP-5887">HADOOP-5887</a>.</p>
<p>You can read the documentation for Sqoop at <a href="http://www.cloudera.com/hadoop-sqoop">http://www.cloudera.com/hadoop-sqoop</a>. You can also get some basic usage information from Sqoop itself by running <tt>sqoop --help</tt> after it&#8217;s installed.</p>
<p>We also did a preview of this tool at the May Bay Area Hadoop User Group meet-up; you can catch the presentation here:</p>
<iframe src='http://player.vimeo.com/video/4778230?title=1&amp;byline=1&amp;portrait=1' width='400' height='225' frameborder='0'></iframe>
<p>We hope you find this tool useful, so please check it out! Then let us know your feedback on <a href="http://getsatisfaction.com/cloudera/products/cloudera_sqoop">GetSatisfaction</a>. Bug reports and feature requests are especially welcome.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cloudera.com/blog/2009/06/introducing-sqoop/feed/</wfw:commentRss>
		<slash:comments>32</slash:comments>
		</item>
		<item>
		<title>Database Access with Hadoop</title>
		<link>http://www.cloudera.com/blog/2009/03/database-access-with-hadoop/</link>
		<comments>http://www.cloudera.com/blog/2009/03/database-access-with-hadoop/#comments</comments>
		<pubDate>Fri, 06 Mar 2009 23:04:44 +0000</pubDate>
		<dc:creator>Aaron Kimball</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[dbinputformat]]></category>

		<guid isPermaLink="false">http://www.cloudera.com/blog/?p=291</guid>
		<description><![CDATA[Hadoop&#8217;s strength is that it enables ad-hoc analysis of unstructured or semi-structured data. Relational databases, by contrast, allow for fast queries of very structured data sources. A point of frustration has been the inability to easily query both of these sources at the same time. The DBInputFormat component provided in Hadoop 0.19 finally allows easy [...]]]></description>
			<content:encoded><![CDATA[<p>Hadoop&#8217;s strength is that it enables ad-hoc analysis of unstructured or semi-structured data. Relational databases, by contrast, allow for fast queries of very structured data sources. A point of frustration has been the inability to easily query both of these sources at the same time. The <strong><a href="http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/lib/db/DBInputFormat.html">DBInputFormat</a></strong> component provided in Hadoop 0.19 finally allows easy import and export of data between Hadoop and many relational databases, allowing relational data to be more easily incorporated into your data processing pipeline.</p>
<p>This blog post explains how the DBInputFormat works and provides an example of using DBInputFormat to import data into HDFS.</p>
<h3>DBInputFormat and JDBC</h3>
<p>First we&#8217;ll cover how DBInputFormat interacts with databases. DBInputFormat uses <a href="http://en.wikipedia.org/wiki/Jdbc">JDBC</a> to connect to data sources. Because JDBC is widely implemented, DBInputFormat can work with MySQL, PostgreSQL, and several other database systems. Individual database vendors provide JDBC drivers to allow third-party applications (like Hadoop) to connect to their databases. Links to popular drivers are listed in the resources section at the end of this post.</p>
<p>To start using DBInputFormat, download the appropriate JDBC driver for your database and copy its JAR file into the <tt>$HADOOP_HOME/lib/</tt> directory on each of your Hadoop TaskTracker machines, as well as on the machine from which you launch your jobs.</p>
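<p>Because a missing driver JAR typically shows up only when the job runs, as a <tt>ClassNotFoundException</tt>, a quick check on the launching machine can save a failed job. A minimal sketch in plain Java; the class name <tt>DriverCheck</tt> is hypothetical, and <tt>com.mysql.jdbc.Driver</tt> is the MySQL Connector/J class name used in Listing 2.</p>
<pre><tt>// Sketch: report whether a JDBC driver class can be loaded from the
// current classpath. Class.forName also runs the driver's static
// initializer, which is how older drivers register with DriverManager.
class DriverCheck {
    static boolean isOnClasspath(String driverClassName) {
        try {
            Class.forName(driverClassName);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
</tt></pre>
<p>For example, <tt>DriverCheck.isOnClasspath("com.mysql.jdbc.Driver")</tt> should return <tt>true</tt> only when the Connector/J JAR is on the classpath of the JVM running the check.</p>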
<h3>Reading Tables with DBInputFormat</h3>
<p>The DBInputFormat is an InputFormat class that allows you to read data from a database. An InputFormat is Hadoop&#8217;s formalization of a data source; it can mean files formatted in a particular way, data read from a database, etc. DBInputFormat provides a simple method of scanning entire tables from a database, as well as the means to read from arbitrary SQL queries performed against the database. Most queries are supported, subject to a few limitations discussed at the end of this article.</p>
<h4>Configuring the job</h4>
<p>To use the DBInputFormat, you&#8217;ll need to configure your job. The following example shows how to connect to a MySQL database and load from a table:</p>
<div>
<pre><tt>CREATE TABLE employees (
  employee_id INTEGER NOT NULL PRIMARY KEY,
  name VARCHAR(32) NOT NULL);
</tt></pre>
<p><em>Listing 1: Example table schema</em></p></div>
<div>
<pre><tt>import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBInputFormat;

// Inside MyDriver.run(); getConf() assumes MyDriver extends Configured.
JobConf conf = new JobConf(getConf(), MyDriver.class);
conf.setInputFormat(DBInputFormat.class);
DBConfiguration.configureDB(conf,
    "com.mysql.jdbc.Driver",                // JDBC driver class
    "jdbc:mysql://localhost/mydatabase");   // connect string
String[] fields = { "employee_id", "name" };
DBInputFormat.setInput(conf, MyRecord.class, "employees",
    null /* conditions */, "employee_id" /* orderBy */, fields);
// set Mapper, etc., and call JobClient.runJob(conf);
</tt></pre>
<p><em>Listing 2: Java code to set up a MapReduce job with DBInputFormat</em></p></div>
<p>This example code will connect to <tt>mydatabase</tt> on localhost and read the two fields from the <tt>employees</tt> table.</p>
<p>The <tt>configureDB()</tt> and <tt>setInput()</tt> calls configure the DBInputFormat. The first call specifies which JDBC driver implementation to use and which database to connect to. The second specifies what data to load from the database: <tt>MyRecord</tt> is the Java class that each row will be read into, and <tt>"employees"</tt> is the name of the table to read. The <tt>null</tt> argument means no <tt>WHERE</tt> conditions are applied. The <tt>"employee_id"</tt> parameter names the column used to order the results, here the table&#8217;s primary key; the section &#8220;Limitations of the InputFormat&#8221; below explains why ordering is necessary. Finally, the <tt>fields</tt> array lists which columns of the table to read. An overloaded version of <tt>setInput()</tt> lets you specify an arbitrary SQL query to read from instead, together with a second query that counts its rows.</p>
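<p>To see how those table-based arguments fit together, the sketch below composes them into a single <tt>SELECT</tt> statement. It approximates, but is not, the query text DBInputFormat generates internally; the class name <tt>SelectQuerySketch</tt> is hypothetical.</p>
<pre><tt>// Hypothetical sketch: combine setInput()-style arguments
// (table, conditions, orderBy, fields) into one SELECT statement.
class SelectQuerySketch {
    private static boolean isBlank(String s) {
        return s == null || s.isEmpty();
    }

    static String build(String table, String conditions, String orderBy, String... fields) {
        StringBuilder q = new StringBuilder("SELECT ");
        q.append(String.join(", ", fields)).append(" FROM ").append(table);
        if (!isBlank(conditions)) {
            q.append(" WHERE (").append(conditions).append(")");   // optional filter
        }
        if (!isBlank(orderBy)) {
            q.append(" ORDER BY ").append(orderBy);                // stable row order
        }
        return q.toString();
    }
}
</tt></pre>
<p>With the arguments from Listing 2, <tt>SelectQuerySketch.build("employees", null, "employee_id", "employee_id", "name")</tt> returns <tt>SELECT employee_id, name FROM employees ORDER BY employee_id</tt>.</p>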
<p>After calling <tt>configureDB()</tt> and <tt>setInput()</tt>, you should configure the rest of your job as usual, setting the Mapper and Reducer classes, specifying any other data sources to read from (e.g., datasets in HDFS) and other job-specific parameters.</p>
<h4>Retrieving the data</h4>
<p>The DBInputFormat will read from the database, but how does data get to your mapper? The <tt>setInput()</tt> method used in the example above took, as a parameter, the class that will hold the contents of one row. You&#8217;ll need to write an implementation of the <tt>DBWritable</tt> interface to allow DBInputFormat to populate your class with fields from the table. <tt>DBWritable</tt> is an adaptor interface that declares the JDBC-side methods for reading and writing a record; implementing it together with Hadoop&#8217;s <tt>Writable</tt> interface lets the same class be filled from JDBC calls and serialized by Hadoop&#8217;s internal mechanism. Once the data is read into your custom class, you can then read the class&#8217;s fields in the mapper.</p>
<p>The following example provides a <tt>DBWritable</tt> implementation that holds one record from the <tt>employees</tt> table, as described above:</p>
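<div>
<pre><tt>import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.lib.db.DBWritable;

class MyRecord implements Writable, DBWritable {
  long id;      // employee_id column
  String name;  // name column

  // Writable: read the record back from Hadoop's binary serialization.
  public void readFields(DataInput in) throws IOException {
    this.id = in.readLong();
    this.name = Text.readString(in);
  }

  // DBWritable: populate the record from the current row of a query result.
  // Column indexes are 1-based and follow the field order given to setInput().
  public void readFields(ResultSet resultSet)
      throws SQLException {
    this.id = resultSet.getLong(1);
    this.name = resultSet.getString(2);
  }

  // Writable: serialize the record for the shuffle or a SequenceFile.
  public void write(DataOutput out) throws IOException {
    out.writeLong(this.id);
    Text.writeString(out, this.name);
  }

  // DBWritable: bind the record's fields to an INSERT statement's parameters.
  public void write(PreparedStatement stmt) throws SQLException {
    stmt.setLong(1, this.id);
    stmt.setString(2, this.name);
  }
}
</tt></pre>
<p><em>Listing 3: <tt>DBWritable</tt> implementation for records from the <tt>employees</tt> table</em></p></div>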
<p>A <tt>java.sql.ResultSet</tt> object represents the data returned from a SQL statement. It maintains a cursor pointing at the current row of the results, and that row contains the fields specified in the <tt>setInput()</tt> call, in the same order. In <tt>MyRecord</tt>&#8217;s <tt>readFields(ResultSet)</tt> method, we read the two fields from the <tt>ResultSet</tt> by column position. The <tt>readFields()</tt> and <tt>write()</tt> methods that operate on <tt>java.io.DataInput</tt> and <tt>DataOutput</tt> objects are part of the <tt>Writable</tt> interface, which Hadoop uses to marshal data between mappers and reducers or to pack results into SequenceFiles.</p>
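<p>The one rule the <tt>Writable</tt> half must obey is that <tt>readFields(DataInput)</tt> reads fields in exactly the order <tt>write(DataOutput)</tt> wrote them. The sketch below checks that with a serialize-then-deserialize round trip using only <tt>java.io</tt>, so it runs without Hadoop; <tt>DataOutput.writeUTF</tt>/<tt>readUTF</tt> stand in for Hadoop&#8217;s <tt>Text.writeString</tt>/<tt>readString</tt>, and the class name <tt>RecordRoundTrip</tt> is hypothetical.</p>
<pre><tt>import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch of MyRecord's Writable methods, with writeUTF/readUTF standing
// in for Text.writeString/Text.readString.
class RecordRoundTrip {
    long id;
    String name;

    void write(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeUTF(name);
    }

    // Fields must be read back in exactly the order write() emitted them.
    void readFields(DataInput in) throws IOException {
        id = in.readLong();
        name = in.readUTF();
    }

    // Serialize to bytes, then deserialize into a fresh instance, much as
    // Hadoop does when a record moves from a mapper to a reducer.
    static RecordRoundTrip roundTrip(RecordRoundTrip original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            RecordRoundTrip copy = new RecordRoundTrip();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);   // not expected with in-memory streams
        }
    }
}
</tt></pre>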
<h4>Using the data in a mapper</h4>
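<p>The mapper then receives an instance of your <tt>DBWritable</tt> implementation as its input value. The input key is a <tt>LongWritable</tt> that DBInputFormat assigns as the row&#8217;s position within the query results (it is not a value stored in the database); you&#8217;ll most likely discard it.</p>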
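<div>
<pre><tt>import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class MyMapper extends MapReduceBase
    implements Mapper&lt;LongWritable, MyRecord, LongWritable, Text&gt; {
  public void map(LongWritable key, MyRecord val,
      OutputCollector&lt;LongWritable, Text&gt; output, Reporter reporter) throws IOException {
    // The key is usually not needed; emit (employee_id, name) pairs.
    output.collect(new LongWritable(val.id), new Text(val.name));
  }
}
</tt></pre>
<p><em>Listing 4: Example mapper using a custom <tt>DBWritable</tt></em></p></div>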
<h4>Writing results back to the database</h4>
<p>A companion class, DBOutputFormat, will allow you to write results back to a database. When setting up the job, call <tt>conf.setOutputFormat(DBOutputFormat.class);</tt> and then call <tt>DBConfiguration.configureDB()</tt> as before.</p>
<p>The <tt>DBOutputFormat.setOutput()</tt> method then defines how the results will be written back to the database. Its three arguments are the <tt>JobConf</tt> object for the job, a string naming the table to write to, and a list of strings naming the table columns to populate. For example: <tt>DBOutputFormat.setOutput(job, "employees", "employee_id", "name");</tt>.</p>
<p>The same <tt>DBWritable</tt> implementation that you created earlier will suffice to inject records back into the database. DBOutputFormat writes the output <em>key</em>, so the reducer should pass your <tt>DBWritable</tt> instances to the OutputCollector as keys. DBOutputFormat prepares a single parameterized <tt>INSERT</tt> statement for the table; for each record, it calls the record&#8217;s <tt>write(PreparedStatement stmt)</tt> method to bind its fields to that statement and adds the result to a batch. When the reduce task finishes, the batch is executed against the SQL database and committed.</p>
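<p>In plain JDBC terms, that output path looks roughly like the sketch below: one parameterized <tt>INSERT</tt>, one set of parameter bindings per record, and a batch executed and committed at the end. It is written against <tt>java.sql</tt> only and is not DBOutputFormat&#8217;s source; the class and method names are hypothetical, and it assumes the <tt>employees</tt> table from Listing 1.</p>
<pre><tt>import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Arrays;

// Sketch of the batched, parameterized INSERT pattern using plain JDBC.
class InsertBatchSketch {
    // Build "INSERT INTO t (a, b) VALUES (?, ?)" with one ? per column.
    static String insertSql(String table, String... fields) {
        String[] marks = new String[fields.length];
        Arrays.fill(marks, "?");
        return "INSERT INTO " + table + " (" + String.join(", ", fields)
            + ") VALUES (" + String.join(", ", marks) + ")";
    }

    // Bind each record to the one PreparedStatement, batch it, commit once.
    static void insertAll(Connection conn, long[] ids, String[] names) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement stmt = conn.prepareStatement(
                insertSql("employees", "employee_id", "name"))) {
            for (int i = 0; i != ids.length; i++) {
                stmt.setLong(1, ids[i]);       // same calls as MyRecord.write(PreparedStatement)
                stmt.setString(2, names[i]);
                stmt.addBatch();
            }
            stmt.executeBatch();
            conn.commit();
        }
    }
}
</tt></pre>
<p>Here <tt>insertSql("employees", "employee_id", "name")</tt> produces <tt>INSERT INTO employees (employee_id, name) VALUES (?, ?)</tt>.</p>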
<h3>Limitations of the InputFormat</h3>
<p>JDBC allows applications to generate SQL queries that are executed against the database, with the results returned to the calling application. Keep in mind that your MapReduce job will interact with the database through repeated SQL queries. Therefore:</p>
<ul>
<li>Hadoop may need to execute the same query multiple times, and it needs the same results each time. So concurrent updates to your database should not affect the query being run by your MapReduce job. You can ensure this by disallowing writes to the table while your MapReduce job runs, by restricting your MapReduce job&#8217;s query via a clause such as &#8220;<tt>insert_date &lt; <i>yesterday</i></tt>,&#8221; or by dumping the data to a temporary table in the database before launching your MapReduce process.</li>
<li>In order to parallelize the processing of records from the database, Hadoop will execute SQL queries that use <tt>ORDER BY</tt>, <tt>LIMIT</tt>, and <tt>OFFSET</tt> clauses to select ranges out of tables. Your results, therefore, need to be orderable by one or more keys (either PRIMARY, like the one in the example, or UNIQUE).</li>
<li>In order to divide the input among map tasks, the DBInputFormat needs to know how many records it will read. So if you&#8217;re writing an arbitrary SQL query against the database, you will need to provide a second query that returns the number of rows that the first query will return (e.g., a <tt>SELECT COUNT(*)</tt> query with the same <tt>FROM</tt> and <tt>WHERE</tt> clauses).</li>
</ul>
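<p>The ordering and counting requirements fit together as follows: given the row count, the rows are divided into contiguous ranges, and each map task reads its range with an <tt>ORDER BY ... LIMIT ... OFFSET</tt> query. The sketch below is a simplified model of that scheme in plain Java; the class name <tt>SplitSketch</tt> and the rule that the last split takes any remainder are illustrative assumptions rather than a copy of Hadoop&#8217;s code.</p>
<pre><tt>// Sketch: divide `count` ordered rows into `numSplits` contiguous ranges,
// giving each range its own LIMIT/OFFSET query. numSplits must be at least 1;
// count would come from the COUNT(*) query described above.
class SplitSketch {
    static String[] splitQueries(String baseQuery, String orderBy, long count, int numSplits) {
        long chunk = count / numSplits;
        String[] queries = new String[numSplits];
        for (int i = 0; i != numSplits; i++) {
            long start = i * chunk;
            // The last split absorbs the remainder left by integer division.
            long end = (i == numSplits - 1) ? count : start + chunk;
            queries[i] = baseQuery + " ORDER BY " + orderBy
                + " LIMIT " + (end - start) + " OFFSET " + start;
        }
        return queries;
    }
}
</tt></pre>
<p>With 10 rows and 3 splits, the three queries read 3, 3, and 4 rows starting at offsets 0, 3, and 6. The ranges are disjoint and complete only if every query sees the same ordered rows, which is why the table must not change underneath the job and why the ordering key must be unique.</p>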
<p>With these restrictions in mind, there&#8217;s still a great deal of flexibility available to you. You can bulk load entire tables into HDFS, or select large ranges of data. For example, if you want to read records from a table that is also being populated by another source concurrently, you might set up that table to attach a timestamp field to each record. Before doing the bulk read, pick the current timestamp, then select all records with timestamps earlier than that one. New records being fed in by the other writer will have later timestamps and will not affect the MapReduce job.</p>
<p>Finally, make sure you understand the bottlenecks in your data processing pipeline. Launching a MapReduce job with 100 mappers all querying a single database server may overload the server or its network connection. In that case, you&#8217;ll achieve less parallelism than theoretically possible, due to starvation, disk seeks, and other performance penalties.</p>
<h3>Limitations of the OutputFormat</h3>
<p>The DBOutputFormat writes to the database by preparing <tt>INSERT</tt> statements in each reduce task; when the task&#8217;s output is closed, they are executed as a batch in a single transaction. Performing a large number of these from several reduce tasks concurrently can swamp a database. If you want to export a very large volume of data, you may be better off writing the <tt>INSERT</tt> statements into a text file and then using a bulk data import tool provided by your database to do the import.</p>
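<p>Generating those statements is straightforward; the main subtlety is quoting string values. A minimal sketch, assuming the two-column <tt>employees</tt> table from Listing 1 and standard SQL quoting (a single quote is escaped by doubling it); vendor-specific escaping rules, such as MySQL&#8217;s treatment of backslashes, are not handled. The class name <tt>InsertFileSketch</tt> is hypothetical.</p>
<pre><tt>import java.io.IOException;
import java.io.Writer;

// Sketch: write one literal INSERT statement per record, for later
// loading with the database's own tools instead of live JDBC inserts.
class InsertFileSketch {
    // Standard SQL string literal: wrap in quotes, double embedded quotes.
    static String quote(String s) {
        return s == null ? "NULL" : "'" + s.replace("'", "''") + "'";
    }

    static void writeInsert(Writer out, String table, long id, String name) throws IOException {
        out.write("INSERT INTO " + table + " (employee_id, name) VALUES ("
            + id + ", " + quote(name) + ");\n");
    }
}
</tt></pre>
<p>In a MapReduce job, one option is to emit each generated line as a <tt>Text</tt> value through TextOutputFormat (with a <tt>NullWritable</tt> key, so only the statement is written), then run the resulting files through your database&#8217;s command-line client.</p>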
<h3>Conclusions</h3>
<p>DBInputFormat provides a straightforward interface to read data from a database into your MapReduce applications. You can read database tables into HDFS, import them into Hive, or use them to perform joins in MapReduce jobs. By supporting JDBC, it provides a common interface to a variety of different database sources.</p>
<p>DBInputFormat is probably best not used as a primary data access mechanism; queries against database-driven data are most efficiently executed within the database itself, and large-scale data migration is better done using the bulk data export/import tools associated with your database. But when analysis of ad hoc data in HDFS can be improved by adding some relational data, DBInputFormat allows you to quickly perform the join without a large amount of setup overhead. DBOutputFormat then allows you to export results back to the same database for combining with other database-driven tables.</p>
<p>DBInputFormat is available in Hadoop 0.19 and is provided by <a href="http://issues.apache.org/jira/browse/HADOOP-2536">HADOOP-2536</a>, a patch started by Fredrik Hedberg and further developed by Enis Soztutar. A backport of this patch that can be applied to Hadoop 0.18.3 is available at the above link.</p>
<p>This article is based on a talk I gave at the SF Bay Hadoop User Group meetup on February 18th; the slides from that talk are <a href="http://www.cloudera.com/blog/wp-content/uploads/DBInputFormat.pdf">available as a PDF</a>.</p>
<h3>Resources</h3>
<ul>
<li><a href="http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/lib/db/DBInputFormat.html">DBInputFormat documentation</a></li>
<li><a href="http://en.wikipedia.org/wiki/Jdbc">Wikipedia on JDBC</a></li>
<li>Popular JDBC drivers:
<ul>
<li><a href="http://dev.mysql.com/downloads/connector/j/3.1.html">MySQL: Connector/J</a></li>
<li><a href="http://jdbc.postgresql.org/">PostgreSQL JDBC</a></li>
<li><a href="http://www.oracle.com/technology/software/tech/java/sqlj_jdbc/index.html">Oracle JDBC</a> (Note: DBInputFormat currently does not work with Oracle, but this should change soon.)</li>
</ul>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.cloudera.com/blog/2009/03/database-access-with-hadoop/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>

