<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Grouping Related Trends with Hadoop and Hive</title>
	<atom:link href="http://www.cloudera.com/blog/2009/09/grouping-related-trends-with-hadoop-and-hive/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.cloudera.com/blog/2009/09/grouping-related-trends-with-hadoop-and-hive/</link>
	<description>Hadoop and Cloudera&#039;s Products and Services</description>
	<lastBuildDate>Fri, 10 Feb 2012 20:11:24 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: Project ideas for Hadoop &#171; Ganbatte&#8230;!</title>
		<link>http://www.cloudera.com/blog/2009/09/grouping-related-trends-with-hadoop-and-hive/comment-page-1/#comment-5861</link>
		<dc:creator>Project ideas for Hadoop &#171; Ganbatte&#8230;!</dc:creator>
		<pubDate>Thu, 15 Oct 2009 03:41:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.cloudera.com/blog/?p=1350#comment-5861</guid>
		<description>[...] http://www.cloudera.com/blog/2009/09/28/grouping-related-trends-with-hadoop-and-hive/ [...]</description>
		<content:encoded><![CDATA[<p>[...] <a href="http://www.cloudera.com/blog/2009/09/28/grouping-related-trends-with-hadoop-and-hive/" rel="nofollow">http://www.cloudera.com/blog/2009/09/28/grouping-related-trends-with-hadoop-and-hive/</a> [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: andy.edmonds.be &#8250; links for 2009-09-29</title>
		<link>http://www.cloudera.com/blog/2009/09/grouping-related-trends-with-hadoop-and-hive/comment-page-1/#comment-5484</link>
		<dc:creator>andy.edmonds.be &#8250; links for 2009-09-29</dc:creator>
		<pubDate>Wed, 30 Sep 2009 00:35:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.cloudera.com/blog/?p=1350#comment-5484</guid>
		<description>[...] Grouping Related Trends with Hadoop and Hive &#187; Cloudera Hadoop &amp; Big Data Blog (tags: hadoop hive datamining python trends graph)     This was written by andy. Posted on Wednesday, September 30, 2009, at 1:35 am. Filed under Delicious. Bookmark the permalink. Follow comments here with the RSS feed. Post a comment or leave a trackback. [...]</description>
		<content:encoded><![CDATA[<p>[...] Grouping Related Trends with Hadoop and Hive &#187; Cloudera Hadoop &amp; Big Data Blog (tags: hadoop hive datamining python trends graph)     This was written by andy. Posted on Wednesday, September 30, 2009, at 1:35 am. Filed under Delicious. Bookmark the permalink. Follow comments here with the RSS feed. Post a comment or leave a trackback. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Pete Skomoroch</title>
		<link>http://www.cloudera.com/blog/2009/09/grouping-related-trends-with-hadoop-and-hive/comment-page-1/#comment-5454</link>
		<dc:creator>Pete Skomoroch</dc:creator>
		<pubDate>Mon, 28 Sep 2009 21:58:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.cloudera.com/blog/?p=1350#comment-5454</guid>
		<description>Small correction found when looking at the query timings: the related links displayed are actually from another table that finds the &quot;mutual links&quot;, pages that are both outlinks and backlinks. That requirement gives a narrower set of backlink pages that are also pointed to by the target page:

SELECT x,y,z
FROM backlinks JOIN links ON (links.page_id = backlinks.page_id)
WHERE links.pl_title = backlinks.bl_title;</description>
		<content:encoded><![CDATA[<p>Small correction found when looking at the query timings: the related links displayed are actually from another table that finds the &#8220;mutual links&#8221;, pages that are both outlinks and backlinks. That requirement gives a narrower set of backlink pages that are also pointed to by the target page:</p>
<p>SELECT x,y,z<br />
FROM backlinks JOIN links ON (links.page_id = backlinks.page_id)<br />
WHERE links.pl_title = backlinks.bl_title;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Pete Skomoroch</title>
		<link>http://www.cloudera.com/blog/2009/09/grouping-related-trends-with-hadoop-and-hive/comment-page-1/#comment-5453</link>
		<dc:creator>Pete Skomoroch</dc:creator>
		<pubDate>Mon, 28 Sep 2009 21:49:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.cloudera.com/blog/?p=1350#comment-5453</guid>
		<description>Otis,

That final SELECT into backlinks_reduced, which ran the streaming code on 2,445,704 rows, took 330 seconds. The example Hive selects from backlinks_reduced take around 160 seconds. For use in a web application, I would export those final key-value pairs to MySQL and index them by page_id, or load them into a non-RDBMS datastore. Hive is best for offline batch operations where you need to run against all the data or a certain partition.

-Pete</description>
		<content:encoded><![CDATA[<p>Otis,</p>
<p>That final SELECT into backlinks_reduced, which ran the streaming code on 2,445,704 rows, took 330 seconds. The example Hive selects from backlinks_reduced take around 160 seconds. For use in a web application, I would export those final key-value pairs to MySQL and index them by page_id, or load them into a non-RDBMS datastore. Hive is best for offline batch operations where you need to run against all the data or a certain partition.</p>
<p>-Pete</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Otis Gospodnetic</title>
		<link>http://www.cloudera.com/blog/2009/09/grouping-related-trends-with-hadoop-and-hive/comment-page-1/#comment-5452</link>
		<dc:creator>Otis Gospodnetic</dc:creator>
		<pubDate>Mon, 28 Sep 2009 21:24:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.cloudera.com/blog/?p=1350#comment-5452</guid>
		<description>Pete, can you share how fast those backlinks SELECTs run on a dataset this large?</description>
		<content:encoded><![CDATA[<p>Pete, can you share how fast those backlinks SELECTs run on a dataset this large?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Lalit Kapoor</title>
		<link>http://www.cloudera.com/blog/2009/09/grouping-related-trends-with-hadoop-and-hive/comment-page-1/#comment-5446</link>
		<dc:creator>Lalit Kapoor</dc:creator>
		<pubDate>Mon, 28 Sep 2009 17:12:33 +0000</pubDate>
		<guid isPermaLink="false">http://www.cloudera.com/blog/?p=1350#comment-5446</guid>
		<description>Pete, this is a great post. I am glad that you organized it so well. Thanks for sharing your work. This is a really cool project to replicate if you want to learn a bit and get your hands dirty.</description>
		<content:encoded><![CDATA[<p>Pete, this is a great post. I am glad that you organized it so well. Thanks for sharing your work. This is a really cool project to replicate if you want to learn a bit and get your hands dirty.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

