<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Cloudera &#187; testing</title>
	<atom:link href="http://www.cloudera.com/blog/category/testing/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.cloudera.com</link>
	<description>Hadoop and Cloudera&#039;s Products and Services</description>
	<lastBuildDate>Thu, 24 May 2012 17:53:29 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Apache MRUnit 0.9.0-incubating has been released!</title>
		<link>http://www.cloudera.com/blog/2012/05/apache-mrunit-0-9-0-incubating-has-been-released/</link>
		<comments>http://www.cloudera.com/blog/2012/05/apache-mrunit-0-9-0-incubating-has-been-released/#comments</comments>
		<pubDate>Wed, 02 May 2012 04:38:51 +0000</pubDate>
		<dc:creator>Brock Noland</dc:creator>
				<category><![CDATA[community]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[Hadoop Testing]]></category>
		<category><![CDATA[Map Reduce Testing]]></category>
		<category><![CDATA[MapReduce Testing]]></category>

		<guid isPermaLink="false">http://www.cloudera.com/?p=14603</guid>
		<description><![CDATA[This post was originally posted on the Apache Software Foundation&#8217;s blog. We (the Apache MRUnit team) have just released Apache MRUnit 0.9.0-incubating (tarball, nexus, javadoc). Apache MRUnit is an Apache Incubator project that is a Java library which helps developers unit test Apache Hadoop MapReduce jobs. Unit testing is a technique for improving project quality [...]]]></description>
			<content:encoded><![CDATA[<p><em>This post was originally posted on the <a href="https://blogs.apache.org/mrunit/entry/apache_mrunit_0_9_0" target="_blank">Apache Software Foundation&#8217;s blog</a>.</em></p>
<p>We (the Apache <abbr title="MapReduce Unit">MRUnit</abbr> team) have just released Apache MRUnit 0.9.0-incubating (<a href="http://www.apache.org/dyn/closer.cgi/incubator/mrunit/" target="_blank">tarball</a>, <a href="https://repository.apache.org/index.html#nexus-search;gav~org.apache.mrunit~~~~" target="_blank">nexus</a>, <a href="http://incubator.apache.org/mrunit/documentation/javadocs/0.9.0-incubating/index.html" target="_blank">javadoc</a>). Apache MRUnit, an Apache Incubator project, is a Java library that helps developers unit test Apache Hadoop MapReduce jobs. Unit testing is a technique for improving project quality and reducing overall costs by writing a small amount of code that automatically verifies that the software you write performs as intended. It is considered a best practice in software development because it helps identify defects early, before they&#8217;re deployed to a production system.</p>
<p>The MRUnit project is quite active: 0.9.0 is our fourth release since entering the incubator, and we have added 4 new committers beyond the project&#8217;s initial charter! We are very interested in having new contributors and committers join the project. Please join our <a href="http://incubator.apache.org/mrunit/community/mailing_lists.html" target="_blank">mailing list</a> to find out how you can help!</p>
<p>The MRUnit build process has changed to produce mrunit-0.9.0-hadoop1.jar and mrunit-0.9.0-hadoop2.jar instead of mrunit-0.9.0-hadoop020.jar, mrunit-0.9.0-hadoop100.jar and mrunit-0.9.0-hadoop023.jar. The hadoop1 classifier is for all Apache Hadoop versions based off the 0.20.X line including 1.0.X. The hadoop2 classifier is for all Apache Hadoop versions based off the 0.23.X line including the unreleased 2.0.X.</p>
<p>This <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12311292&#038;version=12316360" target="_blank">release</a> contains 2 new features, 15 improvements and 6 bug fixes. I will highlight a few below:</p>
<ul>
<li>Support custom counter checking in <a href="https://issues.apache.org/jira/browse/MRUNIT-68" target="_blank">MRUNIT-68</a></li>
<li>runTest() should optionally ignore output order in <a href="https://issues.apache.org/jira/browse/MRUNIT-91" target="_blank">MRUNIT-91</a></li>
<li>Driver.runTest throws RuntimeException should it throw AssertionError in <a href="https://issues.apache.org/jira/browse/MRUNIT-54" target="_blank">MRUNIT-54</a></li>
<li>o.a.h.mrunit.mapreduce.MapReduceDriver should support a combiner in <a href="https://issues.apache.org/jira/browse/MRUNIT-67" target="_blank">MRUNIT-67</a></li>
<li>Better support for other serializations besides Writable:  <a href="https://issues.apache.org/jira/browse/MRUNIT-70" target="_blank">MRUNIT-70</a>,  <a href="https://issues.apache.org/jira/browse/MRUNIT-86">MRUNIT-86</a>,  <a href="https://issues.apache.org/jira/browse/MRUNIT-99" target="_blank">MRUNIT-99</a>,  <a href="https://issues.apache.org/jira/browse/MRUNIT-77" target="_blank">MRUNIT-77</a></li>
<li>Better error messages from validate, null checking and forgetting to set mappers and reducers: <a href="https://issues.apache.org/jira/browse/MRUNIT-74" target="_blank">MRUNIT-74</a>, <a href="https://issues.apache.org/jira/browse/MRUNIT-66" target="_blank">MRUNIT-66</a>, <a href="https://issues.apache.org/jira/browse/MRUNIT-65" target="_blank">MRUNIT-65</a></li>
<li>add static convenience methods to PipelineMapReduceDriver class in <a href="https://issues.apache.org/jira/browse/MRUNIT-89" target="_blank">MRUNIT-89</a></li>
<li>Test and Deprecate Driver.{*OutputFromString,*InputFromString} Methods in <a href="https://issues.apache.org/jira/browse/MRUNIT-48" target="_blank">MRUNIT-48</a></li>
</ul>
<h2 style="font-size:14pt;color:#243543;">Support custom counter checking</h2>
<p>It has always been possible to check the counter values like so:</p>
<pre class="code">assertEquals(2, mapDriver.getCounters().findCounter(CustomMapper.CustomCounter.NAME).getValue());
</pre>
<p>but this is quite tedious. As such, Jarek Jarcec Cecho (our second-newest committer) added this feature directly to the drivers:</p>
<pre class="code">.withCounter(CustomMapper.CustomCounter.NAME, 2);
</pre>
<h2 style="font-size:14pt;padding-top:16px;color:#243543;">runTest() should optionally ignore output order</h2>
<p>Prior to this change, MRUnit required Mapper/Reducer classes to output key-value pairs in the order specified in the test. A well-defined output order is common but not universal. Dave Beech (our newest committer) contributed a patch so you can optionally turn this ordering requirement off by using:</p>
<pre class="code">.runTest(false)
</pre>
<p style="padding-top:12px">instead of</p>
<pre class="code">.runTest()
</pre>
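<p>To make the difference concrete, here is a minimal sketch of ordered versus unordered output comparison. It is not MRUnit&#8217;s implementation: the OutputComparison class and its matches method are hypothetical names, and each output record is modeled as a plain tab-separated string.</p>

```java
import java.util.Arrays;

// Hypothetical helper, not part of MRUnit: compares expected and actual
// output records with or without regard to their order.
final class OutputComparison {
    private OutputComparison() {}

    // Each record is a "key\tvalue" string, for simplicity.
    static boolean matches(String[] expected, String[] actual, boolean orderMatters) {
        if (orderMatters) {
            return Arrays.equals(expected, actual);
        }
        // Sort copies so the caller's arrays are left untouched. Duplicates
        // still count, so this is a multiset comparison.
        String[] e = expected.clone();
        String[] a = actual.clone();
        Arrays.sort(e);
        Arrays.sort(a);
        return Arrays.equals(e, a);
    }
}
```

<p>With orderMatters set to false, the same records emitted in a different order still match, while a missing or extra record does not.</p>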
<h2 style="font-size:14pt;line-height:1.3em;padding-top:16px;color:#243543;">Driver.runTest throws RuntimeException should it throw AssertionError</h2>
<p>Previous versions of MRUnit threw a RuntimeException when a test failed. This worked, but it meant that testing frameworks saw the test as having erred, not failed. We have changed this to AssertionError so that testing frameworks see the tests as failed. The distinction is small but important.</p>
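<p>The following sketch shows why the exception type matters to a test runner. It is an illustration only, assuming a runner that reports an AssertionError as a failure and any other exception as an error; OutcomeClassifier is a hypothetical name and is not taken from MRUnit or from any particular testing framework.</p>

```java
// Hypothetical runner logic, not taken from MRUnit or any specific framework:
// an AssertionError counts as a failed expectation, while any other
// RuntimeException counts as an unexpected error.
final class OutcomeClassifier {
    private OutcomeClassifier() {}

    static String classify(Runnable test) {
        try {
            test.run();
            return "passed";
        } catch (AssertionError e) {
            return "failed";
        } catch (RuntimeException e) {
            return "erred";
        }
    }
}
```

<p>Under this assumption, a mismatch reported through a RuntimeException would be counted as &#8220;erred&#8221;, which is the behavior the change to AssertionError avoids.</p>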
<h2 style="font-size:14pt;color:#243543;">o.a.h.mrunit.mapreduce.MapReduceDriver should support a combiner</h2>
<p>Previously, MRUnit only supported a combiner in the mapred MapReduceDriver class, but now the mapreduce MapReduceDriver also supports a combiner via:</p>
<pre class="code">MapReduceDriver.newMapReduceDriver(mapper, reducer, combiner)</pre>
<p style="padding-top:12px">or</p>
<pre class="code">.withCombiner(combiner) or .setCombiner(combiner)</pre>
<h2 style="font-size:14pt;padding-top:16px;color:#243543;">Better support for other serializations besides Writable</h2>
<p>Previous versions of MRUnit did not support JavaSerialization, Avro or other serialization frameworks well. We improved alternative serialization support by not forcing K2 in MapReduceDriver to be Comparable and by supporting serializations that cannot clone into an object or that do not have default constructors.</p>
<h2 style="font-size:14pt;line-height:1.3em;color:#243543;">Better error messages from validate, null checking and forgetting to set mappers and reducers</h2>
<p>We have improved the checking of parameters passed to MRUnit and the error messages produced when the parameters are invalid. This includes throwing a NullPointerException immediately when receiving a null value, and throwing an IllegalStateException instead of a NullPointerException when no mapper or reducer class is provided.</p>
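<p>The fail-fast pattern described above can be sketched as follows. This is not MRUnit&#8217;s code: SketchDriver, withMapper and run are hypothetical names, and the mapper is modeled as a plain Object to keep the example free of Hadoop dependencies.</p>

```java
import java.util.Objects;

// Hypothetical stand-in for a test driver; not MRUnit's actual code.
final class SketchDriver {
    private Object mapper;

    // Reject null as soon as it is passed in, so the stack trace points
    // at the caller that supplied it.
    SketchDriver withMapper(Object mapper) {
        this.mapper = Objects.requireNonNull(mapper, "mapper must not be null");
        return this;
    }

    // Report a missing mapper as a usage error instead of failing later
    // with an unexplained NullPointerException.
    void run() {
        if (mapper == null) {
            throw new IllegalStateException("No mapper provided; call withMapper first");
        }
        // ... the mapper would be exercised here ...
    }
}
```

<p>Calling run() on a fresh SketchDriver raises an IllegalStateException with a message that names the missing step, which is easier to act on than a NullPointerException from deep inside the driver.</p>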
<h2 style="font-size:14pt;color:#243543;">Add static convenience methods to PipelineMapReduceDriver class</h2>
<p>We added static convenience constructors similar to those in the other driver classes:</p>
<pre class="code">PipelineMapReduceDriver.newPipelineMapReduceDriver()</pre>
<p style="padding-top:12px">or</p>
<pre class="code">PipelineMapReduceDriver.newPipelineMapReduceDriver(list of Pair&lt;Mapper, Reducer&gt;)</pre>
<h2 style="font-size:14pt;line-height:1.3em;padding-top:16px;color:#243543;">Test and Deprecate Driver.{*OutputFromString,*InputFromString} Methods</h2>
<p>The OutputFromString and InputFromString methods are now deprecated because they required Text inputs or outputs, with no way to enforce that the inputs or outputs of a mapper or reducer were actually Text. These methods also provided little convenience, as a user can simply pass the intended string to new Text(string).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cloudera.com/blog/2012/05/apache-mrunit-0-9-0-incubating-has-been-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>2010 Cloudera Apache Hadoop Webinars</title>
		<link>http://www.cloudera.com/blog/2011/01/2010-cloudera-apache-hadoop-webinars/</link>
		<comments>http://www.cloudera.com/blog/2011/01/2010-cloudera-apache-hadoop-webinars/#comments</comments>
		<pubDate>Thu, 06 Jan 2011 14:00:53 +0000</pubDate>
		<dc:creator>Jon Zuanich</dc:creator>
				<category><![CDATA[community]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[training]]></category>
		<category><![CDATA[#cdh3]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[apache hadoop]]></category>
		<category><![CDATA[CDH]]></category>
		<category><![CDATA[cdh3b2]]></category>
		<category><![CDATA[cloudera]]></category>
		<category><![CDATA[cloudera's distribution for hadoop]]></category>
		<category><![CDATA[integration]]></category>
		<category><![CDATA[lessons learned]]></category>
		<category><![CDATA[productiion]]></category>
		<category><![CDATA[run]]></category>
		<category><![CDATA[webinar]]></category>

		<guid isPermaLink="false">http://www.cloudera.com/?p=5755</guid>
		<description><![CDATA[Cloudera produced several webinars in 2010 providing attendees with insights into a range of topics from technical best practices to common business applications of Hadoop. These webinars proved to be very popular so we thought we would provide a brief recap for our readers. Starting way&#160;back in June, &#160;we presented&#160;Top Ten Tips and Tricks for [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cloudera.com/">Cloudera</a> produced several webinars in 2010 providing attendees with insights into a range of topics from technical best practices to common business applications of Hadoop. These webinars proved to be very popular so we thought we would provide a brief recap for our readers.</p>
<p>Starting way back in June, we presented <a href="http://www.cloudera.com/resource/top-ten-hadoop-tricks-and-tips-webinar"><em>Top Ten Tips and Tricks for Hadoop Success</em></a>. In this webinar we explained some tips that the Cloudera Solutions Architect team has picked up from implementing, deploying, and running Hadoop with our customers.</p>
<p><em><strong><a style="font-size: medium;" title="Top Ten Tips and Tricks" href="https://www1.gotomeeting.com/register/297991024">Top Ten Tips and Tricks for Hadoop Success</a></strong></em> <a title="Top Ten Tips and Tricks" href="https://www1.gotomeeting.com/register/297991024">(Link to video recording)</a></p>
<p>
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="name" value="__sse4393597" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=toptentipstricksforhadoopsuccessr9-100602152716-phpapp02&amp;stripped_title=top-ten-tips-tricks-for-hadoop-success-r9&amp;userName=cloudera" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=toptentipstricksforhadoopsuccessr9-100602152716-phpapp02&amp;stripped_title=top-ten-tips-tricks-for-hadoop-success-r9&amp;userName=cloudera" allowfullscreen="true" name="__sse4393597"></embed></object>
</p>
<p>In August, Jeff Hammerbacher presented <a href="http://www.cloudera.com/resource/10_common_hadoop-able_problems_webinar_2010_hammerbacher"><em>10 Common Hadoop-able Problems</em></a>. This webinar highlighted ten <a href="http://www.cloudera.com/hadoop/">Hadoop</a> use cases deemed &#8220;Hadoop-able problems.&#8221; The use cases were distilled from Cloudera&#8217;s experience helping customers solve their emerging data and business challenges.</p>
<p><strong><em><a style="font-size: medium;" title="10 Common Hadoop-able Problems Webinar" href="https://www1.gotomeeting.com/register/719074008">10 Common Hadoop-able Problems Webinar</a></em></strong><a title="10 Common Hadoop-able Problems Webinar" href="https://www1.gotomeeting.com/register/719074008"> (Link to video recording)</a></p>
<p><div id="__ss_4931616" style="width: 425px;">
<object id="__sse4931616" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=20100806cloudera10hadoopableproblemswebinar-100809184633-phpapp02&amp;stripped_title=20100806-cloudera-10-hadoopable-problems-webinar-4931616&amp;userName=cloudera" /><param name="name" value="__sse4931616" /><param name="allowfullscreen" value="true" /><embed id="__sse4931616" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=20100806cloudera10hadoopableproblemswebinar-100809184633-phpapp02&amp;stripped_title=20100806-cloudera-10-hadoopable-problems-webinar-4931616&amp;userName=cloudera" name="__sse4931616" allowscriptaccess="always" allowfullscreen="true"></embed></object>
</div>
</p>
<p>In September, Cloudera&#8217;s <a href="http://www.cloudera.com/company/team/">Tom White</a>, author of <em>Hadoop: The Definitive Guide</em>, presented <a href="http://www.cloudera.com/videos/state-of-hadoop-tom-white"><em>The State of Hadoop</em></a>. As with most open source projects,<span style="color: #505050;"> <a href="http://www.cloudera.com/hadoop/"><span style="color: #505050;">Hadoop</span></a></span> is continuously evolving, and this webinar was an update on what was new, changed, and possible with Apache Hadoop.</p>
<p><strong><em><a style="font-size: medium;" title="The State of Hadoop" href="http://www.cloudera.com/videos/state-of-hadoop-tom-white">The State of Hadoop</a></em></strong> <a title="The State of Hadoop" href="http://www.cloudera.com/videos/state-of-hadoop-tom-white">(Link to Cloudera resources video version)</a></p>
<p>
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="640" height="385" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/2SpTvWiXBcA?fs=1&amp;hl=en_US" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="640" height="385" src="http://www.youtube.com/v/2SpTvWiXBcA?fs=1&amp;hl=en_US" allowscriptaccess="always" allowfullscreen="true"></embed></object>
</p>
<p>In November, Anil Madan, Director of Engineering at eBay, and Cloudera&#8217;s Chief Scientist,&#160;<a href="http://www.cloudera.com/company/management/">Jeff Hammerbacher</a>, presented&#160;<a href="http://www.cloudera.com/resource/webinar-integrating-hadoop-data-warehouse-business-intelligence-environment"><em>Integrating Hadoop In Your Existing Data Warehouse and Business Intelligence Environment</em></a>. This webinar explained the standard model for business intelligence and data warehousing, the three stages of&#160;<a href="http://www.cloudera.com/"><span style="color: #505050;">Hadoop</span></a> adoption, integration with&#160;<a href="http://www.cloudera.com/partners/">Cloudera partnerships</a> and<span style="color: #505050;"> <a href="http://www.cloudera.com/"><span style="color: #505050;">Hadoop analytics</span></a></span> at eBay.</p>
<p><strong><em><a style="font-size: medium;" title="Integrating Hadoop In Your Existing Data Warehouse and Business Intelligence Environment" href="https://www1.gotomeeting.com/register/515000760">Integrating Hadoop In Your Existing Data Warehouse and Business Intelligence Environment</a></em></strong><a title="Integrating Hadoop In Your Existing Data Warehouse and Business Intelligence Environment" href="https://www1.gotomeeting.com/register/515000760"> (Link to video recording)</a></p>
<p>
<object id="__sse5828023" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=20101117webinarfinalclouderahammerbacherebaymadan-101118160808-phpapp02&amp;stripped_title=20101117-webinar-final-cloudera-hammerbacher-e-bay-madan&amp;userName=cloudera" /><param name="name" value="__sse5828023" /><param name="allowfullscreen" value="true" /><embed id="__sse5828023" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=20101117webinarfinalclouderahammerbacherebaymadan-101118160808-phpapp02&amp;stripped_title=20101117-webinar-final-cloudera-hammerbacher-e-bay-madan&amp;userName=cloudera" name="__sse5828023" allowscriptaccess="always" allowfullscreen="true"></embed></object>
</p>
<p><a href="http://www.cloudera.com/">Cloudera&#8217;s</a> most recent webinar,&#160;<a href="http://www.cloudera.com/resource/webinar-production-hadoop-lessons-learned"><em>Production-izing Hadoop: Lessons Learned</em></a><em>, </em>presented on December 8<sup>th</sup> by Cloudera Solution Architect,&#160;<a href="http://www.cloudera.com/company/team/">Eric Sammer</a>, shared key insights for installing, configuring and running&#160;<a href="http://www.cloudera.com/hadoop/">Cloudera&#8217;s Distribution for Apache Hadoop</a> when scaling a cluster for full production use. These best practices were gained from the hundreds of Hadoop deployments that Cloudera has been involved with.</p>
<p><strong><em><a style="font-size: medium;" title="Production-izing Hadoop: Lessons Learned" href="https://www1.gotomeeting.com/register/617063296">Production-izing Hadoop: Lessons Learned</a></em></strong><a title="Production-izing Hadoop: Lessons Learned" href="https://www1.gotomeeting.com/register/617063296"> (Link to video recording)</a></p>
<p>
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=productionizinghadoop-lessonslearned-finalwebinar20101208-101208152104-phpapp02&amp;stripped_title=productionizing-hadoop-lessons-learned-final-webinar-20101208&amp;userName=cloudera" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=productionizinghadoop-lessonslearned-finalwebinar20101208-101208152104-phpapp02&amp;stripped_title=productionizing-hadoop-lessons-learned-final-webinar-20101208&amp;userName=cloudera" allowfullscreen="true"></embed></object>
</p>
<p>Cloudera will continue to provide more educational webinars in 2011. You can check our <a href="http://www.cloudera.com/company/events/">events page for upcoming activities</a> or subscribe to our newsletter in the left sidebar to stay informed on the latest from Cloudera. If there is a specific topic you would like to learn more about, we&#8217;d love to hear from you. Please email your suggestions to <a href="mailto:community@cloudera.com">community@cloudera.com</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cloudera.com/blog/2011/01/2010-cloudera-apache-hadoop-webinars/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>CDH2: &#8220;Testing&#8221; Heading Towards &#8220;Stable&#8221;</title>
		<link>http://www.cloudera.com/blog/2010/02/cdh2-testing-heading-towards-stable/</link>
		<comments>http://www.cloudera.com/blog/2010/02/cdh2-testing-heading-towards-stable/#comments</comments>
		<pubDate>Fri, 19 Feb 2010 01:19:39 +0000</pubDate>
		<dc:creator>Chad Metcalf</dc:creator>
				<category><![CDATA[hadoop]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[hive]]></category>
		<category><![CDATA[pig]]></category>
		<category><![CDATA[testing]]></category>

		<guid isPermaLink="false">http://www.cloudera.com/?p=2601</guid>
		<description><![CDATA[In September 2009, we announced the first release of CDH2, our current testing repository. Packages in our testing repository are recommended for people who want more features and are willing to upgrade as bugs are worked out. Our testing packages pass unit and functional tests but will not have the same &#8220;soak time&#8221; as our [...]]]></description>
			<content:encoded><![CDATA[<p>In September 2009, we announced <a href="http://www.cloudera.com/blog/2009/09/cdh2-clouderas-distribution-for-hadoop-2/">the first release of CDH2</a>, our current testing repository. Packages in our testing repository are recommended for people who want more features and are willing to upgrade as bugs are worked out. Our testing packages pass unit and functional tests but will not have the same &#8220;soak time&#8221; as our stable packages. A testing release represents a work in progress that will eventually be promoted to stable. It&#8217;s a long road of feedback, bug fixes, QA and testing to move from testing to stable. As someone who tracks the maturity of a testing build throughout its life cycle, I&#8217;m pleased to say we&#8217;ve put a lot of polish into this release.<br />
<span id="more-2601"></span>CDH2 has reached the point where we are preparing to promote it to stable. One might even call this a &#8220;release candidate&#8221;. Cloudera engineers have been hard at work getting patches into CDH2 to make it the best 0.20 release available. Here are some of the highlights:</p>
<ul>
<li>Hadoop 0.20.1 &#8211; <a href="http://archive.cloudera.com/cdh/testing/hadoop-0.20.1+169.56.CHANGES.txt">73 more patches of extra Hadoop&#8217;y goodness (that is 225 total patches over vanilla 0.20.1)</a></li>
<li>Lots of libhdfs and fusefs love resulting in stability and usability improvements currently in use at scale</li>
<li>HDFS fixes that improve the write pipeline</li>
<li>Lots of general stability fixes for Hadoop</li>
<li>Pig 0.5.0 release &#8211; <a href="http://archive.cloudera.com/cdh/testing/pig-0.5.0+11.1.CHANGES.txt">Working out of the box with our Hadoop 0.18 and 0.20 builds</a></li>
<li>Hive 0.4.1 release &#8211; <a href="http://archive.cloudera.com/cdh/testing/hive-0.4.1+14.4.CHANGES.txt">Works with both our Hadoop 0.18 and 0.20 builds</a></li>
<li>HBase 0.20.3 &#8211; We worked with the HBase team to bring the latest rpms to a <a href="http://archive.cloudera.com/redhat/cdh/cloudera-contrib.repo">yum repo</a> near you</li>
</ul>
<p>We are excited about our CDH2 release. It&#8217;s running at scale at some really great companies. We are looking forward to promoting it to stable shortly and moving on to the next big thing, CDH3. I&#8217;ll let you know as soon as this happens. When CDH2 becomes stable, it also means that CDH3 is ready to start its journey through testing. Stay tuned for more details as to what CDH3 will encompass; I&#8217;ll just say that I&#8217;m pretty excited about it.</p>
<p>You can subscribe to our CDH mailing list (<a href="mailto:cdh-announce-subscribe@cloudera.com">cdh-announce-subscribe@cloudera.com</a>) to get information about new releases as we push them out. Check out the new release, and remember to let us know what you think!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cloudera.com/blog/2010/02/cdh2-testing-heading-towards-stable/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>CDH2: Testing Release now with Pig, Hive, and HBase</title>
		<link>http://www.cloudera.com/blog/2009/09/cdh2-testing-release-now-with-pig-hive-and-hbase/</link>
		<comments>http://www.cloudera.com/blog/2009/09/cdh2-testing-release-now-with-pig-hive-and-hbase/#comments</comments>
		<pubDate>Wed, 30 Sep 2009 14:10:37 +0000</pubDate>
		<dc:creator>Chad Metcalf</dc:creator>
				<category><![CDATA[distribution]]></category>
		<category><![CDATA[general]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[HDFS]]></category>
		<category><![CDATA[hive]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[pig]]></category>
		<category><![CDATA[testing]]></category>

		<guid isPermaLink="false">http://www.cloudera.com/blog/?p=1346</guid>
		<description><![CDATA[At the beginning of September, we announced the first release of CDH2, our current testing repository. Packages in our testing repository are recommended for people who want more features and are willing to upgrade as bugs are worked out. Our testing packages pass unit and functional tests but will not have the same &#8220;soak time&#8221; [...]]]></description>
			<content:encoded><![CDATA[<p>At the beginning of September, we announced the <a href="http://www.cloudera.com/blog/2009/09/10/cdh2-clouderas-distribution-for-hadoop-2/">first release of CDH2</a>, our current <tt>testing</tt> repository. Packages in our <tt>testing</tt> repository are recommended for people who want more features and are willing to upgrade as bugs are worked out.  Our <tt>testing</tt> packages pass unit and functional tests but will not have the same &#8220;soak time&#8221; as our <tt>stable</tt> packages.  A <tt>testing</tt> release represents a work in progress that will eventually be promoted to <tt>stable</tt>.</p>
<p>We plan on pushing new packages into the <tt>testing</tt> repository every 3 to 6 weeks.&#160; And it just so happens it is just about 3 weeks after we announced the first testing release. So it must be time for a new one. Here are some of the highlights:</p>
<ul>
<li><strong>Hadoop 0.20.1</strong> &#8211; Bumps the hadoop package up to the <a href="http://hadoop.apache.org/common/docs/r0.20.1/changes.html">0.20.1 release</a> and adds 133 patches worth of <a href="http://archive.cloudera.com/cdh/testing/hadoop-0.20.1+120.CHANGES.txt">extra goodness</a></li>
<li><strong>Alternatives for Hadoop</strong> &#8211; Now you can have both 0.18 and 0.20 installed and use the alternatives system to pick a default</li>
<li><strong>Pig 0.5.0 pre-release</strong> &#8211; We included some magic to get things working out of the box with both 0.18 and 0.20</li>
<li><strong>Hive 0.4.0 pre-release</strong> &#8211; Integrated with the alternatives setup; works out of the box with 0.18 and 0.20</li>
<li><strong>HBase 0.20</strong> &#8211; We worked with the HBase team to bring rpms to a <a href="http://archive.cloudera.com/redhat/cdh/cloudera-contrib.repo">yum repo</a> near you</li>
</ul>
<p>A project as large as Hadoop is a communal effort. Cloudera is proud to be part of that community and hopes that our products and services make Hadoop even more accessible to a wider audience. We&#8217;d like to thank everyone who contributes to Hadoop, especially the Yahoo! team for all of their hard work on getting 0.20.1 released, the developers at Facebook and those working on Pig, Hive and HBase.</p>
<p>We are just getting the ball rolling here. You can subscribe to our CDH mailing list (<a href="mailto:cdh-announce-subscribe@cloudera.com" target="_blank">cdh-announce-subscribe@cloudera.com</a>) to get information about new releases as we push them out. Check out the new release, and remember to let us know what you think!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cloudera.com/blog/2009/09/cdh2-testing-release-now-with-pig-hive-and-hbase/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Advice on QA Testing Your MapReduce Jobs</title>
		<link>http://www.cloudera.com/blog/2009/07/advice-on-qa-testing-your-mapreduce-jobs/</link>
		<comments>http://www.cloudera.com/blog/2009/07/advice-on-qa-testing-your-mapreduce-jobs/#comments</comments>
		<pubDate>Wed, 29 Jul 2009 20:33:19 +0000</pubDate>
		<dc:creator>Alex Loddengaard</dc:creator>
				<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[testing]]></category>

		<guid isPermaLink="false">http://www.cloudera.com/blog/?p=1040</guid>
		<description><![CDATA[As Hadoop adoption increases among organizations, companies, and individuals, and as it makes its way into production, testing MapReduce (MR) jobs becomes more and more important. By regularly running tests on your MR jobs&#8211;either invoked by developers before they commit a change or by a continuous integration server such as hudson&#8211;an engineering organization can catch [...]]]></description>
			<content:encoded><![CDATA[<p>As Hadoop adoption increases among organizations, companies, and individuals, and as it makes its way into production, testing MapReduce (MR) jobs becomes more and more important. By regularly running tests on your MR jobs&#8211;either invoked by developers before they commit a change or by a continuous integration server such as <a href="https://hudson.dev.java.net/">hudson</a>&#8211;an engineering organization can catch bugs early, strive for quality, and make developing and maintaining MR jobs easier and faster.</p>
<p>MR jobs are particularly difficult to test thoroughly because they run in a distributed environment.&#160; This post will give specific advice on how an engineering team might QA test its MR jobs. Note that Chapter 5 of <a href="http://oreilly.com/catalog/9780596521974/"><em>Hadoop: The Definitive Guide</em></a> gives specific code examples for testing an MR job.</p>
<p>As is the case with most testing scenarios, some practices have a low barrier to entry and still do a reasonably good job of testing. Other practices are more complicated but can result in more thorough testing. Let&#8217;s walk through some good QA practices, starting with the easiest and ending with the most complicated.</p>
<h2>Traditional Unit Tests &#8211; JUnit, PyUnit, Etc.</h2>
<p>Your MR job will probably have some functionality that can be tested in isolation using a unit-testing framework such as JUnit or PyUnit. For example, if your MR job does some document parsing in Java, your parse method can be tested using JUnit.</p>
<p>Using a traditional unit-testing framework is perhaps the easiest way to get started testing your MR jobs, for a few reasons. First, such frameworks are already used by a huge number of developers. Second, they can be invoked by and integrated into most popular continuous integration servers. Finally, they are simple, effective, and don&#8217;t require Hadoop daemons to be running.</p>
<p>Unit tests do a great job of testing each individual part of your MR job, but they do not test your MR job as a whole, and they do not test your MR job within Hadoop.</p>
<p>To start using traditional unit tests to improve the quality of your MR jobs, simply write your map and reduce functions in such a way that functionality is extracted from these functions into &#8220;private&#8221; helper functions (or classes), which can be tested in isolation. For example, put all your parse code in a &#8220;parse&#8221; method. Then, choose a unit-testing framework that fits your use case.</p>
<p>Most Python unit testing uses PyUnit, and most Java unit testing uses either JUnit or TestNG. Most popular programming languages have a standard unit-testing tool, so do some research to learn which framework is best for your needs. Finally, write tests in the framework of choice that thoroughly test each helper function you&#8217;ve defined for your map and reduce functions. Unit tests can then be run either on the local machine or on a continuous integration server to ensure that the postconditions of the tested helper functions are met for a particular input.</p>
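<p>As a minimal sketch of this approach, assuming a job written in Python whose input is a tab-separated url/count record, the parsing logic can live in a helper that PyUnit (the <em>unittest</em> module) tests in isolation. The names <em>parse_record</em> and <em>map_record</em> and the record format are hypothetical, chosen only for illustration.</p>

```python
import unittest


def parse_record(line):
    """Hypothetical helper: split a tab-separated url/count record into (url, count)."""
    url, count = line.rstrip("\n").split("\t")
    return url, int(count)


def map_record(line):
    """Hypothetical map function: it only delegates to the helper and emits a pair."""
    url, count = parse_record(line)
    yield url, count


class ParseRecordTest(unittest.TestCase):
    """Tests the helper directly; no Hadoop daemons are involved."""

    def test_parses_url_and_count(self):
        self.assertEqual(parse_record("example.com\t3\n"), ("example.com", 3))

    def test_rejects_line_without_tab(self):
        # Unpacking a single field into two names raises ValueError.
        with self.assertRaises(ValueError):
            parse_record("no-tab-here")
```

<p>Saved as a module, the tests can be run with <em>python -m unittest</em>, either by hand or from a continuous integration server.</p>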
<h2>MRUnit &#8211; Unit Testing for MR Jobs</h2>
<p>MRUnit is a tool that was developed here at Cloudera and released back to the Apache Hadoop project. It can be used to unit-test map and reduce functions. MRUnit lets you define key-value pairs to be given to map and reduce functions, and it tests that the correct key-value pairs are emitted from each of these functions. MRUnit tests are similar to traditional unit tests in that they are simple, isolated, and don&#8217;t require Hadoop daemons to be running. Aaron Kimball has written a very detailed blog post about MRUnit <a href="http://www.cloudera.com/blog/2009/07/03/debugging-mapreduce-programs-with-mrunit/">here</a>.</p>
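<p>MRUnit itself is a Java library, and the sketch below does not use or reproduce its API. It only illustrates the same idea in Python: feed chosen key-value pairs to a map or reduce function and assert on the pairs it emits. The word-count functions and the <em>run_map</em> and <em>run_reduce</em> helpers are hypothetical.</p>

```python
from itertools import groupby
from operator import itemgetter


def word_count_map(key, value):
    """Hypothetical map function: emit (word, 1) for every word in the line."""
    for word in value.split():
        yield word.lower(), 1


def word_count_reduce(key, values):
    """Hypothetical reduce function: sum the counts for one word."""
    yield key, sum(values)


def run_map(map_fn, inputs):
    """Feed (key, value) pairs to map_fn and collect everything it emits."""
    return [pair for key, value in inputs for pair in map_fn(key, value)]


def run_reduce(reduce_fn, pairs):
    """Sort and group pairs by key, roughly as the shuffle would, then reduce each group."""
    output = []
    by_key = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(by_key, key=itemgetter(0)):
        output.extend(reduce_fn(key, [value for _, value in group]))
    return output


# Define the input pairs, then check the emitted pairs at each stage.
mapped = run_map(word_count_map, [(0, "Hadoop tests"), (13, "hadoop")])
assert mapped == [("hadoop", 1), ("tests", 1), ("hadoop", 1)]
assert run_reduce(word_count_reduce, mapped) == [("hadoop", 2), ("tests", 1)]
```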
<h2>Local Job Runner Testing &#8211; Running MR Jobs on a Single Machine in a Single JVM</h2>
<p>Traditional unit tests and MRUnit should do a reasonably good job of detecting bugs early, but neither will test your MR jobs with Hadoop. The local job runner lets you run Hadoop on a local machine, in one JVM, making MR jobs a little easier to debug when a job fails.</p>
<p>To enable the local job runner, set &#8220;mapred.job.tracker&#8221; to &#8220;local&#8221; and &#8220;fs.default.name&#8221; to &#8220;file:///&#8221; (these are the default values).</p>
<p>Remember, there is no need to start any Hadoop daemons when using the local job runner. Running <em>bin/hadoop</em> will start a JVM and will run your job for you. Creating a new hadoop-local.xml file (or mapred-local.xml and hdfs-local.xml if you&#8217;re using 0.20) probably makes sense. You can then use the <em>--config</em> parameter to tell <em>bin/hadoop</em> which configuration directory to use. If you&#8217;d rather avoid fiddling with configuration files, you can create a class that implements <a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/Tool.html">Tool</a> and uses <a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/ToolRunner.html">ToolRunner</a>, and then run this class with <em>bin/hadoop jar foo.jar com.example.Bar -D mapred.job.tracker=local -D fs.default.name=file:/// (args)</em>, where <em>Bar</em> is the Tool implementation.</p>
<p>To start using the local job runner to test your MR jobs in Hadoop, create a new configuration directory that is local job runner enabled and invoke your job as you normally would, remembering to include the <em>--config</em> parameter, which points to a directory containing your local configuration files.</p>
<p>The <em>-conf</em> parameter also works in 0.18.3 and lets you specify your hadoop-local.xml file instead of specifying a directory with <em>--config</em>. Hadoop will run the job happily. The difficulty with this form of testing is verifying that the job ran correctly. Note: you&#8217;ll have to ensure that input files are set up correctly and output directories don&#8217;t exist before running the job.</p>
<p>Assuming you&#8217;ve managed to configure the local job runner and get a job running, you&#8217;ll have to verify that your job completed correctly. Simply basing success on exit codes isn&#8217;t quite good enough. At the very least, you&#8217;ll want to verify that the output of your job is correct. You may also want to scan the output of <em>bin/hadoop</em> for exceptions. You should create a script or unit test that sets up preconditions, runs the job, diffs actual output and expected output, and scans for raised exceptions. This script or unit test can then exit with the appropriate status and output specific messages explaining how the job failed.</p>
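<p>A rough sketch of such a script in Python follows. It assumes the job writes <em>part-*</em> files into a local output directory and that exceptions show up in the job&#8217;s console output as Java class names ending in <em>Exception</em> or <em>Error</em>. The function names and the regular expression are illustrative choices, not part of Hadoop.</p>

```python
import difflib
import glob
import os
import re
import shutil
import subprocess

# Assumption: raised exceptions appear as class names ending in Exception or Error.
EXCEPTION_PATTERN = re.compile(r"\b\w*(Exception|Error)\b")


def find_exceptions(log_text):
    """Return the lines of log_text that look like a raised exception."""
    return [line for line in log_text.splitlines() if EXCEPTION_PATTERN.search(line)]


def read_output_lines(output_dir):
    """Collect the records from every part-* file found in output_dir."""
    lines = []
    for path in sorted(glob.glob(os.path.join(output_dir, "part-*"))):
        with open(path) as part_file:
            lines.extend(part_file.read().splitlines())
    return lines


def check_job(command, output_dir, expected_path):
    """Run the job and return a list of failure messages (empty means success)."""
    # Precondition from the text: the output directory must not exist yet.
    shutil.rmtree(output_dir, ignore_errors=True)
    result = subprocess.run(command, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    failures = []
    if result.returncode != 0:
        failures.append("job exited with status %d" % result.returncode)
    for line in find_exceptions(result.stdout):
        failures.append("exception in job output: " + line)
    with open(expected_path) as expected_file:
        expected = expected_file.read().splitlines()
    # Compare sorted records so ordering across part files does not matter
    # (an assumption; drop the sort if record order is part of the contract).
    delta = list(difflib.unified_diff(sorted(expected),
                                      sorted(read_output_lines(output_dir)),
                                      "expected", "actual", lineterm=""))
    if delta:
        failures.append("output differs from expected:\n" + "\n".join(delta))
    return failures
```

<p>A wrapper could pass a command list such as <em>bin/hadoop --config conf-local jar foo.jar com.example.Bar input output</em> (the directory names here are hypothetical), print each failure message, and exit with a non-zero status if the returned list is not empty.</p>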
<p>Note that the local job runner has a couple of limitations: only one reducer is supported, and the <a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html">DistributedCache</a> doesn&#8217;t work (<a href="https://issues.apache.org/jira/browse/MAPREDUCE-476">a fix is in progress</a>).</p>
<h2>Pseudo-distributed Testing &#8211; Running MR Jobs on a Single Machine Using Daemons</h2>
<p>The local job runner lets you run your job in a single thread. Running an MR job in a single thread is useful for debugging, but it doesn&#8217;t properly simulate a real cluster with several Hadoop daemons running (<em>e.g.,</em> NameNode, DataNode, TaskTracker, JobTracker, SecondaryNameNode). A pseudo-distributed cluster is composed of a single machine running all Hadoop daemons. This cluster is still relatively easy to manage (though harder than the local job runner) and tests integration with Hadoop better than the local job runner does.</p>
<p>To start using a pseudo-distributed cluster to test your MR jobs in Hadoop, follow the aforementioned advice for using the local job runner, but in your precondition setup include the configuration and start-up of all Hadoop daemons. Then, to start your job, just use <em>bin/hadoop</em> as you would normally.</p>
<h2>Full Integration Testing &#8211; Running MR Jobs on a QA Cluster</h2>
<p>Probably the most thorough yet most cumbersome mechanism for testing your MR jobs is to run them on a QA cluster composed of at least a few machines. By running your MR jobs on a QA cluster, you&#8217;ll be testing all aspects of both your job and its integration with Hadoop.</p>
<p>Running your jobs on a QA cluster has many of the same issues as the local job runner. Namely, you&#8217;ll have to check the output of your job for correctness. You may also want to scan the <em>stdout</em> and <em>stderr</em> produced by each task attempt, which will require collecting these logs to a central place and grepping them. <a href="http://www.cloudera.com/blog/2008/11/02/configuring-and-using-scribe-for-hadoop-log-collection/">Scribe</a> is a useful tool for collecting logs, though it may be superfluous depending on your QA cluster.</p>
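<p>Once the task logs sit under one directory, the grepping step can be as small as the following Python sketch. The directory layout is whatever your log collection produces, and the pattern (Java class names ending in <em>Exception</em> or <em>Error</em>) is an assumption to adjust for your jobs.</p>

```python
import os
import re

# Assumption: raised exceptions appear as class names ending in Exception or Error.
EXCEPTION_PATTERN = re.compile(r"\b\w*(Exception|Error)\b")


def scan_collected_logs(log_root):
    """Walk log_root and map each log file path to the suspicious lines found in it."""
    findings = {}
    for directory, _subdirs, filenames in os.walk(log_root):
        for filename in filenames:
            path = os.path.join(directory, filename)
            # errors="replace" keeps the scan going if a log holds undecodable bytes.
            with open(path, errors="replace") as log_file:
                hits = [line.rstrip("\n") for line in log_file
                        if EXCEPTION_PATTERN.search(line)]
            if hits:
                findings[path] = hits
    return findings
```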
<p>We find that most of our customers have some sort of QA or development cluster where they can deploy and test new jobs, try out newer versions of Hadoop, and practice upgrading clusters from one version of Hadoop to another. If Hadoop is a major part of your production pipeline, then creating a QA or development cluster makes a lot of sense, and repeatedly running jobs on it will ensure that changes to your jobs continue to get tested thoroughly. EC2 may be a good host for your QA cluster, as you can bring it up and down on demand. Take a look at our beta <a href="http://www.cloudera.com/hadoop-ec2-ebs-beta">EC2 EBS Hadoop scripts</a> if you&#8217;re interested in creating a QA cluster in EC2.</p>
<p>You should choose QA practices based on the importance of QA for your organization and also on the amount of resources you have. Simply using a traditional unit-testing framework, MRUnit, and the local job runner can test your MR jobs thoroughly in a simple way without using too many resources. However, running your jobs on a QA or development cluster is naturally the most complete way to test your MR jobs, at the cost of the expense and operational work of running a Hadoop cluster.</p>
<p>Do you have any helpful advice on beneficial QA practices for MR jobs?&#160; Leave a comment :).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cloudera.com/blog/2009/07/advice-on-qa-testing-your-mapreduce-jobs/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

