<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Cloudera &#187; ZooKeeper</title>
	<atom:link href="http://www.cloudera.com/blog/category/zookeeper/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.cloudera.com</link>
	<description>Hadoop and Cloudera&#039;s Products and Services</description>
	<lastBuildDate>Thu, 24 May 2012 17:53:29 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Apache ZooKeeper 3.3.5 has been released</title>
		<link>http://www.cloudera.com/blog/2012/03/apache-zookeeper-3-3-5-has-been-released/</link>
		<comments>http://www.cloudera.com/blog/2012/03/apache-zookeeper-3-3-5-has-been-released/#comments</comments>
		<pubDate>Wed, 21 Mar 2012 18:39:46 +0000</pubDate>
		<dc:creator>Patrick Hunt</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[ZooKeeper]]></category>
		<category><![CDATA[apache zookeeper]]></category>
		<category><![CDATA[zookeeper release]]></category>

		<guid isPermaLink="false">http://www.cloudera.com/?p=13759</guid>
		<description><![CDATA[Apache ZooKeeper release 3.3.5 is now available. This is a bug fix release covering 11 issues, two of which were considered blockers. Some of the more serious issues include: ZOOKEEPER-1367 Data inconsistencies and unexpired ephemeral nodes after cluster restart ZOOKEEPER-1412 Java client watches inconsistently triggered on reconnect ZOOKEEPER-1277 Servers stop serving when lower 32bits of zxid roll over ZOOKEEPER-1309 Creating [...]]]></description>
			<content:encoded><![CDATA[<p><a title="Apache ZooKeeper" href="http://zookeeper.apache.org/">Apache ZooKeeper</a> release 3.3.5 is now available. This is a bug fix release covering <a href="http://bit.ly/GzD9lB">11</a> issues, two of which were considered blockers. Some of the more serious issues include:</p>
<ul>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-1367" target="_blank">ZOOKEEPER-1367</a> Data inconsistencies and unexpired ephemeral nodes after cluster restart</li>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-1412" target="_blank">ZOOKEEPER-1412</a> Java client watches inconsistently triggered on reconnect</li>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-1277" target="_blank">ZOOKEEPER-1277</a> Servers stop serving when lower 32bits of zxid roll over</li>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-1309" target="_blank">ZOOKEEPER-1309</a> Creating a new ZooKeeper client can leak file handles</li>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-1389" target="_blank">ZOOKEEPER-1389</a> It would be nice if start-foreground used exec $JAVA in order to get rid of the intermediate shell process</li>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-1089" target="_blank">ZOOKEEPER-1089</a> zkServer.sh status does not work due to invalid option of nc</li>
</ul>
<h2>Stability, Compatibility and Testing</h2>
<p>3.3.5 is a stable release that&#8217;s fully backward compatible with 3.3.4. Only bug fixes relative to 3.3.4 have been applied. Version 3.3.5 will be incorporated into the upcoming CDH3U4 release.</p>
<h2>Getting Involved</h2>
<p>The Apache ZooKeeper project is working on a number of new features. Our <a title="how to contribute to ZooKeeper" href="https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToContribute" target="_blank">How To Contribute</a> page is a great place to start if you&#8217;re interested in getting involved as a developer. You can also <a title="@phunt" href="https://twitter.com/#!/phunt">follow me on Twitter</a>.</p>
<h2>Acknowledgements</h2>
<p>A special thanks to everyone who contributed to the release (reporting issues, fixing bugs, reviewing changes, writing documentation, etc.).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cloudera.com/blog/2012/03/apache-zookeeper-3-3-5-has-been-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Apache ZooKeeper 3.4.3 has been released</title>
		<link>http://www.cloudera.com/blog/2012/02/apache-zookeeper-3-4-3-has-been-released/</link>
		<comments>http://www.cloudera.com/blog/2012/02/apache-zookeeper-3-4-3-has-been-released/#comments</comments>
		<pubDate>Tue, 14 Feb 2012 21:12:33 +0000</pubDate>
		<dc:creator>Patrick Hunt</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[ZooKeeper]]></category>
		<category><![CDATA[apache zookeeper]]></category>
		<category><![CDATA[zookeeper release]]></category>

		<guid isPermaLink="false">http://www.cloudera.com/?p=10870</guid>
		<description><![CDATA[Apache ZooKeeper release 3.4.3 is now available. This is a bug fix release covering 18 issues, one of which was considered a blocker.  ZooKeeper 3.4 is incorporated into CDH4 and now available in beta 1! ZOOKEEPER-1367 is the most serious of the issues addressed, it could cause data corruption on restart. This version also adds support for compiling [...]]]></description>
			<content:encoded><![CDATA[<p><a title="Apache ZooKeeper" href="http://zookeeper.apache.org/">Apache ZooKeeper</a> release 3.4.3 is now available. This is a bug fix release covering <a href="http://bit.ly/zkY1JF">18</a> issues, one of which was considered a blocker. </p>
<p>ZooKeeper 3.4 is incorporated into CDH4 and <a href="http://bit.ly/zcoWXX">now available in beta 1</a>!</p>
<p>ZOOKEEPER-1367 is the most serious of the issues addressed; it could cause data corruption on restart. This version also adds support for compiling the client on ARM architectures.</p>
<ul style="padding-left:20px; padding-bottom:10px">
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-1367">ZOOKEEPER-1367</a>  Data inconsistencies and unexpired ephemeral nodes after cluster restart</li>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-1343">ZOOKEEPER-1343</a>  getEpochToPropose should check if lastAcceptedEpoch is greater or equal than epoch</li>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-1373">ZOOKEEPER-1373</a>  Hardcoded SASL login context name clashes with Hadoop security configuration override</li>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-1089">ZOOKEEPER-1089</a>  zkServer.sh status does not work due to invalid option of nc</li>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-973">ZOOKEEPER-973</a>    bind() could fail on Leader because it does not setReuseAddress on its ServerSocket</li>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-1374">ZOOKEEPER-1374</a>  C client multi-threaded test suite fails to compile on ARM architectures.</li>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-1348">ZOOKEEPER-1348</a>  Zookeeper 3.4.2 C client incorrectly reports string version of 3.4.1</li>
</ul>
<p>If you are running 3.4.2 or earlier, be sure to upgrade immediately. See my earlier post for details on <a href="http://bit.ly/rB0ROX">what&#8217;s new in 3.4</a>.</p>
<h2>Stability, Compatibility and Testing</h2>
<p>The 3.4 series has been through a number of releases, incorporating feedback from users and addressing the issues found. The Apache community now considers the 3.4.3 release to be of beta quality.</p>
<h2>Getting Involved</h2>
<p>The Apache ZooKeeper project is working on a number of new features. Our <a title="how to contribute to ZooKeeper" href="https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToContribute">How To Contribute</a> page is a great place to start if you&#8217;re interested in getting involved as a developer. You can also <a title="@phunt" href="https://twitter.com/#!/phunt">follow me on Twitter</a>.</p>
<h2>Acknowledgements</h2>
<p>A special thanks to everyone who contributed to the release (reporting issues, fixing bugs, reviewing changes, writing documentation, etc.).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cloudera.com/blog/2012/02/apache-zookeeper-3-4-3-has-been-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Apache ZooKeeper 3.4.2 has been released</title>
		<link>http://www.cloudera.com/blog/2011/12/apache-zookeeper-3-4-2-has-been-released/</link>
		<comments>http://www.cloudera.com/blog/2011/12/apache-zookeeper-3-4-2-has-been-released/#comments</comments>
		<pubDate>Fri, 30 Dec 2011 21:55:33 +0000</pubDate>
		<dc:creator>Patrick Hunt</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[ZooKeeper]]></category>
		<category><![CDATA[apache zookeeper]]></category>
		<category><![CDATA[zookeeper release]]></category>

		<guid isPermaLink="false">http://www.cloudera.com/?p=10149</guid>
		<description><![CDATA[Apache ZooKeeper release&#160;3.4.2 is now available. This is a bug fix release covering 2 issues, one of which was considered a blocker. ZOOKEEPER-1333 is the most serious of the issues addressed, it could cause a server to fail to rejoin the quorum on restart: ZOOKEEPER-1333 NPE in FileTxnSnapLog when restarting a cluster If you are [...]]]></description>
			<content:encoded><![CDATA[<div>
<div>
<p><a title="Apache ZooKeeper" href="http://zookeeper.apache.org/">Apache ZooKeeper</a> release&#160;3.4.2 is now available. This is a bug fix release covering <a href="http://bit.ly/tR1W9H">2</a> issues, one of which was considered a blocker.</p>
<p>ZOOKEEPER-1333 is the most serious of the issues addressed; it could cause a server to fail to rejoin the quorum on restart:</p>
<ul>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-1333">ZOOKEEPER-1333</a> NPE in FileTxnSnapLog when restarting a cluster</li>
</ul>
<p>If you are running 3.4.1 or earlier, be sure to upgrade immediately. See my earlier post for details on <a href="http://bit.ly/rB0ROX">what&#8217;s new in 3.4</a>.</p>
<h2>Stability, Compatibility and Testing</h2>
<p>It is important to note that 3.4.2 is not yet ready for production. It is an early release that users can start testing so that we can stabilize later 3.4.x releases. We expect a dot release to be production-ready soon, and to be incorporated into CDH4.</p>
<h2>Getting Involved</h2>
<p>The Apache ZooKeeper project is working on a number of new features. Our <a title="how to contribute to ZooKeeper" href="https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToContribute">How To Contribute</a> page is a great place to start if you&#8217;re interested in getting involved as a developer. You can also <a title="@phunt" href="https://twitter.com/#!/phunt">follow me on Twitter</a>.</p>
<h2>Acknowledgements</h2>
<p>A special thanks to everyone who contributed to the release (reporting issues, fixing bugs, reviewing changes, writing documentation, etc.).</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.cloudera.com/blog/2011/12/apache-zookeeper-3-4-2-has-been-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Apache ZooKeeper 3.3.4 has been released</title>
		<link>http://www.cloudera.com/blog/2011/11/apache-zookeeper-3-3-4-has-been-released/</link>
		<comments>http://www.cloudera.com/blog/2011/11/apache-zookeeper-3-3-4-has-been-released/#comments</comments>
		<pubDate>Tue, 29 Nov 2011 19:34:18 +0000</pubDate>
		<dc:creator>Patrick Hunt</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[ZooKeeper]]></category>
		<category><![CDATA[apache zookeeper]]></category>
		<category><![CDATA[zookeeper release]]></category>

		<guid isPermaLink="false">http://www.cloudera.com/?p=9484</guid>
		<description><![CDATA[Apache ZooKeeper release&#160;3.3.4&#160;is now available: this is a fix release covering 22 issues, 9 of which were considered blockers. Some of the more serious issues include: ZOOKEEPER-1208 Ephemeral nodes may not be removed after the client session is invalidated ZOOKEEPER-961 Watch recovery fails after disconnection when using chroot connection option ZOOKEEPER-1049 Session expire/close flooding renders [...]]]></description>
			<content:encoded><![CDATA[<p><a title="Apache ZooKeeper" href="http://zookeeper.apache.org/">Apache ZooKeeper</a> release&#160;3.3.4&#160;is now available: this is a bug fix release covering <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310801&amp;version=12316276">22 issues</a>, 9 of which were considered blockers. Some of the more serious issues include:</p>
<ul>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-1208">ZOOKEEPER-1208</a> Ephemeral nodes may not be removed after the client session is invalidated</li>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-961">ZOOKEEPER-961</a> Watch recovery fails after disconnection when using chroot connection option</li>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-1049">ZOOKEEPER-1049</a> Session expire/close flooding renders heartbeats to delay significantly</li>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-1156">ZOOKEEPER-1156</a> Log truncation truncating log too much &#8211; can cause data loss</li>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-1046">ZOOKEEPER-1046</a> Creating a new sequential node incorrectly results in a ZNODEEXISTS error</li>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-1097">ZOOKEEPER-1097</a> Quota is not correctly rehydrated on snapshot reload</li>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-1117">ZOOKEEPER-1117</a> zookeeper 3.3.3 fails to build with gcc >= 4.6.1 on Debian/Ubuntu</li>
</ul>
<h2>Stability, Compatibility and Testing</h2>
<p>3.3.4 is a stable release that&#8217;s fully backward compatible with 3.3.3. Only bug fixes relative to 3.3.3 have been applied. Version 3.3.4 will be incorporated into the upcoming CDH3U3 release.</p>
<p>Note that these changes are included in the recent 3.4.0 release <a href="http://www.cloudera.com/blog/2011/11/apache-zookeeper-3-4-0-has-been-released/">which I detailed earlier</a>.</p>
<h2>Getting Involved</h2>
<p>The Apache ZooKeeper project is working on a number of new features. Our <a title="how to contribute to ZooKeeper" href="https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToContribute">How To Contribute</a> page is a great place to start if you&#8217;re interested in getting involved as a developer. You can also <a title="@phunt" href="https://twitter.com/#!/phunt">follow me on Twitter</a>.</p>
<h2>Acknowledgements</h2>
<p>A special thanks to everyone who contributed to the release (reporting issues, fixing bugs, reviewing changes, writing documentation, etc.).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cloudera.com/blog/2011/11/apache-zookeeper-3-3-4-has-been-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Apache ZooKeeper 3.4.0 has been released</title>
		<link>http://www.cloudera.com/blog/2011/11/apache-zookeeper-3-4-0-has-been-released/</link>
		<comments>http://www.cloudera.com/blog/2011/11/apache-zookeeper-3-4-0-has-been-released/#comments</comments>
		<pubDate>Wed, 23 Nov 2011 20:35:50 +0000</pubDate>
		<dc:creator>Patrick Hunt</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[ZooKeeper]]></category>
		<category><![CDATA[apache zookeeper]]></category>
		<category><![CDATA[zookeeper release]]></category>

		<guid isPermaLink="false">http://www.cloudera.com/?p=9482</guid>
		<description><![CDATA[Apache ZooKeeper release 3.4.0 is now available: it includes changes covering over&#160;150 issues, 27 of which were considered blockers. ZooKeeper 3.3.3 clients are compatible with 3.4.0 servers, enabling a seamless upgrade path (3.4.0 clients with 3.3.3 servers has also been tested successfully). In addition to improving overall stability some of the highlights are described below: [...]]]></description>
			<content:encoded><![CDATA[<p><a title="Apache ZooKeeper" href="http://zookeeper.apache.org/">Apache ZooKeeper</a> release <a title="ZooKeeper release 3.4.0" href="http://zookeeper.apache.org/doc/r3.4.0/">3.4.0</a> is now available: it includes changes covering over&#160;<a title="release notes for 3.4.0" href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310801&amp;version=12314469">150 issues</a>, 27 of which were considered blockers. ZooKeeper 3.3.3 clients are compatible with 3.4.0 servers, enabling a seamless upgrade path (3.4.0 clients with 3.3.3 servers have also been tested successfully). Beyond improved overall stability, some of the highlights are described below:</p>
<h2>Things of interest to developers implementing ZooKeeper clients:</h2>
<ul>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-992">Native Windows version of C client</a></li>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-938">Support Kerberos authentication of clients</a></li>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-965">Multi-update client API</a></li>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-809">Improved REST Interface</a></li>
</ul>
<p>Native Windows support now allows Visual Studio users to compile a native client. This will increase ZooKeeper use substantially on that platform, in particular by enabling client bindings other than just Java (on Unix the C binding is used to provide a <a href="https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZKClientBindings">number of bindings to various languages</a> such as Python, Perl, Ruby, Go, etc.).</p>
<p>Kerberos authentication was initially developed in ZooKeeper to support the efforts going on in HBase to enable security for those users. However, Kerberos-based client authentication is now available to anyone using ZooKeeper.</p>
<p>Multi-update is one of the few changes in recent history to extend the ZooKeeper client interface. It allows multiple operations to be &#8220;batched&#8221; together as a single atomic operation that either succeeds or fails in its entirety. This feature will greatly simplify the implementation of certain domain-specific (i.e. client-side) business logic.</p>
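<p>To make the all-or-nothing behaviour concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the ZooKeeper client API: the <code>multi</code> function, the operation tuples and the dictionary standing in for the znode tree are all assumptions made for illustration.</p>

```python
# Toy model of an atomic multi-update: every operation is applied, or none is.
# The dict stands in for the znode tree; this is NOT the real ZooKeeper API.

def multi(tree, ops):
    """Apply a batch of (kind, path, data) operations atomically.

    Returns True and mutates `tree` only if every operation succeeds;
    otherwise returns False and leaves `tree` untouched.
    """
    staged = dict(tree)  # work on a copy so a failure leaves no partial state
    for kind, path, data in ops:
        if kind == "create":
            if path in staged:
                return False  # node already exists
            staged[path] = data
        elif kind == "set":
            if path not in staged:
                return False  # node does not exist
            staged[path] = data
        elif kind == "delete":
            if path not in staged:
                return False  # nothing to delete
            del staged[path]
        else:
            return False  # unknown operation
    tree.clear()
    tree.update(staged)  # commit the whole batch at once
    return True


tree = {"/app": b"v1"}
ok = multi(tree, [("create", "/app/lock", b""), ("set", "/app", b"v2")])
failed = multi(tree, [("delete", "/app/lock", None), ("set", "/missing", b"x")])
```

<p>In this sketch the second batch fails on its last operation, so the delete that precedes it in the same batch is not applied either.</p>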
<h2>Improved Operational support:</h2>
<p>Operations loves ZooKeeper because of its resilience to failure and its self-recovery features. Here are some of the new features available in 3.4.0:</p>
<ul>
<li>Existing monitoring support has been extended through the introduction of a new&#160;<a href="https://issues.apache.org/jira/browse/ZOOKEEPER-744">&#8216;mntr&#8217; 4 letter word</a></li>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-799">Add tools and recipes for monitoring as a contrib</a></li>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-808">Web-based Administrative Interface</a></li>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-1107">Automating log and snapshot cleaning</a></li>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-1239">Add logging/stats to identify production deployment issues</a></li>
<li><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-999">Support for building RPM and DEB packages</a></li>
</ul>
<p>A number of contributions were accepted around enabling improved monitoring of the system. In particular the new monitoring 4 letter word and the Ganglia/Nagios integration that now ship with the release will go a long way in helping operations support ZooKeeper in production.</p>
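<p>As a rough sketch of how a monitoring script might use the new four letter word, the Python below sends <code>mntr</code> to a server and parses the reply. The host and port defaults, the helper names and the sample reply are assumptions for illustration, and the parser assumes one tab-separated key/value pair per line.</p>

```python
import socket


def parse_mntr(text):
    """Parse tab-separated key/value lines into a dict, skipping other lines."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def fetch_mntr(host="localhost", port=2181, timeout=5.0):
    """Send the 'mntr' four letter word and return the parsed reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"mntr")
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # the server closes the connection after replying
                break
            chunks.append(chunk)
    return parse_mntr(b"".join(chunks).decode("utf-8", errors="replace"))


# Illustrative reply only; the keys and values here are assumptions.
sample = "zk_version\t3.4.0\nzk_avg_latency\t0\nzk_server_state\tleader\n"
stats = parse_mntr(sample)
```

<p>Keeping the parser separate from the socket call means it can be exercised on canned text, without a running server.</p>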
<h2>Stability, Compatibility and Testing</h2>
<p>It is important to note that 3.4.0 is not yet ready for production. It is an early release that users can start testing so that we can stabilize later 3.4.x releases. We expect a later dot release to be production-ready soon, and to be incorporated into CDH4.</p>
<h2>Getting Involved</h2>
<p>The Apache ZooKeeper project is working on a number of new features. Our <a title="how to contribute to ZooKeeper" href="https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToContribute">How To Contribute</a> page is a great place to start if you&#8217;re interested in getting involved as a developer. You can also <a title="@phunt" href="https://twitter.com/#!/phunt">follow me on Twitter</a>.</p>
<h2>Acknowledgements</h2>
<p>A special thanks to everyone who contributed to the release (reporting issues, fixing bugs, reviewing changes, writing documentation, etc.), and a big thank you to our amazing release manager Mahadev Konar.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cloudera.com/blog/2011/11/apache-zookeeper-3-4-0-has-been-released/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Hadoop World 2011: A Glimpse into Development</title>
		<link>http://www.cloudera.com/blog/2011/10/hadoop-world-2011-a-glimpse-into-development/</link>
		<comments>http://www.cloudera.com/blog/2011/10/hadoop-world-2011-a-glimpse-into-development/#comments</comments>
		<pubDate>Wed, 12 Oct 2011 13:00:42 +0000</pubDate>
		<dc:creator>Jon Zuanich</dc:creator>
				<category><![CDATA[Avro]]></category>
		<category><![CDATA[careers]]></category>
		<category><![CDATA[CDH]]></category>
		<category><![CDATA[Cloudera's Service and Configuration Manager]]></category>
		<category><![CDATA[community]]></category>
		<category><![CDATA[Connector]]></category>
		<category><![CDATA[distribution]]></category>
		<category><![CDATA[Flume]]></category>
		<category><![CDATA[general]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[HDFS]]></category>
		<category><![CDATA[hive]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[oozie]]></category>
		<category><![CDATA[pig]]></category>
		<category><![CDATA[sqoop]]></category>
		<category><![CDATA[training]]></category>
		<category><![CDATA[Use Case]]></category>
		<category><![CDATA[ZooKeeper]]></category>
		<category><![CDATA[hadoop conference]]></category>
		<category><![CDATA[hadoop event]]></category>
		<category><![CDATA[hadoop world]]></category>

		<guid isPermaLink="false">http://www.cloudera.com/?p=9240</guid>
		<description><![CDATA[The Development track at Hadoop World is a technical deep dive dedicated to discussion about Apache Hadoop and application development for Apache Hadoop. You will hear committers, contributors and expert users from various Hadoop projects discuss the finer points of building applications with Hadoop and the related ecosystem. The sessions will touch on foundational topics [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.hadoopworld.com/"><img style="float: left; padding-right: 20px;" title="Register for Hadoop World" src="https://www.cloudera.com/wp-content/uploads/2010/08/hw_white2.gif" alt="" /></a></p>
<p>The Development track at Hadoop World is a technical deep dive dedicated to discussion about Apache Hadoop and application development for Apache Hadoop. You will hear committers, contributors and expert users from various Hadoop projects discuss the finer points of building applications with Hadoop and the related ecosystem. The sessions will touch on foundational topics such as HDFS, HBase, Pig, Hive, Flume and other related technologies. In addition, speakers will address key development areas including tools, performance, bringing the stack together and testing the stack. Sessions in this track are for developers of all levels who want to learn more about upcoming features and enhancements, new tools, advanced techniques and best practices.</p>
<h2 style="font-size: 14pt; color: #344152;"><a href="http://www.hadoopworld.com/tracks/development-developers/" target="_blank">Preview of Development Track Sessions</a></h2>
<p><a href="http://www.hadoopworld.com/sessions/" target="_blank"><span style="color: #4aa02c; font-weight: bold; font-size: 12pt;">Building Web Analytics Processing on Hadoop at CBS Interactive</span></a><br />
 <em>Michael Sun, CBS Interactive</em></p>
<p><strong>Abstract:</strong> CBS Interactive successfully adopted Hadoop as its web analytics platform, processing one billion weblogs daily from hundreds of web site properties that CBS Interactive oversees. After introducing Lumberjack&#8212;the Extraction, Transformation and Loading framework we built based on Python and streaming, which is under review for open-source release&#8212;Michael will talk about web metrics processing on Hadoop, focusing on weblog harvesting, parsing, dimension look-up, sessionization, and loading into a database. Since migrating processing from a proprietary platform to Hadoop, CBS Interactive has achieved robustness, fault tolerance and scalability, and a significant reduction in the processing time needed to reach SLA (over six hours so far).</p>
<p><a href="http://www.hadoopworld.com/sessions/" target="_blank"><span style="color: #4aa02c; font-weight: bold; font-size: 12pt;">Gateway: Cluster Virtualization Framework</span></a><br />
<em>Konstantin Shvachko, eBay</em></p>
<p><strong>Abstract:</strong> Access to Hadoop clusters through dedicated portal nodes (typically located behind firewalls and performing user authentication and authorization) can have several drawbacks &#8212; as shared multitenant resources they can create contention among users and increase the maintenance overhead for cluster administrators. This session will discuss the Gateway system, a cluster virtualization framework that provides multiple benefits: seamless access from users&#8217; workplace computers through corporate firewalls; the ability to fail over to active clusters during scheduled or unscheduled downtime, as well as the ability to redirect traffic to other clusters during upgrades; and user access to clusters running different versions of Hadoop.</p>
<p><a href="http://www.hadoopworld.com/sessions/" target="_blank"><span style="color: #4aa02c; font-weight: bold; font-size: 12pt;">SHERPASURFING &#8211; Open Source Cyber Security Solution</span></a><br />
<em>Wayne Wheeles, Novii Design</em></p>
<p><strong>Abstract:</strong> Every day billions of packets, both benign and some malicious, flow in and out of networks. Every day it is an essential task for the modern Defensive Cyber Security Organization to be able to reliably survive the sheer volume of data, bring the NETFLOW data to rest, enrich it, correlate it and perform. SHERPASURFING is an open source platform built on the proven Cloudera&#8217;s Distribution including Apache Hadoop that enables organizations to perform the Cyber Security mission at scale and at an affordable price point. This session will include an overview of the solution and components, followed by a demonstration of analytics.</p>
<p><a href="http://www.hadoopworld.com/sessions/" target="_blank"><span style="color: #4aa02c; font-weight: bold; font-size: 12pt;">Integrating Hadoop with Enterprise RDBMS Using Apache SQOOP and Other Tools</span></a><br />
<em>Arvind Prabhakar, Cloudera<br />
Guy Harrison, Quest Software</em></p>
<p><strong>Abstract:</strong> As Hadoop graduates from pilot project to a mission-critical component of the enterprise IT infrastructure, integrating information held in Hadoop and in Enterprise RDBMS becomes imperative. We&#8217;ll look at key scenarios driving Hadoop and RDBMS integration and review technical options. In particular, we&#8217;ll deep dive into the Apache SQOOP project, which expedites data movement between Hadoop and any JDBC database, and provides a framework that allows developers and vendors to create connectors optimized for specific targets such as Oracle, Netezza, etc.</p>
<p><a href="http://www.hadoopworld.com/sessions/" target="_blank"><span style="color: #4aa02c; font-weight: bold; font-size: 12pt;">Next Generation Apache Hadoop MapReduce</span></a><br />
<em>Mahadev Konar, Hortonworks</em></p>
<p><strong>Abstract:</strong> The Apache Hadoop MapReduce framework has hit a scalability limit around 4,000 machines. We are developing the next generation of Apache Hadoop MapReduce that factors the framework into a generic resource scheduler and a per-job, user-defined component that manages the application execution. Since downtime is more expensive at scale, high availability is built in from the beginning, as are security and multi-tenancy to support many users on the larger clusters. The new architecture will also increase innovation, agility and hardware utilization. We will present the architecture and design of the next generation of MapReduce and delve into the details of the architecture that make it much easier to innovate. We will also present large-scale and small-scale comparisons on some benchmarks with MRV1.</p>
<p><a href="http://www.hadoopworld.com/"><img title="Register for Hadoop World" src="https://www.cloudera.com/wp-content/uploads/2010/12/registernow.gif" alt="Register for Hadoop World" /></a></p>
<p>There are several <a href="http://www.hadoopworld.com/training/">training classes</a> and <a href="http://www.hadoopworld.com/training/">certification sessions</a> provided surrounding the Hadoop World conference. Don&#8217;t forget to register and become <a href="http://www.hadoopworld.com/training/">Cloudera Certified in Apache Hadoop</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cloudera.com/blog/2011/10/hadoop-world-2011-a-glimpse-into-development/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hadoop/HBase Capacity Planning</title>
		<link>http://www.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/</link>
		<comments>http://www.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/#comments</comments>
		<pubDate>Tue, 17 Aug 2010 18:43:11 +0000</pubDate>
		<dc:creator>Alex Kozlov</dc:creator>
				<category><![CDATA[hadoop]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[HDFS]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[ZooKeeper]]></category>
		<category><![CDATA[sizing]]></category>

		<guid isPermaLink="false">http://www.cloudera.com/?p=4324</guid>
		<description><![CDATA[Hadoop and HBase are gaining popularity due to their flexibility and tremendous work that has been done to simplify their installation and use.  This blog is to provide guidance in sizing your first Hadoop/HBase cluster.  First, there are significant differences in Hadoop and HBase usage.  Hadoop MapReduce is primarily an analytic tool to run analytic [...]]]></description>
			<content:encoded><![CDATA[<p>Hadoop and HBase are gaining popularity due to their flexibility and tremendous work that has been done to simplify their installation and use.  This blog is to provide guidance in sizing your first Hadoop/HBase cluster.  First, there are significant differences in Hadoop and HBase usage.  Hadoop MapReduce is primarily an analytic tool to run analytic and data extraction queries over <em>all of your data</em>, or at least a significant portion of them (data is a plural of datum).  HBase is much better for real-time <em>read/write/modify access to tabular data</em>.  Both applications are designed for high concurrency and large data sizes.  For a general discussions about Hadoop/HBase architecture and differences please refer to Cloudera, Inc. [<a href="https://wiki.cloudera.com/display/DOC/Hadoop+Installation+Documentation+for+Cloudera+Enterprise" target="_blank">https://wiki.cloudera.com/display/DOC/Hadoop+Installation+Documentation+for+Cloudera+Enterprise</a><em>, </em><a href="http://www.cloudera.com/blog/2010/07/whats-new-in-cdh3-b2-hbase" target="_blank">http://www.cloudera.com/blog/2010/07/whats-new-in-cdh3-b2-hbase</a>], or Lars George blogs [<a href="http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html" target="_blank">http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html</a>].  We expect a new edition of the Tom White&#8217;s Hadoop book [<a href="http://www.hadoopbook.com" target="_blank">http://www.hadoopbook.com</a>] and a new HBase book in the near future as well.</p>
<div>Hadoop core consists of a file system, called HDFS, and the MapReduce implementation that computes on top of HDFS.  Since we are talking about data, the first crucial parameter is how much disk space you need across all of the Hadoop nodes to store all of your data, and which compression algorithm you are going to use to store it.  For the MapReduce components, an important consideration is how much computational power you need to process the data and whether the jobs you are going to run on the cluster are CPU or I/O intensive.  An example of a CPU-intensive job is image processing, while an example of an I/O-intensive job is simple data loading or aggregation.  Finally, HBase is mainly memory driven, so you need to consider the data access pattern in your application and how much memory you need so that the HBase nodes do not swap data to disk too often.  Most written data end up in memstores before they finally end up on disk, so you should plan for more memory in write-intensive workloads like web crawling.  A good application for HBase is low-latency key-based retrieval and storage of semi-structured data, such as web crawls or dimensional data for joining with a data warehouse (DW) fact table, particularly if the data need update time tracking and can be easily grouped into column families.</div>
<div>General Cloudera hardware recommendations are given <a href="http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations" target="_blank">here</a>.  This blog will focus on more detailed capacity planning issues.</div>
<h3>Network</h3>
<div>While the subject of network latency, throughput and bandwidth is very often overlooked when starting to work with Hadoop, it is bound to become a limiting factor as your cluster grows.  Every node in a Hadoop cluster needs to be able to communicate with every other node with low latency and high throughput, at the very least to fetch the relevant data.  In addition, if nodes are unable to communicate with the master node, the master node will assume that they are dead and delist them, which increases the load on the remaining nodes.  Hadoop works with an off-the-shelf TCP/IP network.</div>
<div>Network load depends on the nature of analytical computations in the cluster.  One simple application that requires a lot of communication between nodes is <a href="http://sortbenchmark.org" target="_blank">sorting</a>.  In fact, <a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/examples/terasort/package-summary.html" target="_blank">TeraSort</a> is a good test to detect network issues in the cluster.</div>
<div>A typical configuration is to organize the nodes into racks with a 1GE Top Of Rack (TOR) switch.  The racks are typically interconnected by one or more low-latency, high-throughput dedicated Layer-2 10GE core switches.  Many customers are happy with ~40-node clusters that fit into one rack with a typical 48-port switch.  Even if all of your nodes fit into one rack, if you plan to scale beyond one rack Cloudera recommends starting with at least two racks to enforce proper practices and network topology scripting.</div>
<div>Network problems can manifest themselves indirectly.  A good practical test is to run a network-intensive application like TeraSort, which sorts 10 billion 100-byte records (1TB in total; the specific parameters can be adjusted to your cluster size), on your cluster.  On a 100-node cluster with quad dual-core CPU hardware the running time should be roughly 10 minutes (one of our customers sorted 1TB in 6 minutes on a 76-node cluster; the numbers are likely to go down with new 12-core CPU machines).  If you see &#8220;Bad connect ack with firstBadLink&#8221;, &#8220;Bad connect ack&#8221;, &#8220;No route to host&#8221;, or &#8220;Could not obtain block&#8221; IO exceptions under heavy load, chances are these are due to a bad network.  Even one slow network card on one of the nodes can slow total job execution by as much as a factor of 3-4, since job completion is limited by the slowest task.  These problems can also appear to be &#8216;intermittent&#8217; under heavy load, but usually go away with proper network configuration and tuning.</div>
<div>Network connection to outside systems is important for loading data into the HDFS and interoperability.  Some companies prefer to have a dedicated high-bandwidth network for loading the data (as opposed to just using VLAN).</div>
<h3><strong>Memory</strong></h3>
<div>HBase is a very memory-hungry application.  Each node in an HBase installation, called a RegionServer (RS), keeps a number of regions, or chunks of your data, in memory (if caching is enabled).  Ideally, the whole table would be kept in memory, but this is not possible with a terabyte-scale dataset.  Typically, a single RS can handle a few hundred regions of 1 or 2GB each (these are configurable parameters).  The number of HBase nodes and memory requirements should be planned accordingly.  In our experience, the memory requirement is at least 4GB per RS for any decent load, but it depends significantly on your application load and access pattern.</div>
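<div>As a rough illustration of how region counts translate into node counts, here is a minimal Python sketch.  The function name and the default of 200 regions per RS are assumptions made for this example (one reading of &#8220;a few hundred&#8221;), not Cloudera guidance, and the estimate only covers hosting the regions, not the memory needed to serve your request load.</div>

```python
import math

def region_servers_needed(table_size_gb, region_size_gb=1.0, regions_per_server=200):
    """Estimate how many RegionServers are needed to host a table.

    The defaults are illustrative assumptions, not tuned recommendations.
    """
    # Number of regions the table splits into at the configured region size.
    regions = math.ceil(table_size_gb / region_size_gb)
    # Spread the regions over servers; always need at least one server.
    return max(1, math.ceil(regions / regions_per_server))

# A 10TB table with 1GB regions, then the same table with 2GB regions.
print(region_servers_needed(10_000))
print(region_servers_needed(10_000, region_size_gb=2.0))
```

<div>Doubling the region size halves the region count and therefore the estimated node count, which is why the region size is worth settling before you order hardware.</div>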
<div>For Hadoop MapReduce, you want to allocate between 1GB and 2GB of memory per task, on top of the memory allocated for HBase.  For large clusters, you should plan for a slight overhead, growing with the cluster, in both the task memory and the number of simultaneously open tasktracker connections, controlled by <em>tasktracker.http.threads</em> and <em>mapred.reduce.parallel.copies</em>, to be able to serve more node-to-node connections.</div>
<div>Memory problems in both Hadoop and HBase manifest as slowness of the whole system, since neither system was designed to rely on swapping.  It is recommended to discourage swapping on HBase nodes (set <em>vm.swappiness</em> to 0 or 5 in <em>/etc/sysctl.conf</em>) and to enable GC logging (add &#8220;<em>-Xloggc:/var/log/hbase/gc-hbase.log -verbose:gc -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime</em>&#8221; to the JVM opts) so that you can look for large GC pauses in the log.  GC pauses longer than 60 seconds can cause an RS to go offline (even worse problems can occur if you run ZK on the same node and it becomes unresponsive), and even pauses of around 1 second usually lead to noticeable responsiveness problems.  For the HBase daemons, RS and ZK, Cloudera also recommends switching to the CMS GC (add &#8220;<em>-XX:+UseConcMarkSweepGC -XX:-CMSIncrementalMode</em>&#8221; to the JVM opts).  There is also work under way to develop <a href="http://www.managedruntime.org" target="_blank">pauseless JVMs</a>.</div>
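<div>Looking for large pauses in a GC log by eye gets tedious, so here is a small Python sketch that pulls out the long ones.  It assumes the &#8220;Total time for which application threads were stopped&#8221; line that HotSpot writes when <em>-XX:+PrintGCApplicationStoppedTime</em> is set; the exact wording can differ between JVM versions, so check it against your own log first.  The function name and the 1-second threshold are choices made for this example.</div>

```python
import re

# Line written by -XX:+PrintGCApplicationStoppedTime on HotSpot JVMs.
# Assumption: your JVM uses this wording; adjust the pattern if it does not.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)

def long_pauses(lines, threshold_seconds=1.0):
    """Return the pause durations (in seconds) at or above the threshold."""
    pauses = []
    for line in lines:
        match = STOPPED.search(line)
        if match and float(match.group(1)) >= threshold_seconds:
            pauses.append(float(match.group(1)))
    return pauses

# Hand-written sample lines for illustration, not output from a real cluster.
sample = [
    "Total time for which application threads were stopped: 0.0004510 seconds",
    "Total time for which application threads were stopped: 2.3100000 seconds",
    "Application time: 1.2000000 seconds",
]
print(long_pauses(sample))
```

<div>To scan a real log, pass an open file object instead of the sample list, for example the file named in your <em>-Xloggc</em> option.</div>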
<div>If a Hadoop node is running an HBase RS daemon together with a Hadoop TT daemon, Cloudera recommends reducing the maximum number of map/reduce tasks via the <em>mapred.tasktracker.{map,reduce}.tasks.maximum</em> parameters.  You can start with 1-2 map/reduce tasks per tasktracker and slowly increase the number until you see a degradation in HBase performance.</div>
<div>Often network and memory problems manifest themselves first in ZK [<a href="http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A15" target="_blank">http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A15</a>].  ZK is a distributed lock system and is often called a &#8220;canary&#8221; of HBase.</div>
<div>A tool such as <em>vmstat</em> or Ganglia should be used to monitor memory status on the RS nodes.  Some VM GC information can be gathered from the metrics interface, which is served by Jetty at <em>&lt;hadoop/hbase-web-ui&gt;/metrics</em>, for example <em><a href="http://node:50060/metrics" target="_blank">http://node:50060/metrics</a></em>, if this is properly configured in <em>hadoop-metrics.properties</em>.</div>
<div>One should also keep in mind that even if the system does not throw OOM exceptions, OS and disk I/O performance may be compromised when the system is low on available memory: the JVM is under GC pressure, and less memory is available to the OS for buffering I/O (&#8220;memory cached&#8221;) to speed up other operations.</div>
<h3>Disk</h3>
<div>First, Hadoop requires at least two locations for storing its files: <em>mapred.local.dir</em>, where MapReduce stores intermediate files, and <em>dfs.data.dir</em>, where HDFS stores its data (there are other locations as well, like <em>hadoop.tmp.dir</em>, where Hadoop and its components store their temporary data).  Both of them can span multiple partitions.  While the two locations can be placed on physically different partitions, Cloudera recommends configuring them across the same set of partitions to maximize disk-level parallelism (this might not be an issue if the number of disks is much larger than the number of cores).</div>
<div>The sizing guide for HDFS is very simple: each file has a default replication factor of 3, and you need to leave approximately 25% of the disk space for intermediate shuffle files.  So you need 4 times the raw size of the data you will store in HDFS (3 replicas divided by the 75% of the disk that is available to them).  However, files are rarely stored uncompressed and, depending on the file content and the compression algorithm, on average we have seen compression ratios of up to 10-20 for text files stored in HDFS.  So the actual raw disk space required is only about 30-50% of the original uncompressed size.  Compression also helps when moving data between different systems, e.g. Teradata and Hadoop.</div>
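<div>The arithmetic above can be sketched in a few lines of Python.  The function and parameter names are made up for this example, and the compression ratio is something you should measure on a sample of your own data rather than assume.</div>

```python
def raw_disk_needed_tb(uncompressed_tb, replication=3, shuffle_reserve=0.25,
                       compression_ratio=1.0):
    """Raw disk (in TB) needed to hold a dataset in HDFS.

    replication and shuffle_reserve follow the rule of thumb in the text;
    compression_ratio is uncompressed size divided by compressed size.
    """
    stored_tb = uncompressed_tb / compression_ratio
    # Replicas must fit in the part of the disk not reserved for shuffle files.
    return stored_tb * replication / (1.0 - shuffle_reserve)

print(raw_disk_needed_tb(100))                        # 100TB stored uncompressed
print(raw_disk_needed_tb(100, compression_ratio=10))  # same data at 10x compression
```

<div>For 100TB of uncompressed data this gives 400TB of raw disk without compression and 40TB at a 10x compression ratio, which is 40% of the original uncompressed size.</div>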
<div>HBase stores the regions in HFiles.  However, during the major compaction the data may be doubled for a given region temporarily.  In addition to HFile storage, there is a small overhead due to WALs, which ideally should be a small portion of the total data size.  Cloudera recommends a 30-50% overhead in terms of free space for HFiles.</div>
<div>While you can run Hadoop MapReduce with only 5-10% of the disk space left, the performance will be compromised due to fragmentation.  Disk performance can be up to 77% slower due to fragmentation and other issues compared to the &#8220;empty disk&#8221; [<a href="http://www.eecs.harvard.edu/vino/fs-perf/papers/keith_a_smith_thesis.pdf" target="_blank">http://www.eecs.harvard.edu/vino/fs-perf/papers/keith_a_smith_thesis.pdf</a>].  With a disk more than 80% full you also run the risk of running out of disk space on an individual mount.</div>
<h3>CPU</h3>
<div>Cloudera recommends a total of 8 or 12 cores per node, and typically the number of cores would be equal to or slightly larger than the number of spindles.  The total number of mappers and reducers should be the total number of hyperthreads minus 2 (the 2 are reserved for daemons and OS processing), with the ratio of mappers to reducers slightly skewed towards mappers, as the reducers tend to spend more time waiting for the mappers.  The importance of CPU power increases with CPU-intensive jobs and when using more compute-intensive compression like BZip2.</div>
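<div>Here is a minimal Python sketch of the slot rule above.  The 60/40 split towards mappers is purely an illustrative assumption (the text only says &#8220;slightly skewed&#8221;), and <em>mapper_share</em> is a hypothetical knob, not a Hadoop setting.</div>

```python
def task_slots(hyperthreads, reserved=2, mapper_share=0.6):
    """Split the available task slots between mappers and reducers.

    reserved is the number of hyperthreads left for daemons and the OS.
    mapper_share is a hypothetical knob; 0.6 is an illustrative value only.
    """
    total = max(hyperthreads - reserved, 0)
    mappers = round(total * mapper_share)
    return mappers, total - mappers

# A node with 16 hyperthreads leaves 14 slots for tasks.
mappers, reducers = task_slots(16)
print(mappers, reducers)
```

<div>The two numbers are what you would then set as the per-tasktracker map and reduce task maximums, lowering them further on nodes that also run an HBase RS.</div>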
<div>A typical configuration may be found <a href="http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations" target="_blank">here</a>.</div>
<h3>Summary</h3>
<table width="80%" border="1" cellspacing="0" cellpadding="2">
<tbody>
<tr>
<td valign="top"> </td>
<td valign="top">Network</td>
<td valign="top">Memory</td>
<td valign="top">Disk</td>
<td valign="top">CPU</td>
<td valign="top"># of nodes</td>
</tr>
<tr>
<td valign="top">HDFS</td>
<td valign="top">1GE TOR, 10GE core</td>
<td valign="top"> </td>
<td valign="top">8-10 spindles/node</td>
<td valign="top"> </td>
<td valign="top">enough nodes to fit the data</td>
</tr>
<tr>
<td valign="top">Hadoop MapReduce</td>
<td valign="top">1GE TOR, 10GE core</td>
<td valign="top">1-2 GB/task</td>
<td valign="top"># of spindles = # of cores</td>
<td valign="top">8-12 cores/node, # of tasks = # of hyperthreads &#8211; 2</td>
<td valign="top"> </td>
</tr>
<tr>
<td valign="top">HBase</td>
<td valign="top">1GE TOR, 10GE core</td>
<td valign="top">at least 4GB/node</td>
<td valign="top"> </td>
<td valign="top">8-12 cores/node, reduce # of tasks if running with Hadoop DN/TT</td>
<td valign="top">enough nodes to fit all regions and serve requests</td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>http://www.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Migrating to CDH</title>
		<link>http://www.cloudera.com/blog/2010/08/migrating-to-cdh3/</link>
		<comments>http://www.cloudera.com/blog/2010/08/migrating-to-cdh3/#comments</comments>
		<pubDate>Tue, 03 Aug 2010 01:32:37 +0000</pubDate>
		<dc:creator>Eric Sammer</dc:creator>
				<category><![CDATA[distribution]]></category>
		<category><![CDATA[general]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[HDFS]]></category>
		<category><![CDATA[hive]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[pig]]></category>
		<category><![CDATA[ZooKeeper]]></category>

		<guid isPermaLink="false">http://www.cloudera.com/?p=4147</guid>
		<description><![CDATA[With the recent release of CDH3b2, many users are more interested than ever to try out Cloudera&#8217;s Distribution for Hadoop (CDH). One of the questions we often hear is, &#8220;what does it take to migrate?&#8221;. Why Migrate? If you&#8217;re not familiar with CDH3b2, here&#8217;s what you need to know. All versions of CDH provide: RPM [...]]]></description>
			<content:encoded><![CDATA[<p>With the <a href="http://www.cloudera.com/blog/2010/06/cdhv3-and-cloudera-enterprise/">recent release of CDH3b2</a>, many users are more interested than ever in trying out Cloudera&#8217;s Distribution for Hadoop (CDH). One of the questions we often hear is, &#8220;What does it take to migrate?&#8221;</p>
<h2>Why Migrate?</h2>
<p>If you&#8217;re not familiar with CDH3b2, here&#8217;s what you need to know.</p>
<p>All versions of CDH provide:</p>
<ul>
<li>RPM and Debian packages for simple installation and management.</li>
<li>Clean integration with the host operating system. Logs are in <code>/var/log</code>, common binaries in <code>/usr/bin</code>, and configuration in <code>/etc</code>.</li>
<li>A Cloudera support-ready distribution. As Hadoop becomes a mission critical component of your production infrastructure, you&#8217;ll want the option of engaging Cloudera for support or consulting services. Running CDH makes this process simple.</li>
</ul>
<p>CDH3b2 additionally is:</p>
<ul>
<li>A complete platform with smooth integration of popular projects such as Hive, HBase, Pig, ZooKeeper, Flume, Sqoop, Oozie, and HUE. HDFS and Hadoop MapReduce are only two parts of a larger system. CDH3b2 brings together tools and frameworks to get data in and out of HDFS, coordinate complex processing pipelines, and process and analyze your data. <a href="http://www.cloudera.com/blog/2010/07/more-on-clouderas-distribution-for-hadoop-3/">Learn more</a> about this.</li>
<li>Based on Apache Hadoop 0.20.2 with 320 patches worth of feature back ports, stability enhancements, and bug fixes.</li>
</ul>
<h2>Overview</h2>
<p>The migration process does require a moderate understanding of Linux system administration. You should make a plan before you start. You will be restarting some critical services such as the name node and job tracker, so some downtime is necessary. Given the value of the data on your cluster, you&#8217;ll also want to take recent backups of any mission-critical data sets as well as the name node metadata.</p>
<p>Backing up your data is most important if you&#8217;re upgrading from a version of Hadoop based on an Apache Software Foundation release earlier than 0.20. There were changes in the open source HDFS implementation prior to 0.20 that force this upgrade. See the section below on compatibility for more details.</p>
<p>The process I&#8217;ll outline here is as follows:</p>
<ul>
<li>CDH version selection</li>
<li>Options for installation</li>
<li>Installation process</li>
<li>Migration of configuration data</li>
<li>Testing your cluster</li>
</ul>
<h2>Selecting a Branch</h2>
<p>One of the first questions you should ask yourself is what level of stability versus new features you require from Hadoop. If you&#8217;re managing a production Hadoop cluster with jobs with SLAs, you need a rock solid, production-proven Hadoop distribution. This is Cloudera&#8217;s stable or production branch. At the time of this writing, this is CDH2 based on Hadoop 0.20.1+169.89. In certain cases, features may be of greater priority, in which case, CDH3 0.20.2+320 is appropriate.</p>
<p>It&#8217;s important to note that both CDH2 and CDH3 pass all functional and unit tests at Cloudera. The real difference between them is that CDH2 has been in the field longer. We generally promote a release to stable when we&#8217;ve seen it running production workloads for a substantial period of time, and when the rate of issues opened against the distro in our support group tails off. We have customers running in production today on both CDH2 and CDH3.</p>
<h2>On Compatibility</h2>
<p>Before we dive into the installation process I&#8217;ll highlight some points on compatibility. When upgrading to CDH from an older version or another distribution of Hadoop, it&#8217;s possible that HDFS data needs to be taken through an upgrade process. This is relatively simple, but as with any upgrade of critical data, it is absolutely necessary to back up your data.</p>
<p>Currently, it is not necessary to perform an HDFS upgrade if you&#8217;re upgrading to CDH3 from CDH2 or Apache Hadoop versions 0.20.0 or later. In fact, any distribution of Hadoop based on Apache 0.20.0 is likely to be a clean transition without an update to HDFS required, but you should always check with the distributor.</p>
<p>During RPC operations, all Hadoop daemons will check to ensure they are speaking to the same exact version as themselves. This means that you cannot, at present, perform a rolling upgrade of CDH. There has been some discussion about relaxing this requirement so compatible versions of Hadoop can communicate, but this has not yet been implemented.</p>
<h2>Installation Options</h2>
<p>CDH is available in three forms: RPMs, debs, and tarball distributions. The preferred method of installation is usually the RPM or deb packages as they automate a lot of the work required to get CDH up and running quickly. Tarballs of CDH are useful for users on systems that do not use yum/rpm or apt/dpkg, or where you do not have root access to the host operating system.</p>
<h2>Installing CDH</h2>
<p>When installing CDH from RPMs or Debian packages, you will definitely want to take advantage of Cloudera&#8217;s yum or apt repository support. If you&#8217;re on a system that does not use rpm or deb packages, you can still use Cloudera&#8217;s binary tarball packages.</p>
<p>You should follow the normal process for installing CDH on your systems. The CDH packages should be installed on all nodes in the cluster. The rpm and deb packages of CDH will automatically create a hadoop user and group as well as SYSV init scripts as part of the install process. The CDH tarballs do not contain the init scripts and obviously do not create the hadoop user and group.</p>
<p>Detailed <a href="https://docs.cloudera.com/display/DOC/CDH3+Installation+Guide">installation instructions</a> for all formats of CDH are available.</p>
<p>After the packages are installed, you&#8217;ll want to make sure you set the proper daemons to start on the proper machines upon boot. There is a separate init script for each Hadoop daemon so only what is necessary is started.</p>
<p>Redhat example:<br />
<code>% chkconfig --level 3 hadoop-0.20-namenode on</code></p>
<p>Debian example:<br />
<code>% update-rc.d hadoop-0.20-namenode start 80 3 .</code></p>
<p>Make sure you specify the correct run level. While run level 3 is common for multiuser Linux servers, this may not be the case in your installation. You can use the runlevel command to find the currently active run level.</p>
<p>For now, do not start any of the Hadoop daemons.</p>
<h2>Migrating Your Configuration</h2>
<p>If you&#8217;re coming from an older version of CDH, your configuration should already be set up with alternatives. If not, now is a good time to bring your configuration layout in line with CDH by moving your conf directory to <code>/etc/hadoop-0.20/conf.mycluster</code>. You should also configure alternatives to know about your new configuration. The <a href="https://docs.cloudera.com/display/DOC/CDH3+Installation">CDH documentation</a> covers this in detail. For now, register your new configuration with alternatives and set it to be the preferred configuration.</p>
<p><code><br />
% alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.mycluster 100<br />
% alternatives --set hadoop-0.20-conf /etc/hadoop-0.20/conf.mycluster<br />
</code></p>
<p>Users who are on systems that don&#8217;t have alternatives or who are installing CDH from tarballs should simply update the configuration files in <code>$HADOOP_HOME/conf</code>. Normally, <code>$HADOOP_HOME</code> is <code>/usr/local/hadoop-$VERSION</code> or <code>/opt/hadoop-$VERSION</code> but you can put it wherever it makes sense. This includes running CDH from your home directory if you don&#8217;t have root access.</p>
<h2>Testing CDH</h2>
<p>Now that CDH is installed and you&#8217;ve migrated your cluster configuration it&#8217;s time to fire up a few nodes and make sure everything is working as expected. Rather than bring up all the daemons at once, let&#8217;s focus on the name node first.</p>
<p>Start by logging on to the name node machine. You may want to manually rotate the log file just to minimize the noise during testing. You can do this by simply moving today&#8217;s log file to a different name.</p>
<p><code>% mv /var/log/hadoop/hadoop-hadoop-namenode-nn.mycompany.com.log \<br />
/var/log/hadoop/hadoop-hadoop-namenode-nn.mycompany.com.log.old</code></p>
<p>Next, start the CDH name node daemon using the provided init script. If an HDFS upgrade is required, you can use the <code>upgrade</code> argument in place of <code>start</code> below. This will be your last chance to grab a backup of the name node&#8217;s metadata prior to starting the daemon.</p>
<p><code>% /etc/init.d/hadoop-0.20-namenode start</code></p>
<p>Note that the CDH init scripts require you to be root whereas the Apache Hadoop start-all.sh / stop-all.sh scripts should <em>not</em> be run as root.</p>
<p>It&#8217;s a good idea to check the contents of the name node log file now to ensure it has come up cleanly. You should see a warning about the name node being in safe mode due to missing blocks. This is OK because we haven&#8217;t brought up any data nodes yet. If something doesn&#8217;t look right, jump ahead to the getting help section before proceeding.</p>
<p>Before you start any of your data nodes, you&#8217;ll want to place the name node in safe mode manually. This will prevent the name node from &#8220;panicking&#8221; and trying to repair missing block replicas as data nodes begin to register themselves. You&#8217;ll need to run this command as the hadoop user.</p>
<p><code>% hadoop dfsadmin -safemode enter</code></p>
<p>Next start one of the data nodes and watch its logs as you did for the name node.</p>
<p><code>% /etc/init.d/hadoop-0.20-datanode start</code></p>
<p>If everything is set up correctly, you should see the data node start up, register with the name node, and start its periodic block scanner thread. You should also check the name node logs to confirm you see the data node registration message there as well. Once you&#8217;ve confirmed that things look good, move on to starting additional data nodes, checking them in batches as you go.</p>
<p>After all data nodes are up and running, you can use the Hadoop fsck tool to confirm that the file system is healthy.</p>
<p><code>% hadoop fsck /</code></p>
<p>Your cluster should still be in safe mode. If the file system is healthy, you can go ahead and take it out of safe mode.</p>
<p><code>% hadoop dfsadmin -safemode leave</code></p>
<p>Follow this with a quick test of HDFS by copying a file into the file system.</p>
<p><code>% date > now.txt<br />
% hadoop fs -put now.txt /now.txt<br />
% hadoop fs -cat /now.txt<br />
% hadoop fs -rm /now.txt<br />
% rm now.txt</code></p>
<p>Congratulations! You now have HDFS running on CDH.</p>
<p>If you had to upgrade the HDFS data &#8211; that is, you started the init script with the <code>upgrade</code> option &#8211; you should do some more extensive testing of your data. Once you&#8217;ve confirmed everything is working as expected, finalize the HDFS upgrade.</p>
<p><code>% hadoop namenode -finalize</code></p>
<p>Starting and testing the map reduce daemons follows a similar procedure but is a bit simpler. Start the job tracker daemon on the proper machine and monitor the logs as you did with the name node. Once you&#8217;ve confirmed the job tracker is running, proceed with starting the task tracker daemons in groups, checking the job tracker UI as you go. You should see the map and reduce task capacity increasing with each node you start. Don&#8217;t panic if the job tracker doesn&#8217;t see the nodes immediately; it can take a few seconds.</p>
<p>Don&#8217;t forget to start the secondary name node daemon as well. It&#8217;s usually a good idea to wait an hour or so and check the modification time on the files in the configured fs.checkpoint.dir. You should see that the files have been updated within the last hour. You can also check the secondary name node logs; you&#8217;ll see an indication things are working there as well in the form of some log messages about performing the checkpoint.</p>
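<p>If you&#8217;d rather not eyeball modification times, the check can be scripted. This is a minimal Python sketch, not part of CDH; the function name and the example path are made up, so substitute whatever your fs.checkpoint.dir is set to.</p>

```python
import os
import time

def stale_checkpoint_files(checkpoint_dir, max_age_seconds=3600):
    """Return files under checkpoint_dir not modified within max_age_seconds."""
    now = time.time()
    stale = []
    for root, _dirs, files in os.walk(checkpoint_dir):
        for name in files:
            path = os.path.join(root, name)
            if now - os.path.getmtime(path) > max_age_seconds:
                stale.append(path)
    return sorted(stale)

# Hypothetical location; use the value of fs.checkpoint.dir from your config.
print(stale_checkpoint_files("/var/lib/hadoop/checkpoint"))
```

<p>An empty list means nothing looks stale, but it is also what you get for an empty or missing directory, so confirm that the checkpoint files actually exist before trusting it.</p>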
<h2>Documentation and References</h2>
<p>In addition to the community articles and blog posts on Hadoop, Cloudera provides CDH-specific documentation at <a href="http://docs.cloudera.com">docs.cloudera.com</a>. Here you can find information on CDH including all of its components like Hadoop, Hive, Flume, Sqoop, HUE, and others.</p>
<h2>How to Get Help</h2>
<p>There are a number of ways to get help if you run into trouble during your migration or if you just have questions.</p>
<ul>
<li><a href="http://docs.cloudera.com">Cloudera Documentation</a></li>
<li><a href="http://groups.google.com/a/cloudera.org/groups/dir">Cloudera mailing lists</a></li>
<li><a href="http://www.cloudera.com/resources/?media=Video">Cloudera videos</a></li>
<li>IRC users can join #cloudera on <a href="http://freenode.net">freenode</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.cloudera.com/blog/2010/08/migrating-to-cdh3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What&#8217;s New in CDH3b2: ZooKeeper</title>
		<link>http://www.cloudera.com/blog/2010/07/whats-new-in-cdh3-b2-zookeeper/</link>
		<comments>http://www.cloudera.com/blog/2010/07/whats-new-in-cdh3-b2-zookeeper/#comments</comments>
		<pubDate>Mon, 12 Jul 2010 21:54:42 +0000</pubDate>
		<dc:creator>Patrick Hunt</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[ZooKeeper]]></category>

		<guid isPermaLink="false">http://www.cloudera.com/?p=3811</guid>
		<description><![CDATA[CDH3 beta 2 is the first to incorporate <a href="http://hadoop.apache.org/zookeeper/docs/current/">Apache ZooKeeper</a>. ZooKeeper is a highly reliable and available coordination service for distributed processes. It is a proven technology and a well established open source project at Apache (sub-project of Hadoop).]]></description>
			<content:encoded><![CDATA[<p><span style="font-weight: normal;font-size: 13.3333px"><a title="CDH3 installation documentation" href="https://wiki.cloudera.com/display/DOC/Hadoop+Installation+Documentation+for+CDH3" target="_blank">CDH3 beta 2</a> is the first version of CDH to incorporate <a href="http://hadoop.apache.org/zookeeper/docs/current/"><span>Apache <span>ZooKeeper</span></span></a><span>. <span>ZooKeeper</span> is a highly reliable and available coordination service for distributed processes. It is a proven technology and a well established open source project at Apache (sub-project of <span>Hadoop</span>).</span></span></p>
<h2><span><span>ZooKeeper</span> is distributed coordination</span></h2>
<p>Distributed applications often need some way to coordinate across processes: locking resources, managing queues of events, electing a &#8220;leader&#8221; process, managing configuration, and so on. Coordination operations such as these are notoriously hard to get right. ZooKeeper provides a relatively simple API that allows clients to correctly implement these and many other coordination mechanisms.</p>
<p>ZooKeeper is itself a replicated service based on a quorum algorithm. One or more ZooKeeper servers, which are in constant communication with each other, form what&#8217;s called an &#8220;ensemble&#8221;. As the size of the ensemble increases, the reliability of the service itself increases &#8211; as long as a majority of the configured ensemble servers are available, the service is available. As an example, say you have an ensemble of size three (three ZooKeeper servers): if one of the three fails, the service is still &#8220;up&#8221;. If two of the three fail, the service is down. One could run with five servers, in which case the service as a whole would still be available if two servers failed. Seven-server ensembles can survive three failures, and so on.</p>
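<p>The majority rule is easy to check with a few lines of Python. This sketch only does the arithmetic described above; the function names are made up for illustration and are not part of any ZooKeeper API.</p>

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a majority of the ensemble."""
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)

# Ensemble size, majority needed, failures survived.
for n in (3, 4, 5, 7):
    print(n, quorum_size(n), tolerated_failures(n))
```

<p>Note that an ensemble of four survives only one failure, the same as an ensemble of three, which is why ensembles are normally sized with an odd number of servers.</p>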
<p><span style="font-size: 13.3333px">Who should be interested in the ZooKeeper project? (I say &#8220;project&#8221; and &#8220;us&#8221; here because the team &amp; community are just as important as the software, if not more so.) Developers appreciate us because we make it simple to solve some very difficult distributed communication problems. Operations teams like us because we ensure that they only need to learn, operate and maintain a single, sane coordination mechanism that&#8217;s easy to manage. Business folks like the fact that we are a proven technology helping to ensure high availability, allowing the development/ops teams to focus on domain-specific problems.</span></p>
<p><span>For more detail on <span>ZooKeeper</span> see the </span><a title="ZooKeeper Overview page" href="http://archive.cloudera.com/cdh/3/zookeeper/zookeeperOver.html" target="_blank">overview page</a>. Additional documentation relative to CDH releases is available&#160;<a title="CDH documentation archive" href="http://archive.cloudera.com/cdh/3/zookeeper/" target="_blank">here</a>.</p>
<h2><span>Powered By <span>ZooKeeper</span></span></h2>
<p>Yahoo!, Facebook, Twitter, Digg, Rackspace and a number of other companies are making use of ZooKeeper in their production environments today. Additionally, technologies such as HBase, Solr, Katta, and Neo4j have all increased reliability/availability and extended their capabilities by adopting ZooKeeper.</p>
<p><span>Here at <span>Cloudera</span> we </span><a href="http://www.cloudera.com/blog/2010/06/cdhv3-and-cloudera-enterprise/">recently open sourced</a><span> Flume, a distributed, real-time event collection service which is also part of CDH3 beta 2. Flume makes use of <span>ZooKeeper</span> to coordinate its various distributed components &#8211; for example to store and manage dynamically updated configuration information. You can find out more about Flume </span><a href="http://github.com/cloudera/flume">here</a>.</p>
<h2><span><span>ZooKeeper</span> and CDH</span></h2>
<p><span>A significant amount of work has gone into <span>ZooKeeper</span> integration with CDH3. In particular the <span>Cloudera</span> team ensures that the CDH3 components relying on <span>ZooKeeper</span> (</span><a title="CDH3 and HBase" href="http://www.cloudera.com/blog/2010/07/whats-new-in-cdh3-b2-hbase/" target="_blank"><span><span>HBase</span></span></a> and <a title="Flume - reliable event collection" href="https://wiki.cloudera.com/display/DOC/Flume+Installation" target="_blank">Flume</a>) are fully compatible and will work together. <a title="ZooKeeper CDH installation documentation" href="https://wiki.cloudera.com/display/DOC/ZooKeeper+Installation" target="_blank"><span>CDH based <span>ZooKeeper</span> packages</span></a><span> (tar, RPM, and DEB files)&#160;containing libraries as well as <span>startup</span> scripts for running as a service are available on the <span>Cloudera</span> website.</span></p>
<p><span><span>Cloudera</span> is an active member of the <span>Hadoop</span> and <span>ZooKeeper</span> communities &#8211; we have two active <span>committers</span> working on <span>ZooKeeper</span>, myself and Henry Robinson. </span><a title="Henry Robinson" href="http://www.cloudera.com/blog/author/henry/" target="_blank">Henry</a> recently contributed <a title="Henry's Blog Post on Observers" href="http://www.cloudera.com/blog/2009/12/observers-making-zookeeper-scale-even-further/" target="_blank">a major new feature called &#8220;observers&#8221;</a><span> which greatly extends <span>ZooKeeper&#8217;s</span> read scalability. Henry also created and maintains the popular <span>ZooKeeper</span> </span><a title="zkpython" href="http://www.cloudera.com/blog/2009/05/building-a-distributed-concurrent-queue-with-apache-zookeeper/" target="_blank">Python client binding</a>.</p>
<p><span><span>In addition to leading the <span>ZooKeeper</span> project at Apache, I&#8217;ve personally been working on a number of projects for upcoming releases; I&#8217;m currently adding transport level security (encryption and authentication of communications) to the service. The team is constantly on the lookout for other areas where the technology may be applied. As I mentioned, <span>HBase</span> and Flume are currently using <span>ZooKeeper</span> and we hope to extend this further, in particular around the idea of centralized configuration and monitoring for <span>Hadoop</span>. There are just too many configuration files floating around when setting up a <span>Hadoop</span> based service, and <span>ZooKeeper</span> seems like a perfect fit for managing them. A great example of this in use today is <span>LinkedIn&#8217;s</span> </span><a title="LinkedIn's Norbert" href="http://github.com/rhavyn/norbert" target="_blank">Norbert</a><span> project, where <span>ZooKeeper</span> is used to maintain and manage cluster&#160;<span>metadata</span>.</span></span></p>
<h2><span>Join the <span>ZooKeeper</span> community</span></h2>
<p><span>Find out more about <span>ZooKeeper</span> on the official Apache </span><a title="Apache ZooKeeper community" href="http://hadoop.apache.org/zookeeper/docs/current/" target="_blank">project pages</a>. There you&#8217;ll also find links to <a title="Apache ZooKeeper Documentation" href="http://hadoop.apache.org/zookeeper/docs/current/" target="_blank">documentation</a>, user and developer <a title="ZooKeeper Mailing Lists" href="http://hadoop.apache.org/zookeeper/mailing_lists.html" target="_blank">mailing lists</a>, <a title="ZooKeeper JIRA" href="https://issues.apache.org/jira/browse/ZOOKEEPER" target="_blank">issue tracking</a>, and more. We welcome new users and contributors. You might also <a title="@phunt" href="http://twitter.com/phunt" target="_self">follow me on Twitter</a>, where I frequently post on community-related issues.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cloudera.com/blog/2010/07/whats-new-in-cdh3-b2-zookeeper/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Building a distributed concurrent queue with Apache ZooKeeper</title>
		<link>http://www.cloudera.com/blog/2009/05/building-a-distributed-concurrent-queue-with-apache-zookeeper/</link>
		<comments>http://www.cloudera.com/blog/2009/05/building-a-distributed-concurrent-queue-with-apache-zookeeper/#comments</comments>
		<pubDate>Thu, 28 May 2009 22:30:15 +0000</pubDate>
		<dc:creator>Henry Robinson</dc:creator>
				<category><![CDATA[ZooKeeper]]></category>

		<guid isPermaLink="false">http://www.cloudera.com/blog/?p=695</guid>
		<description><![CDATA[In my first few weeks here at Cloudera, I&#8217;ve been tasked with helping out with the Apache ZooKeeper system, part of the umbrella Hadoop project. ZooKeeper is a system for coordinating distributed processes. In a distributed environment, getting processes to act in any kind of synchrony is an extremely hard problem. For example, simply having [...]]]></description>
			<content:encoded><![CDATA[<p>In my first few weeks here at <a href="http://www.cloudera.com">Cloudera</a>, I&#8217;ve been tasked with helping out with the <a href="http://hadoop.apache.org/zookeeper/">Apache ZooKeeper</a> system, part of the umbrella <a href="http://hadoop.apache.org">Hadoop project</a>. ZooKeeper is a system for coordinating distributed processes. In a distributed environment, getting processes to act in any kind of synchrony is an extremely hard problem. For example, simply having a set of processes wait until they&#8217;ve all reached the same point in their execution &#8211; a kind of distributed <a href="http://en.wikipedia.org/wiki/Barrier_(computer_science)">barrier</a> &#8211; is surprisingly difficult to do correctly. ZooKeeper offers an API to facilitate this sort of distributed coordination. For example, it is often used to serve locks to client processes &#8211; locks are just another kind of coordination primitive &#8211; in the form of small files that ZooKeeper tracks.</p>
<p>In order to be useful, ZooKeeper must be both highly reliable and available as systems will rely upon it as a critical component. For example, if locks cannot be taken, processes cannot make progress and the whole system will grind to a halt. ZooKeeper is built on a suite of reliable distributed systems techniques and protocols, and is typically run on a cluster of machines so that if some should fail, the remaining ones can continue to provide service. Under the hood, ZooKeeper is responsible for ordering calls made by clients so that each request is processed atomically and in a fixed and firm order.</p>
<p>One of my first contributions to the project was a set of bindings to allow programs written in <a href="http://www.python.org">the Python language</a> to act as clients to a ZooKeeper cluster. ZooKeeper was natively written in Java, and there are already C and Perl bindings. Adding Python bindings increases the number of people that can use the system, and brings the strengths of Python, such as rapid prototyping, to bear when designing distributed systems.</p>
<p>The Python ZooKeeper bindings are available from the ZooKeeper SVN repository and should be part of the 3.2 release, planned for the next couple of weeks. To use the bindings now, you can either check out the latest version of the code from the <a href="http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute">SVN repository</a>, or download a tarball containing a recent snapshot <a href="http://github.com/apache/zookeeper/tarball/cb4f93dad4ef1d616087606fa60cd9fdfdb1d741">here</a>. The <tt>zookeeper</tt> module exposes the ZooKeeper API to Python, so to get started all you need do is add <tt>import zookeeper</tt> to your Python script once the module is installed. Instructions on getting up and running are at the end of this post.</p>
<p>To illustrate some of the ZooKeeper API, I&#8217;ve written a distributed FIFO queue in Python &#8211; the source code is <a href="http://github.com/henryr/pyzk-recipes/blob/db8a3bf89648d8c1740351c1d43cdf1efb7e2a4c/queue.py">here</a> &#8211; which I wanted to share. The combination of Python and ZooKeeper meant that I was able to write the queue in just over 60 lines of code, and most of that deals with <em>local</em> coordination issues between two threads rather than any tricky issues trying to make remote processes behave correctly. I can only give a taste here of how programming with Python and ZooKeeper works. I hope there&#8217;s enough here to convince you that ZooKeeper might make a useful component for distributed systems that need a little herding.<br />
<span id="more-695"></span></p>
<h2>ZooKeeper</h2>
<p>ZooKeeper provides a tree abstraction where every node in that tree (or <em>znode</em>, in ZooKeeper parlance) is a file on which a variety of simple operations can be performed. ZooKeeper orders operations on znodes so that they occur atomically. Therefore there is no need to use complex locking protocols to ensure that only one process can access a znode at a time. The tree represents a hierarchical namespace, so that many distinct distributed systems can use a single ZooKeeper instance without worrying about their files having the same name.</p>
<p>Each znode has some associated data &#8211; up to a megabyte in current builds &#8211; that can be updated atomically. Every update to a znode increases its version number, which allows clients to perform compare-and-swap operations by reading the version and then updating a znode only if the version is still the one that was read.</p>
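<p>To make the compare-and-swap idea concrete, here is a minimal sketch that runs without a ZooKeeper server. The <tt>VersionedNode</tt> class and its <tt>set_if_version</tt> method are hypothetical stand-ins invented for illustration, not part of the <tt>zookeeper</tt> module; a real server performs the version check atomically on behalf of all clients.</p>
<pre>class VersionedNode(object):
    """Hypothetical in-memory stand-in for a znode: data plus a version."""

    def __init__(self, data):
        self.data = data
        self.version = 0

    def get(self):
        # Return the data together with the version it was read at.
        return self.data, self.version

    def set_if_version(self, data, expected_version):
        # Apply the update only if nobody has written since our read.
        if expected_version != self.version:
            return False
        self.data = data
        self.version += 1
        return True


node = VersionedNode("a")
value, seen = node.get()
first = node.set_if_version("b", seen)   # accepted, version becomes 1
second = node.set_if_version("c", seen)  # rejected, version 0 is stale</pre>
<p>The second update is rejected because it still carries version 0, which is how a client learns that someone else got there first and that it should re-read before retrying.</p>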
<p>As a notification mechanism, ZooKeeper provides watches, which are callback methods that are called asynchronously when an event of interest occurs. Watches are attached, typically, to an individual znode. When that znode changes any watcher on the znode will be fired asynchronously on the client. Many methods of the ZooKeeper API have an optional watch argument. Some languages have to work hard to provide callable objects as parameters, but Python makes this easy as callables are first class language constructs. Simply pass any callable, like a method or a lambda expression, to the zookeeper module and when an event of interest occurs, the callable will be executed.</p>
<p>This call comes from a separate thread of execution, so great care must be taken to ensure that unexpected things do not happen due to your watcher being fired at an arbitrary point in the execution of your script. Normally you will use watchers to notify another thread of a state change. It will often be the case that the main thread will be waiting for the watcher to fire before it can continue. An example of this is in the <tt>__init__</tt> method of our <tt>ZooKeeperQueue</tt> when we try to connect to the server. Compared to the time a script takes to execute, connections can take a long time to run. So it&#8217;s useful that the ZooKeeper API allows us to connect asynchronously, in case there were any work that we wanted to get done while we were waiting for the connection to be established. However, in our case, we just want to wait until the connection is successful, and so we need a mechanism to wait for the watcher to notify us.</p>
<p>A useful tool for this inter-thread communication is the <tt>Condition</tt> object in Python, which represents a condition variable, a well-known concurrent programming abstraction. <tt>Condition</tt> objects may be acquired and released just like locks, but they also expose an API to wait for a notification from another thread and to fire that notification. While a thread is waiting on a <tt>Condition</tt> it goes to sleep, leaving the operating system with some free CPU to dedicate to other processes. Once a <tt>Condition</tt> is notified, a thread that is waiting on it is woken up and allowed to continue execution once the notifying thread has released the <tt>Condition</tt>.</p>
<p>This leads to a simple pattern for communicating between watchers and the main thread. Here&#8217;s an excerpt from the connection code:</p>
<pre>def watcher(handle,type,state,path):
    print "Connected"
    self.cv.acquire()
    self.connected = True
    self.cv.notify()
    self.cv.release()

self.cv.acquire()
self.handle = zookeeper.init("localhost:2181", watcher, 10000, 0)
self.cv.wait(10.0)</pre>
<p>First we define our watcher which takes four parameters (if you want to provide more parameters or local state to a watcher, one way to do it is to wrap a function call in a local lambda which captures the state). The next line acquires an exclusive lock on a condition variable <tt>cv</tt>. Why do this now? Once we set our watcher in place, it could be fired at any time &#8211; even before the main thread makes progress to the next line of code. If we don&#8217;t prevent it from sending a notification on the condition variable before we&#8217;re ready to look for it, the notification could get lost and we could wait forever. Notifications aren&#8217;t buffered &#8211; if no one is waiting on a condition variable, no one gets woken up.</p>
<p>Then the code initialises ZooKeeper. The <tt>zookeeper</tt> module gives us an integer handle which we can use to refer to our connection in the future (we can open many connections per client). The next line tells us to wait until we receive a notification on the condition variable that the connection has succeeded. The parameter is a timeout in seconds, after which if we are still not connected we presume that something is wrong and abort.</p>
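<p>The excerpt above depends on a live server, so here is a self-contained sketch of the same handshake. The <tt>Connector</tt> class is hypothetical, and a plain background thread stands in for the callback that <tt>zookeeper.init</tt> would trigger; only the locking pattern is the point.</p>
<pre>import threading


class Connector(object):
    """Hypothetical class showing the acquire / start / wait pattern."""

    def __init__(self):
        self.cv = threading.Condition()
        self.connected = False

    def watcher(self):
        # Runs on another thread, as the ZooKeeper callback would.
        self.cv.acquire()
        self.connected = True
        self.cv.notify()
        self.cv.release()

    def connect(self, timeout):
        # Take the lock first so the notification cannot be missed.
        self.cv.acquire()
        try:
            # Stand-in for zookeeper.init: fire the watcher elsewhere.
            threading.Thread(target=self.watcher).start()
            # wait() gives up the lock while sleeping, letting watcher run.
            self.cv.wait(timeout)
            return self.connected
        finally:
            self.cv.release()


conn = Connector()
result = conn.connect(10.0)</pre>
<p>Because the main thread holds the lock until it calls <tt>wait</tt>, the watcher thread cannot notify too early, however quickly it starts.</p>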
<h2>The ZooKeeper queue</h2>
<p>A FIFO queue is a simple data structure where producers put items in, and consumers retrieve them in the order they were put in. There are only two operations on a basic queue: <tt>enqueue</tt> adds an item and <tt>dequeue</tt> removes it. Despite their simplicity, queues crop up very often in distributed systems &#8211; for example, in job submission systems where clients submit requests to a set of workers which serve the requests on a first-come, first-served basis.</p>
<p>The ZooKeeper queue is structured very simply. All items are stored as znodes under a single top-level znode which represents a queue instance. Consumers can retrieve items by getting and then deleting a child of the top-level znode. The code creates a queue by calling a single create command. If the queue already exists, the Python module will throw an exception which we catch. This is a design decision that is still in review &#8211; future versions of the bindings might return integer error codes, and rely on the user to throw an exception if required.</p>
<pre>zookeeper.create(self.handle,self.queuename,"queue top level", [ZOO_OPEN_ACL_UNSAFE],0)</pre>
<p>The first two arguments to this call identify the connection to the ZooKeeper service and the name of the znode. The third is the data the znode contains. We won&#8217;t be accessing the data so we write some placeholder text.</p>
<p>The fourth argument is an access control list of permissions that controls who can access the znode in the future. ZooKeeper provides fairly fine-grained control over access, but the subject is beyond the scope of this post. What we have done here is to create the queue znode so that any client can read or write to it.</p>
<h2>Adding and deleting items from the queue</h2>
<p>Although I explained how consumers retrieve items from the queue, I said nothing about how they make sure they are retrieving items in FIFO order. What we would like is a way of naming each item such that later items are ordered lexicographically after earlier ones. If we can retrieve items in the same order, we&#8217;ll have our queue. Thankfully, ZooKeeper provides a very handy flag for the create call that helps us out. Specifying the <tt>zookeeper.CREATE_SEQUENCE</tt> flag appends a sequence number suffix to each znode name, and that suffix increases monotonically with each new znode that is created. ZooKeeper ensures that the sequence numbers are applied in order and are not reused.</p>
<p>Enqueuing an item is therefore a simple one liner. We don&#8217;t have to take out any locks to ensure that access to the queue znode is serialised. Items may be queued concurrently, and ZooKeeper takes care of assigning sequence numbers to them in the order they were received.</p>
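<p>Why does sorting names on the client recover FIFO order? A small sketch, assuming the ten-digit zero-padded suffix that the ZooKeeper documentation describes for sequence znodes. The <tt>sequence_name</tt> helper is hypothetical and only imitates the names a server would hand out.</p>
<pre>import random


def sequence_name(prefix, counter):
    # Zero-padding to ten digits makes string order match numeric order.
    return "%s%010d" % (prefix, counter)


names = [sequence_name("item", n) for n in range(12)]

# A child listing comes back in no particular order, so shuffle the
# names to imitate that before sorting on the client.
shuffled = list(names)
random.shuffle(shuffled)
in_order = sorted(shuffled)</pre>
<p>Without the padding, a name ending in 10 would sort before one ending in 2, and the queue would no longer be first-in, first-out.</p>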
<p>Dequeuing an item is also straightforward, but a bit more involved. First we retrieve a list of all the items waiting in the queue from ZooKeeper with the <tt>get_children</tt> procedure call. Then, after sorting the list of items on the client, we get the contents of the first znode (<em>i.e.</em> the item&#8217;s data) and then try to delete it.</p>
<p>It is possible that this deletion will fail because some other consumer has managed to successfully retrieve the item beforehand. We could ensure that this would never happen by organising for a queue-wide lock &#8211; this is easily implemented in ZooKeeper (although left as an exercise for the reader). However, this would severely impact performance by only allowing a single consumer to access the queue at one time. Instead, the client simply deals with the failed delete &#8211; again, indicated via an exception &#8211; and moves on to the next child znode in the list. If the client reaches the end of the list without successfully deleting an item, it should issue another <tt>get_children</tt> call to make sure that no items were added while the original list was being scanned. Once the <tt>get_children</tt> call returns an empty list, the <tt>dequeue</tt> procedure gives up and returns None.</p>
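<p>The retry logic described above can be sketched as follows. <tt>FakeStore</tt> is a hypothetical in-memory stand-in for the queue znode, so the example runs without a server; its <tt>get_and_delete</tt> returns <tt>None</tt> when an item has already gone, which is the role the exception plays in the real code.</p>
<pre>class FakeStore(object):
    """Hypothetical stand-in for the children of the queue znode."""

    def __init__(self, items):
        self.items = dict(items)

    def get_children(self):
        return list(self.items)

    def get_and_delete(self, name):
        # None means another consumer already removed this item.
        return self.items.pop(name, None)


def dequeue(store):
    while True:
        children = sorted(store.get_children())
        if not children:
            return None  # the queue is empty, give up
        for child in children:
            data = store.get_and_delete(child)
            if data is not None:
                return data
        # Every child was taken by someone else: list again and retry.


store = FakeStore({"item0000000002": "b", "item0000000001": "a"})
results = [dequeue(store), dequeue(store), dequeue(store)]</pre>
<p>In this single-threaded sketch the delete never actually fails, but the loop has the same shape as the real one: sort, try each child in turn, and list again only when every candidate has been lost to another consumer.</p>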
<h2>Blocking reads</h2>
<p>Sometimes we might want to block until an item is available to retrieve. It would be inefficient to copy exactly the non-blocking approach and simply loop, issuing <tt>get_children</tt> requests until an item was found. Instead, we can leverage ZooKeeper&#8217;s watcher mechanism to provide an asynchronous notification when a new znode is created as a child of the queue znode. The code to accomplish this is a combination of the patterns we&#8217;ve seen already in the <tt>dequeue</tt> and connection code.</p>
<pre>def block_dequeue(self):
    def queue_watcher(handle,event,state,path):
        self.cv.acquire()
        self.cv.notify()
        self.cv.release()
    while True:
        self.cv.acquire()
        children = sorted(zookeeper.get_children(self.handle, self.queuename, queue_watcher))
        for child in children:
            data = self.get_and_delete(self.queuename+"/"+child)
            if data is not None:
                self.cv.release()
                return data
        self.cv.wait()
        self.cv.release()</pre>
<p>First the client acquires a lock to prevent the watcher from sending a notification before the client is ready to wait for it. Then, as in the dequeue method, the client retrieves a list of items, but here a watcher parameter is specified. The watcher will fire whenever any event is seen that is relevant to the queue znode. The watcher acquires the lock &#8211; blocking until the client has given it up &#8211; and then notifies the client that there may be more items available.</p>
<p>The client only waits for this notification if all the children returned from <tt>get_children</tt> have already been consumed by others &#8211; otherwise it will successfully retrieve an item and return it. Once all possible items have been exhausted, the client waits on the condition variable. After being woken up, it repeats the same list-read-delete-wait loop.</p>
<h2>Failure modes</h2>
<p>ZooKeeper operations can fail in a number of ways. In order to keep this example simple, most errors are raised as exceptions and the queue aborts. A more robust implementation should catch errors at every ZooKeeper invocation, as many can be recovered from with a little effort.</p>
<p>The <tt>zookeeper.CONNECTIONLOSS</tt> error condition is particularly worth noting. ZooKeeper may drop a client connection at any time, due to physical link loss, network congestion or other connection problem. This can cause ZooKeeper API invocations to abort before the ZooKeeper cluster is able to inform the client of the operation&#8217;s success. This is problematic for our queue, as <tt>enqueue</tt> operations may or may not have succeeded when we receive a <tt>CONNECTIONLOSS</tt> error.</p>
<p>There are several approaches we can take to this problem. The first is to blindly retry <tt>enqueue</tt> when a connection is lost. This could result in an item being queued several times, but for some systems this is not a significant problem. For example, if a web page is crawled twice, apart from the time cost there will be no hardship caused to an indexing engine.</p>
<p>For some applications, duplication of <tt>enqueue</tt> operations is problematic. The obvious &#8216;solution&#8217; is to check whether an item is in the queue after it has been queued. However, it is possible that a consumer will have retrieved and deleted the item between the connection loss event and the subsequent reconnection and existence check. Instead, a two-phase protocol is necessary where a producer marks an item as &#8216;consumable&#8217; only when it is sure it is in the queue, by atomically updating its associated data with a flag. Consumers may only retrieve items for which the flag is set. If a connection loss occurs during the setting of this flag, recovery is easier as the <tt>set</tt> call may be reissued &#8211; if the item is no longer present in the queue, the only possible explanation is that the original flag update succeeded and the item has been consumed. This is not built into the example code, but a production system should implement a similar form of connection loss recovery.</p>
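<p>Here is a rough sketch of that two-phase idea, again without a server. <tt>FlagStore</tt> and its methods are hypothetical names invented for this illustration; in a real queue the flag would live in the znode&#8217;s data and be written with the <tt>set</tt> call.</p>
<pre>class FlagStore(object):
    """Hypothetical in-memory queue whose items carry a ready flag."""

    def __init__(self):
        self.items = {}  # name mapped to [data, ready]

    def create(self, name, data):
        # Phase one: the item exists but consumers must ignore it.
        self.items[name] = [data, False]

    def mark_ready(self, name):
        # Phase two: idempotent, so it is safe to reissue after a lost reply.
        if name not in self.items:
            return False  # gone, so an earlier attempt must have succeeded
        self.items[name][1] = True
        return True

    def consume(self):
        for name in sorted(self.items):
            data, ready = self.items[name]
            if ready:
                del self.items[name]
                return data
        return None


store = FlagStore()
store.create("item0000000001", "job")
before = store.consume()                    # flag not set, nothing to take
store.mark_ready("item0000000001")
store.mark_ready("item0000000001")          # a blind retry does no harm
after = store.consume()
retry = store.mark_ready("item0000000001")  # item already consumed</pre>
<p>The retry after consumption returns <tt>False</tt>, which the producer can read as proof that its earlier flag update went through.</p>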
<p>Taking care of failure modes like this one often comprises most of the work of building a distributed system. The key is to understand every exception that API calls can throw, and to know what your code does in every circumstance.</p>
<h2>Using the queue</h2>
<p>To use the queue, you must first make sure you have built and installed both the C client libraries and the <tt>zookeeper</tt> Python module. There are two prerequisite packages: the <tt>cppunit</tt> development package and the Python development package. On <tt>yum</tt>-based systems, these are named <tt>cppunit-devel</tt> and <tt>python-devel</tt>. Both packages are available through standard platform package managers like yum, apt and Darwin ports.</p>
<p>As a prerequisite to building the C client libraries, the Java-based server must be built. This auto-generates some header files that the C libraries rely on. From the root directory of the downloaded distribution:<br />
<code><br />
ant<br />
</code></p>
<p>The C client libraries for ZooKeeper must be installed as the Python module makes use of them to actually communicate with a ZooKeeper cluster. It&#8217;s easiest to build these from source. From the <tt>src/c</tt> directory, type the following:<br />
<code><br />
autoreconf -if<br />
./configure<br />
make &amp;&amp; sudo make install<br />
</code><br />
The downloadable package contains the source code for the Python module. To build and install, one command should do the trick from the <tt>src/contrib/zkpython</tt> directory:<br />
<code><br />
ant install<br />
</code><br />
To test the installation, start a Python shell and type <tt>import zookeeper</tt>. If you don&#8217;t see any errors or warnings, the module has been built and installed successfully. The bindings have been tested with Python 2.3, 2.4, 2.5 and 2.6, and are known not to work with 2.2 and earlier. We haven&#8217;t yet tested them against Python 3.x &#8211; we&#8217;d love to hear your feedback about your experiences with the latest versions of Python.</p>
<p>To run the queue example, you must have a ZooKeeper server running on the local machine at port 2181 (to change the location of the server, edit the string passed to <tt>zookeeper.init</tt>). The Java-based server will have been built when you ran <tt>ant</tt> from the root directory of the distribution earlier. Before the server can run, it needs a configuration file to read:<br />
<code><br />
cat &gt;&gt; conf/zoo.cfg<br />
tickTime=2000<br />
dataDir=/tmp/zookeeper<br />
clientPort=2181<br />
</code></p>
<p>Now you can run <tt>bin/zkServer.sh start</tt> to start a standalone server on the local machine. To stop the server in the future, run <tt>bin/zkServer.sh stop</tt>.</p>
<p>You&#8217;re finally ready to run the queue example:<br />
<code><br />
python queue.py<br />
</code></p>
<p>The example is very simple. It queues three items, and then dequeues them.</p>
<h2>Wrapping up</h2>
<p>I hope that I&#8217;ve shown you that ZooKeeper is a very useful system, with powerful primitives that make writing tricky distributed concurrent programs easier. There are many applications that ZooKeeper could help you build &#8211; lock servers, name services, metadata stores and even a unique kind of filesystem can be built in a straightforward way using the ZooKeeper API. The project is active and always looking for volunteers. ZooKeeper integration is already being built into HBase, and there are moves to bring greater reliability to Hadoop and HDFS by delegating some server functionality to ZooKeeper. As far as the Python bindings go, the next version will include better documentation, some more Python niceties such as default parameters and docstrings, and a more Pythonic wrapper object to wrap up some of the bookkeeping that ZooKeeper requires.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cloudera.com/blog/2009/05/building-a-distributed-concurrent-queue-with-apache-zookeeper/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
	</channel>
</rss>

