Cloudera Blog

HDFS Reliability

We’ve been talking to enterprise users of Hadoop about existing and new projects, and many of them are asking about reliability and data integrity. So we wrote a short paper, HDFS Reliability, to summarize the state of the art and provide advice. We’d like your feedback, too, so please leave a comment.

State of the Elephant 2008

It’s a new year, the time when we take a moment to look back at the previous one and forward to what might be coming next. In the world of Hadoop, a lot happened in 2008.

Organization

At the beginning of the year, Hadoop was a sub-project of Lucene. In January, Hadoop became a Top Level Project at Apache, in recognition of its success and the diversity of its community. This allowed sub-projects to be added, the first of which was HBase, previously a contrib project. ZooKeeper, a service for coordinating distributed systems that had been hosted at SourceForge, became a Hadoop sub-project in May. Then in October, Pig (a platform for analyzing large datasets) graduated from the Apache Incubator to become another Hadoop sub-project. Finally, in November, Hive, which provides data warehousing for Hadoop, moved from being a Hadoop Core contrib project to its own sub-project.