We’ve been talking to enterprise users of Hadoop about existing and new projects, and lots of them are asking questions about reliability and data integrity. So we wrote up a short paper entitled HDFS Reliability to summarize the state of the art and provide advice. We’d like to get your feedback, too, so please leave a comment.
It’s a new year, the time when we take a moment to look back at the previous one and forward to what might be coming next. In the world of Hadoop, a lot happened in 2008.
Organization
At the beginning of the year, Hadoop was a sub-project of Lucene. In January, Hadoop became a Top Level Project at Apache, in recognition of its success and the diversity of its community. This allowed sub-projects to be added, the first of which was HBase, previously a contrib project. ZooKeeper, a service for coordinating distributed systems that had been hosted at SourceForge, became a Hadoop sub-project in May. Then in October, Pig (a platform for analyzing large datasets) graduated from the Apache Incubator to become another Hadoop sub-project. Finally, in November, Hive, which provides data warehousing for Hadoop, moved from being a Hadoop Core contrib project to its own sub-project.
Hadoop was created by