OCTOBER 23 – 25, 2012
THANK YOU FOR JOINING US!
Strata Conference + Hadoop World 2012 went off without a hitch! The sold-out conference attracted attendees from a wide range of industries and 38 countries. Strata Conference explored the changes brought to technology and business by big data, data science, and pervasive computing. Having joined forces this year with Hadoop World (in its 4th year), the conference is at the heart of the big data industry. Strata Conference + Hadoop World brought together decision makers using the power of big data to drive business strategy and practitioners who collect, analyze, and manipulate the data, particularly in the worlds of finance, media, and government. The merged Strata and Hadoop World event was the largest gathering of the Apache Hadoop community, with an emphasis on hands-on and business sessions on the Hadoop ecosystem. |
WHAT HAPPENED |
|
|
|
CLOUDERA KEYNOTE PRESENTATIONS
KEYNOTE: MIKE OLSON
Michael Olson – Cloudera CEO
Big Questions |
|
KEYNOTE: DOUG CUTTING
Doug Cutting – Cloudera Architect, Hadoop Co-founder & Apache Software Foundation Chairman
Beyond Batch |
|
|
|
|
CLOUDERA PRESENTATIONS
GIVEN ENOUGH MONKEYS – SOME THOUGHTS ON RANDOMNESS
Jesse Anderson – Cloudera Developer and Instructor
Can a million monkeys on a million typewriters eventually recreate Shakespeare? Great minds since Aristotle have pondered this theorem. In 2011, Jesse Anderson randomly recreated Shakespeare using Hadoop. Here’s why you should care. |
|
LARGE SCALE ETL WITH HADOOP
Eric Sammer – Cloudera Sr. Solutions Architect
Hadoop is commonly used for processing large swaths of data in batch. While many of the necessary building blocks for data processing exist within the Hadoop ecosystem – HDFS, MapReduce, HBase, Hive, Pig, Oozie, and so on – it can be a challenge to assemble and operationalize them as a production ETL platform. This presentation covers one approach to data ingest, organization, format selection, process orchestration, and external system integration, based on collective experience acquired across many production Hadoop deployments. |
|
HDFS – WHAT IS NEW AND FUTURE
Sanjay Radia – Hortonworks Founder
Hadoop 1.0 is a significant milestone: it is the most stable and robust Hadoop release, tested in production against a variety of applications. Compared with previous releases, it offers improved performance, support for HBase, disk-fail-in-place, WebHDFS, and more. The next major release, Hadoop 2.0, offers several significant HDFS improvements, including a new append pipeline, federation, wire compatibility, NameNode HA, and further performance improvements. We describe how to take advantage of the new features and their benefits. We also discuss some of the misconceptions and myths about HDFS. |
|
HIGH AVAILABILITY FOR THE HDFS NAMENODE: PHASE 2
Aaron Myers – Cloudera Software Engineer
This session will discuss the design and implementation of features for a highly available NameNode, and give an overview of how to deploy these new features. |
|
APACHE HBASE FEATURES FOR THE ENTERPRISE
Jonathan Hsieh – Cloudera Software Engineer
Apache HBase is a distributed data store that is in production today at many enterprises and sites, serving large volumes of near-real-time random accesses. As Apache HBase matures, the community has augmented the system with new features that many enterprises consider to be hard requirements. We will discuss how the upcoming HBase 0.96 release addresses many of these shortcomings by introducing new features that help the administrator monitor and control access to the system, and new mechanisms to minimize downtime due to expected and unexpected outages. |
|
DATA SCIENCE ON HADOOP: HOW CLOUDERA IMPALA UNLOCKS NEW PRODUCTIVITY AND INSIGHTS
Justin Erickson – Cloudera Sr. Products Manager
This talk will cover which tools and techniques work well, and which do not, for data scientists working on Hadoop today; how to apply the lessons learned by the experts to increase your productivity; and what to expect for the future of data science on Hadoop. We will draw on insights from the top data scientists working on big data systems at Cloudera as well as experiences from running big data systems at Facebook, Google, and Yahoo. |
|
DESIGNING SCALABLE NETWORK ARCHITECTURES FOR FAST MOVING BIG DATA
Kenneth Duda – Arista Networks Founder, CTO and SVP, Software Engineering
The growth of big data and the continuing dramatic decline in the cost of storage and computer processing are having a profound impact on the way business is being conducted across sectors. An updated network infrastructure is essential to keeping the data flowing smoothly throughout your organization and enabling timely, precise analytics that enhance business decision making. Arista and Cloudera have partnered to create networking architectures that accelerate big data productivity by increasing performance, simplifying network scale-out, and tying into Hadoop’s topology-aware storage architecture. |
|
KNITTING BOAR
Josh Patterson – Cloudera Sr. Solutions Architect
In this session, we will introduce “Knitting Boar”, an open-source Java library for performing distributed online learning on a Hadoop cluster under YARN. We will give an overview of how Woven Wabbit works and examine the lessons learned from YARN application construction. |
|
TAMING THE ELEPHANT – LEARN HOW MONSANTO MANAGES THEIR HADOOP CLUSTER TO ENABLE GENOME/SEQUENCE PROCESSING
Bala Venkatrao – Cloudera Director, Products
Managing Hadoop clusters to meet business needs can be challenging. Learn how Monsanto has effectively tamed the elephant using Cloudera Manager. |
|
|
|
|
CLOUDERA TUTORIALS
USING HBASE
Amandeep Khurana – Cloudera Solutions Architect |
|
AN INTRODUCTION TO HADOOP
Mark Fei – Cloudera Instructor
This tutorial provides an introduction to Apache Hadoop and what it’s being used for. This will include:
|
|
TESTING HADOOP APPLICATIONS
Tom Wheeler – Cloudera Curriculum Developer
Software testing is hard enough, but it becomes especially challenging when you’re doing large-scale, distributed data processing. This tutorial will present a mix of lecture and instructor-led demonstrations to explain how you can verify that your code performs exactly as you intended. This session will focus on unit testing, integration testing, performance testing, and diagnostics. |
|
BUILDING A LARGE-SCALE DATA COLLECTION SYSTEM USING FLUME NG
Hari Shreedharan – Cloudera Software Engineer
Hadoop HDFS is typically adopted in situations where traditional storage and database systems are either reaching their limits or have already surpassed them. This usually implies that there are one or more large streams of events that need to be collected, such as log data streams. Flume NG was designed from the ground up to tackle this problem in a straightforward, scalable, reliable way, and empirical results support the success of its approach. |
|
|
|
|
MEET THE AUTHORS, FREE BOOKS
We wrote the book… well actually, Cloudera experts in Apache Hadoop and Apache HBase have written the definitive guides. Stop by the Cloudera booth and you’ll have a chance to meet the authors and get a free book. We are giving away 100 copies of each book on both Wednesday, October 24 and Thursday, October 25, including Hadoop Operations and HBase in Action, which will be hot off the press.
Wednesday, Oct 24 @ 10:20AM: Tom White, Hadoop: The Definitive Guide, 3rd Edition
Wednesday, Oct 24 @ 3:10PM: Lars George, HBase: The Definitive Guide
Wednesday, Oct 24 @ 3:10PM: Amandeep Khurana & Nick Dimiduk, HBase in Action
Wednesday, Oct 24 @ 5:40PM: Eric Sammer |
|
|
|
|
MEETUPS
New York Hadoop User Group
Tuesday, Oct 23
Hosts: Eli Collins and Aaron Myers |
|
Hive User Group Meetup NYC
Tuesday, Oct 23
Host: Carl Steinbach |
|
Sqoop User Meetup
Tuesday, Oct 23
Host: Kathleen Ting |
|
Flume User Meetup
Thursday, Oct 25
Host: Kathleen Ting |
|
HBase Meetup
Thursday, Oct 25
Host: Otis Gospodnetić |
|
Cloudera Manager Users Meetup
Thursday, Oct 25
Host: Philip Zeyliger |
|
ZooKeeper Users Meetup
Thursday, Oct 25
Host: Camille Fournier |
|
|
|
|
AWARDS
NEW: STRATA DATA INNOVATION AWARDS
We’re honoring innovative work in big data and data science, and need your nominations for individuals or organizations who deserve recognition for their work in data. |
|