Overview
The Big Data Architecture Workshop (BDAW) is a learning event that addresses advanced big data architecture topics. It brings technical contributors together in a group setting to design and architect solutions to a challenging business problem. The workshop first addresses big data architecture problems in general, then applies those concepts to the design of a challenging system.
Throughout the highly interactive workshop, participants apply concepts to real-world examples, which leads to detailed, collaborative discussions. This format lets participants learn techniques for architecting big data systems not only from Cloudera’s experience but also from the experiences of fellow participants.
Audience & Prerequisites
To gain the most from the workshop, participants should have a working knowledge of technologies such as HDFS, Spark, MapReduce, Hive/Impala, data formats, and relational database management systems. Detailed API-level knowledge is not needed, as there are no programming activities.
Participants will be divided into small groups to discuss the problems and develop solutions. Each group will select a spokesperson to present the group’s findings to the whole workshop. Although there are no programming labs, solutions implemented and deployed in the cloud will be demonstrated during the workshop.
Course Outline
Introduction
Workshop Application Use Cases
- Oz Metropolitan
- Architectural questions
- Team activity: Analyze Metroz Application Use Cases
Application Vertical Slice
- Definition
- Minimizing risk of an unsound architecture
- Selecting a vertical slice
- Team activity: Identify an initial vertical slice for Metroz
Application Processing
- Real-time and near-real-time processing
- Batch processing
- Data access patterns
- Delivery and processing guarantees
- Machine Learning pipelines
- Team activity: identify delivery and processing patterns in Metroz, characterize response time requirements, identify Machine Learning pipelines
Application Data
- Three V’s of Big Data
- Data Lifecycle
- Data Formats
- Transforming Data
- Team activity: Metroz Data Requirements
Scalable Applications
- Scale up, scale out, scale to X
- Determining if an application will scale
- Poll: scalable airport terminal designs
- Hadoop and Spark Scalability
- Team activity: Scaling Metroz
Fault Tolerance
- Principles
- Transparency
- Hardware vs. Software redundancy
- Tolerating disasters
- Stateless functional fault tolerance
- Stateful fault tolerance
- Replication and group consistency
- Fault tolerance in Spark and MapReduce
- Application tolerance for failures
- Team activity: Identify Metroz component failures and requirements
Security and Privacy
- Principles
- Privacy
- Threats
- Technologies
- Team activity: identify threats and security mechanisms in Metroz
Deployment
- Cluster sizing and evolution
- On-premises vs. cloud
- Edge computing
- Team activity: select deployment for Metroz
Technology Selection
- HDFS
- HBase
- Kudu
- Relational Database Management Systems
- MapReduce
- Spark, including streaming, Spark SQL, and Spark ML
- Hive
- Impala
- Cloudera Search
- Data Sets and Formats
- Team activity: identify technologies relevant to Metroz
Software Architecture
- Architecture artifacts
- One platform or multiple, lambda architecture
- Team activity: produce a high-level architecture, select technologies, and revisit the vertical slice
- Vertical slice demonstration
Wrap Up
Learn more
Explore certification
Certification is a great differentiator. It helps establish you as a leader in the field, providing employers and customers with tangible evidence of your skills and expertise.
Advance your career
Big data developers are among the most in-demand and highly compensated technical professionals in the world. Many job openings that match this professional profile seek CCA qualification.
Private training
We also provide private training at your site, at your pace, and tailored to your needs.