Apache Spark continues to gain momentum as the new processing paradigm for Apache Hadoop, and for the data scientist, it has a lot to like: it is natively distributed, offers an interactive REPL, provides Python APIs in addition to its native Scala API, and ships with MLlib, a library of machine learning algorithms. Spark 1.2 includes an implementation of random decision forests, an important and popular ensemble algorithm for both classification and regression. This talk will introduce Spark, Scala and random decision forests to the curious, and demonstrate the process of analyzing a real-world data set with them.

This quick session is targeted at the novice. Some previous Spark experience will help, but no data science or machine learning background is required. Attendees will:

- Become familiar with Spark basics using its Scala API
- Understand the decision tree and random decision forest algorithms
- See a simple, narrated data science workflow in action on a real data set