The Data Incubator P...
This three-day course is for data engineers, analysts, and architects; software engineers; IT operations staff; and technical managers who want a thorough, hands-on overview of the Apache Spark platform.
The course covers the core APIs for using Spark, fundamental mechanisms and basic internals of the platform, SQL and other high-level data access tools, as well as Spark's streaming capabilities and machine learning APIs.
Each topic includes slide and lecture content along with hands-on use of Spark through the elegant Databricks web-based notebook environment. Inspired by tools like IPython/Jupyter and MATLAB, Databricks notebooks allow attendees to code jobs, write data analysis queries, and generate visualizations using their own cloud-based Spark cluster, accessed through a web browser.*
Duration: 3 Days, Full Time (9 AM to 5 PM)
We will have a break from noon to 1 PM daily; lunch will not be provided, but there are several options nearby.
Prerequisites: Basic understanding of programming in Python or Scala. Knowledge of or experience with Java or SQL is beneficial but not essential.
After taking this class you will be able to:
• Describe Spark’s fundamental mechanics
• Use the core Spark APIs to operate on data
• Articulate and implement typical use cases for Spark
• Build data pipelines with Spark SQL and DataFrames
• Analyze Spark jobs using the UIs and logs
• Create Streaming and Machine Learning jobs