
Introduction to Apache Spark



Introduction to Apache Spark introduces you to Apache Spark, one of the most important Big Data technologies on the market. You will start by learning some of the basic concepts behind Spark, including Resilient Distributed Datasets (RDDs), which tie everything together. From there, you will learn how to work with datasets in Spark using both a functional programming approach and SQL. Finally, you will learn how to use the Eclipse IDE to write programs that work with data, along with a common technique for deploying code for Apache Spark jobs.



BI Consultant and Trainer
As a Business Intelligence Consultant and Trainer for Pragmatic Works, Mitchell’s focus is on the full BI Stack (SSIS, SSAS and SSRS). He also has experience with Data Modeling, T-SQL, MDX, Power Pivot and the Power BI Tools. Mitchell graduated from the University of North Florida in 2007 and is constantly expanding his knowledge of all things SQL Server.


What to Know Before the Class

This course is aimed at application and database developers interested in learning about Big Data technologies. No knowledge of Spark or Hadoop is assumed. Knowledge of a development language such as Java, C#, or Python is helpful but not required.

Curriculum For This Course

System Requirements

The Apache Spark project does not publish minimum requirements for single-node machines such as VMs or laptops, but at least 8 GB of RAM is recommended. Spark can run on any edition of Windows, Linux, or macOS that supports Oracle Java 1.8 (Java 8).
