Apache Spark Training | Learn Core APIs for using Spark, Basic Internals of the Framework, SQL & Data Access Tools

Apache Spark Course Description

Apache Spark™ is a fast and general engine for large-scale data processing. It has an advanced DAG (directed acyclic graph) execution engine that supports acyclic data flow and in-memory computing. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.

Spark runs in its standalone cluster mode, on Hadoop YARN, on Apache Mesos, or in the cloud on EC2. It can access data in HDFS, Cassandra, HBase, Hive, Tachyon, S3, and any Hadoop data source.

The Pincorps Apache Spark course covers Big Data concepts and the challenges of Big Data processing. It then shows how to approach Big Data problems with Apache Spark, covering its components, installation steps, RDDs, transformations, actions, lazy execution, and integration with HDFS.
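
To give a feel for what "lazy execution" means for transformations and actions, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: the class `LazyDataset` and the helper `square` are hypothetical names invented for this illustration. Transformations only record a plan; an action runs it.

```python
from typing import Callable, Iterable, Iterator, List


class LazyDataset:
    """Toy stand-in for an RDD: records a plan, runs it only on an action."""

    def __init__(self, source: Iterable, steps: tuple = ()):
        self._source = source
        self._steps = steps  # recorded transformations, not yet executed

    # Transformations: return a new dataset and do no work.
    def map(self, fn: Callable) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred: Callable) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def _run(self) -> Iterator:
        # Chain the recorded steps over the source as lazy iterators.
        data: Iterator = iter(self._source)
        for kind, fn in self._steps:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return data

    # Actions: trigger execution of the recorded plan.
    def collect(self) -> List:
        return list(self._run())

    def count(self) -> int:
        return sum(1 for _ in self._run())


calls = []  # records each time the mapping function actually runs


def square(x):
    calls.append(x)
    return x * x


numbers = LazyDataset(range(1, 6))
evens_squared = numbers.map(square).filter(lambda x: x % 2 == 0)

calls_before_action = len(calls)  # nothing has executed yet
result = evens_squared.collect()  # the action runs the whole pipeline
```

In this sketch, `square` is not called until `collect()` is invoked. Spark applies the same idea at cluster scale, which lets it plan the whole chain of transformations before doing any work.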

Apache Spark Course Learning Outcomes

  • Understand Big Data and the challenges associated
  • Find an approach to Big Data problems with Apache Spark
  • Implement Apache Spark Concepts
  • Apply Java/Scala for Spark
  • Follow the latest emerging Spark-based libraries such as MLlib and GraphX

Apache Spark - Suggested Audience

This Apache Spark training is a must for anyone who aspires to enter the field of big data and keep abreast of Spark and related projects. Suggested attendees, based on our past programs, are:
  • Big Data enthusiasts 
  • Software Architects
  • Software Engineers
  • Software Developers 
  • Data Scientists
  • Analytics professionals
  • Data Engineers

Apache Spark Training Duration

  • Open-House F2F (Public): 2/3 days
  • In-House F2F (Private): 2/3 days; for pricing, please email your group size to hello@pincorps.com

Apache Spark Training Prerequisites

  • Fundamental knowledge of any programming language
  • Basic understanding of any database, SQL, and query language for databases
  • Working knowledge of Linux- or Unix-based systems (not mandatory)

The Apache Spark training covers the following topics:

  •  Introduction to Big Data, Challenges with Big Data, Batch Vs. Real Time Big Data Analytics
  •  Batch Analytics - Hadoop Ecosystem Overview, Real Time Analytics Options
  •  Streaming Data - Storm, In Memory Data - Spark
  •  What is Spark?
  •  Modes of Spark
  •  Spark Installation Demo
  •  Overview of Spark on a cluster
  •  Spark Standalone Cluster
  •  Invoking Spark Shell
  •  Creating the SparkContext
  •  Loading a File in Shell
  •  Performing Some Basic Operations on Files in Spark Shell
  •  Building a Spark Project with sbt
  •  Running Spark Project with sbt, Caching Overview
  •  Distributed Persistence
  •  Spark Streaming Overview
  •  Example: Streaming Word Count
  •  RDDs
  •  Transformations in RDD
  •  Actions in RDD, Loading Data in RDD
  •  Saving Data through RDD
  •  Key-Value Pair RDD
  •  MapReduce and Pair RDD Operation
  •  Java/Scala and Hadoop Integration Hands on
  •  Why Shark?
  •  Installing Shark
  •  Running Shark
  •  Loading of Data
  •  Hive Queries through Spark
  •  Testing Tips in Scala
  •  Performance Tuning Tips in Spark
  •  Shared Variables: Broadcast Variables
  •  Shared Variables: Accumulators
Keny White

Keny White is a Professor in the Department of Computer Science at Boston University, where he has been since 2004. He also currently serves as Chief Scientist of Guavus, Inc. During 2003-2004 he was a Visiting Associate Professor at the Laboratoire d'Informatique de Paris VI (LIP6). He received a B.S. from Cornell University in 1992, and an M.S. from the State University of New York at Buffalo.

After working as a software developer and contractor for over 8 years for a number of companies in the US, including ABX, Proit, SACC and AT&T, he decided to work full-time as a private software trainer. He received his Ph.D. in Computer Science from the University of Rochester in 2001. "What I teach varies from beginner to advanced and from what I have seen, anybody can learn and grow from my courses."


