Apache Spark & Scala Training | Get hands-on understanding to create Spark applications using Scala programming.

Apache Spark & Scala Course Description

Apache Spark is a fast, in-memory distributed computing framework written in Scala. Employers including Amazon, eBay, NASA JPL, and Yahoo use Spark to quickly extract meaning from massive data sets across fault-tolerant Hadoop clusters.

Pincorps' Apache Spark and Scala training module teaches you to create Spark applications using Scala programming. It provides a clear comparison between Spark and Hadoop and covers techniques to increase application performance and enable high-speed processing.

Learners will master Scala programming and get trained on the different APIs that Spark offers, such as Spark Streaming, Spark SQL, Spark RDDs, Spark MLlib, and Spark GraphX.

Apache Spark & Scala Learning Outcomes

  • Understand Scala and its implementation
  • Apply lazy values, control structures, loops, collections, etc.
  • Learn the concepts of traits and OOP in Scala
  • Understand functional programming in Scala
  • Get an insight into Big Data challenges
  • See how Spark acts as a solution to these challenges
  • Install Spark and implement Spark operations on the Spark shell
  • Understand what RDDs are in Spark
  • Implement a Spark application on YARN (Hadoop)
  • Analyze Hive and Spark SQL architecture
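As a small preview of the outcomes above, a transformation and an action can be tried in a few lines of the Spark shell (a sketch; assumes a local Spark installation, where `sc` is predefined):

```scala
// Run inside the Spark shell (spark-shell), which provides `sc` (SparkContext).
// A minimal RDD example: a transformation (map) followed by an action (reduce).
val numbers = sc.parallelize(1 to 100)          // distribute a local collection
val squares = numbers.map(n => n * n)           // lazy transformation
val total   = squares.reduce(_ + _)             // action: triggers computation
println(s"Sum of squares 1..100 = $total")
```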

Apache Spark & Scala Training - Suggested Audience

This Apache Spark & Scala training is aimed at professionals with some knowledge of functional and object-oriented programming. Suggested attendees based on our past programs are:
  • Big Data enthusiasts 
  • Software Architects
  • Software Engineers
  • Software Developers 
  • Data Scientists
  • Data Engineers
  • Analysts
  • ETL Developers

Apache Spark & Scala Training Duration

  • Open-House F2F (Public): 4/5 days
  • In-House F2F (Private): 4/5 days, for commercials please send us an email with group size to hello@pincorps.com

Apache Spark & Scala Training Prerequisites

  • Basic familiarity with Linux or Unix
  • A basic understanding of functional and object-oriented programming
  • Intermediate-level knowledge of Hadoop
  • Knowledge of Scala is definitely a plus but is not mandatory

Apache Spark & Scala training course modules include:

Module-1: Introduction to Spark and Analysis
  •  Why second generation frameworks?
  •  Introduction to Spark
  •  Scala shell
  •  Spark Architecture
  •  Spark on Cluster
  •  Spark Core
  •  SparkSQL
  •  Spark Streaming
  •  Cluster Managers
  •  Spark Users
  •  Use cases of Spark
  •  Spark Versions
  •  Spark Storage Layers
  •  Download Spark
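Beyond the shell, a standalone Spark application creates its own SparkContext. A minimal skeleton, sketched here with illustrative names and a placeholder input path:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// A minimal standalone Spark application skeleton (names are illustrative).
object WordCountApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("WordCountApp")
      .setMaster("local[*]")   // all local cores; on a cluster this is set by spark-submit
    val sc = new SparkContext(conf)

    val lines  = sc.textFile("input.txt")                 // path is a placeholder
    val counts = lines.flatMap(_.split("\\s+"))
                      .map(word => (word, 1))
                      .reduceByKey(_ + _)
    counts.take(10).foreach(println)

    sc.stop()
  }
}
```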

  1. Spark API on a Cluster
  • The Driver
  • Executors
  • Execution components: jobs, tasks, stages
  • Spark Web UI
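The execution components above can be observed directly: each action submits a job, and shuffles introduce stage boundaries. A sketch for the Spark shell (data is illustrative):

```scala
// Inspecting execution structure from the Spark shell (`sc` is provided).
// Each action submits a job; shuffles (e.g. reduceByKey) create stage boundaries.
val words  = sc.parallelize(Seq("a", "b", "a", "c", "b", "a"))
val counts = words.map((_, 1)).reduceByKey(_ + _)

// toDebugString prints the RDD lineage; indentation marks stage boundaries.
println(counts.toDebugString)

counts.collect()   // this action triggers a job, visible in the Spark Web UI (port 4040)
```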

  2. Cluster Manager
  • Standalone Cluster Manager
  • Hadoop YARN
  • Apache Mesos
  • Amazon EC2
  • Which Cluster Manager?
  • Spark-submit for deploying applications
  • Using Maven for a Java Spark application
  • Using SBT for a Scala application
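For the SBT route, a build definition along these lines is typical (a sketch; the project name and versions are examples — pick versions matching your cluster):

```scala
// build.sbt for a Scala Spark application (versions shown are examples).
name := "spark-training-app"
version := "0.1.0"
scalaVersion := "2.11.12"

// "provided" because the cluster supplies Spark at runtime via spark-submit
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.8" % "provided"
```

Packaging with `sbt package` produces a jar that can then be deployed with spark-submit.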

Module-2: Data Loading (HDFS, Amazon S3)
  • Different file formats:
  1.  Text files
  2.  JSON
  3.  Comma- and tab-separated values
  4.  Object files
  5.  Sequence files
  6.  Input/output formats
  7.  Spark SQL for structured data
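The formats above can be sketched as follows (paths are placeholders; assumes a Spark 2.x shell where both `sc` and the `spark` session are predefined):

```scala
// Sketches of loading the file formats listed above (paths are placeholders).
val text = sc.textFile("hdfs:///data/input.txt")            // text files
val csv  = text.map(_.split(","))                           // simple comma-separated parsing

// JSON via Spark SQL (one JSON record per line)
val people = spark.read.json("hdfs:///data/people.json")

// Sequence files: key and value types must be supplied
import org.apache.hadoop.io.{IntWritable, Text}
val seq = sc.sequenceFile("hdfs:///data/seqfile", classOf[Text], classOf[IntWritable])

// Object files: RDDs previously saved with saveAsObjectFile
val objects = sc.objectFile[String]("hdfs:///data/objects")
```

The same calls work against Amazon S3 by swapping the scheme, e.g. `s3a://bucket/key`.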

Module-3: RDDs
  •  What is an RDD?
  •  Why RDDs?
  •  RDD operations
  •  Transformations
  •  Actions
  •  Lazy evaluation
  •  Basic RDDs
  •  Caching
  •  Converting between RDD types
  •  Spark API support for Python, Java, and Scala
  •  Working with key-value pairs
  •  Creating key-value pair RDDs

  1. Transformations on pair RDDs
  •  Aggregations
  •  Grouping data
  •  Joins
  •  Sorting data
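The pair-RDD transformations above can be sketched in a few lines (Spark shell; data is illustrative):

```scala
// Pair-RDD transformations: aggregation, grouping, joins, and sorting.
val sales = sc.parallelize(Seq(("apples", 3), ("pears", 2), ("apples", 4)))
val stock = sc.parallelize(Seq(("apples", 10), ("pears", 5)))

val totals  = sales.reduceByKey(_ + _)          // aggregation: ("apples",7), ("pears",2)
val grouped = sales.groupByKey()                // grouping: key -> Iterable of values
val joined  = totals.join(stock)                // join: ("apples",(7,10)), ("pears",(2,5))
val sorted  = totals.sortByKey()                // sorting by key

sorted.collect().foreach(println)
```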

  2. Actions on pair RDDs
  •  RDD partitioners
  •  Operations benefiting from partitioning
  •  PageRank example
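The PageRank example is a classic demonstration of iterating over a co-partitioned pair RDD. A condensed sketch (the link graph is made-up toy data):

```scala
// Condensed PageRank over a tiny hard-coded link graph (illustrative data).
import org.apache.spark.HashPartitioner

val links = sc.parallelize(Seq(
  ("a", Seq("b", "c")), ("b", Seq("c")), ("c", Seq("a"))
)).partitionBy(new HashPartitioner(2))   // co-partition so repeated joins avoid shuffles
  .cache()

var ranks = links.mapValues(_ => 1.0)

for (_ <- 1 to 10) {
  val contribs = links.join(ranks).values.flatMap {
    case (neighbours, rank) => neighbours.map(n => (n, rank / neighbours.size))
  }
  ranks = contribs.reduceByKey(_ + _).mapValues(0.15 + 0.85 * _)
}

ranks.collect().foreach(println)
```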

  3. Advanced Spark operations
  •  Aggregate
  •  Fold
  •  mapPartitions
  •  Glom
  •  Accumulators
  •  Broadcast variables
  •  Anatomy of a Spark RDD
  •  Splits
  •  Localization
  •  Serialization
  •  Transformations vs. Actions
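Several of the advanced operations above can be sketched together (Spark 2.x shell; data is illustrative):

```scala
// Sketches of advanced operations (Spark shell; data is illustrative).
val nums = sc.parallelize(1 to 10, 2)             // 2 partitions

// aggregate: compute (sum, count) in one pass
val (sum, count) = nums.aggregate((0, 0))(
  (acc, n) => (acc._1 + n, acc._2 + 1),           // within a partition
  (a, b)   => (a._1 + b._1, a._2 + b._2)          // merging partitions
)

// mapPartitions: per-partition work (e.g. amortising setup cost)
val perPartitionSums = nums.mapPartitions(it => Iterator(it.sum))

// glom: expose each partition as an array
val partitions = nums.glom().collect()

// accumulator: write-only shared counter updated on executors
val evens = sc.longAccumulator("evens")
nums.foreach(n => if (n % 2 == 0) evens.add(1))

// broadcast: read-only value shipped once per executor
val lookup = sc.broadcast(Map(1 -> "one", 2 -> "two"))
nums.map(n => lookup.value.getOrElse(n, "?")).collect()
```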

Module-4: Spark SQL
  •  Spark SQL in applications
  •  Spark SQL initialization
  •  Spark SQL basic queries
  •  SchemaRDDs
  •  Caching
  •  Loading data from Hive
  •  Loading data from JSON
  •  Loading data from RDDs
  •  Beeline
  •  Long-lived tables and queries
  •  Query hands-on
  •  Spark SQL UDFs
  •  Performance
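A Spark SQL workflow combining several of the topics above can be sketched as follows (Spark 2.x DataFrame style; the file path and data are placeholders — in the shell, `spark` is already provided):

```scala
// Spark SQL sketch: load JSON, query with SQL, register a UDF, cache a table.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("SparkSQLExample")
  .master("local[*]")
  .getOrCreate()

// Load JSON into a DataFrame and query it with SQL
val people = spark.read.json("people.json")      // path is a placeholder
people.createOrReplaceTempView("people")
spark.sql("SELECT name, age FROM people WHERE age > 21").show()

// A simple UDF, registered for use inside SQL queries
spark.udf.register("shout", (s: String) => s.toUpperCase)
spark.sql("SELECT shout(name) FROM people").show()

// Cache a table for repeated queries
spark.catalog.cacheTable("people")
```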

Module-5: Spark Streaming
  •  Streaming architecture
  •  Two types of transformations
      1. Stateless transformations
      2. Stateful transformations
  •  Streaming UI
  •  Input sources
  •  Core sources
  •  Additional sources
  •  Multiple sources
  •  Cluster sizing
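The stateless/stateful distinction above can be sketched with a socket source (host, port, batch interval, and checkpoint path are illustrative):

```scala
// Spark Streaming sketch: a stateless transformation (flatMap/map) and a
// stateful one (updateStateByKey) over a socket source.
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(5))    // 5-second batches
ssc.checkpoint("checkpoint-dir")                  // required for stateful ops

val lines = ssc.socketTextStream("localhost", 9999)
val pairs = lines.flatMap(_.split(" ")).map((_, 1))   // stateless

// stateful: running count per word across batches
val runningCounts = pairs.updateStateByKey[Int] {
  (newValues: Seq[Int], state: Option[Int]) =>
    Some(newValues.sum + state.getOrElse(0))
}

runningCounts.print()
ssc.start()
ssc.awaitTermination()
```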

  1. Fault Tolerance
  •  Driver fault tolerance
  •  Worker fault tolerance
  •  Receiver fault tolerance
  •  24/7 operation
  •  Performance
  •  Garbage collection
  •  Memory usage
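Driver fault tolerance in streaming jobs rests on checkpoint-based recovery; a sketch (paths and batch interval are illustrative, and the actual computation is elided):

```scala
// Driver fault tolerance sketch: recreate the StreamingContext from a
// checkpoint directory on restart.
import org.apache.spark.streaming.{Seconds, StreamingContext}

def createContext(): StreamingContext = {
  val ssc = new StreamingContext(sc, Seconds(5))
  ssc.checkpoint("hdfs:///checkpoints/app")       // path is a placeholder
  // ... define the streaming computation here ...
  ssc
}

// On a fresh start this calls createContext(); after a driver failure it
// rebuilds the context (and pending state) from the checkpoint directory.
val ssc = StreamingContext.getOrCreate("hdfs:///checkpoints/app", createContext _)
ssc.start()
ssc.awaitTermination()
```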
Keny White


Keny White is Professor of the Department of Computer Science at Boston University, where he has been since 2004. He also currently serves as Chief Scientist of Guavus, Inc. During 2003-2004 he was a Visiting Associate Professor at the Laboratoire d'Informatique de Paris VI (LIP6). He received a B.S. from Cornell University in 1992 and an M.S. from the State University of New York at Buffalo.


After working as a software developer and contractor for over 8 years for a number of companies including ABX, Proit, SACC and AT&T in the US, he decided to work full-time as a private software trainer. He received his Ph.D. in Computer Science from the University of Rochester in 2001. "What I teach varies from beginner to advanced, and from what I have seen, anybody can learn and grow from my courses."

