
Apache Spark Course Description

Apache Spark™ is a fast and general engine for large-scale data processing. Apache Spark has an advanced DAG execution engine that supports acyclic data flow and in-memory computing. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.

Spark runs on Hadoop, on Mesos, standalone, or in the cloud. You can run Spark in its standalone cluster mode, on EC2, on Hadoop YARN, or on Apache Mesos, and access data in HDFS, Cassandra, HBase, Hive, S3, Tachyon, and any Hadoop data source.

The Pincorps Apache Spark course covers the challenges of Big Data processing and how to approach Big Data problems with Apache Spark, including its components, installation steps, RDDs, transformations, actions, lazy execution, and integration with HDFS.
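Spark's split between lazy transformations and eager actions can be illustrated without a cluster. The sketch below is an analogy built on Java's standard `java.util.stream` API, not on Spark itself: intermediate operations such as `filter` and `map` only describe a pipeline, much like RDD transformations, and nothing is computed until a terminal operation such as `collect` runs, much like an RDD action. The class name, the `log` list, and the sample data are hypothetical and exist only to make the laziness visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyPipelineDemo {
    // Records which elements each step actually touched, so laziness is observable.
    static final List<String> log = new ArrayList<>();

    // Builds a pipeline (analogous to chaining RDD transformations) without running it.
    static Stream<Integer> buildPipeline(List<String> lines) {
        return lines.stream()
                .filter(line -> { log.add("filter:" + line); return !line.isEmpty(); })
                .map(line -> { log.add("map:" + line); return line.length(); });
    }

    public static void main(String[] args) {
        List<String> lines = List.of("spark", "", "rdd");
        Stream<Integer> lengths = buildPipeline(lines);

        // Nothing has been processed yet: the pipeline is only a description.
        System.out.println("after build, log size = " + log.size());

        // The terminal operation (analogous to an RDD action) triggers execution.
        List<Integer> result = lengths.collect(Collectors.toList());
        System.out.println("result = " + result);
        System.out.println("log = " + log);
    }
}
```

Running `main` prints a log size of 0 before `collect` and `result = [5, 3]` afterwards. In Spark the same principle lets the engine see the whole chain of transformations before it schedules any work.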

 

Apache Spark Course Learning Outcomes

  • Understand Big Data and its associated challenges.
  • Approach Big Data problems with Apache Spark.
  • Implement core Apache Spark concepts.
  • Apply Java/Scala to Spark development.
  • Follow the latest emerging Spark-based projects such as MLlib and GraphX.

 

Apache Spark Training – Suggested Audience

This Apache Spark training is a must for anyone who aspires to enter the field of Big Data and keep abreast of Spark and related projects. Based on our past programs, suggested attendees are:

  • Big Data enthusiasts
  • Software Architects
  • Software Engineers
  • Software Developers
  • Data Scientists
  • Analytics professionals
  • Data Engineers

 

Apache Spark Training Prerequisites

  • Fundamental knowledge of any programming language.
  • Basic understanding of databases, SQL, and database query languages.
  • Working knowledge of Linux- or Unix-based systems (not mandatory).

 

Apache Spark In-house/Corporate Training

If you have a group of 5-6 participants, you can apply for in-house training. For commercial terms, please email hello@pincorps.com with your group size.

Course Curriculum

01. Introduction to Big Data, Challenges with Big Data, Batch vs. Real-Time Big Data Analytics
02. Batch Analytics - Hadoop Ecosystem Overview, Real-Time Analytics Options
03. Streaming Data - Storm, In-Memory Data - Spark
04. What is Spark?
05. Modes of Spark
06. Spark Installation Demo
07. Overview of Spark on a Cluster
08. Spark Standalone Cluster
09. Invoking Spark Shell
10. Creating the SparkContext
11. Loading a File in Shell
12. Performing Some Basic Operations on Files in Spark Shell
13. Building a Spark Project with sbt
14. Running a Spark Project with sbt, Caching Overview
15. Distributed Persistence
16. Spark Streaming Overview
17. Example: Streaming Word Count
18. RDDs
19. Transformations in RDD
20. Actions in RDD, Loading Data in RDD
21. Saving Data through RDD
22. Key-Value Pair RDD
23. MapReduce and Pair RDD Operations
24. Java/Scala and Hadoop Integration Hands-On
25. Why Shark?
26. Installing Shark
27. Running Shark
28. Loading of Data
29. Hive Queries through Spark
30. Testing Tips in Scala
31. Performance Tuning Tips in Spark
32. Shared Variables: Broadcast Variables
33. Shared Variables: Accumulators
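Several curriculum items (streaming word count, key-value pair RDDs, MapReduce-style pair operations) revolve around one pattern: split lines into words, emit (word, 1) pairs, and sum the counts per key. In Spark's Java API this is commonly written with `flatMap`, `mapToPair`, and `reduceByKey`; the sketch below reproduces the same data flow on a single machine with plain Java streams so it runs without a cluster. It is not Spark's API, and the class name, method name, and input lines are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class WordCountSketch {
    // Counts words the way a flatMap -> (word, 1) -> reduceByKey pipeline would.
    static Map<String, Integer> wordCount(List<String> lines) {
        return lines.stream()
                // "flatMap": split each line into lowercase words
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                // drop empty tokens produced by leading whitespace
                .filter(word -> !word.isEmpty())
                // "map to pair" + "reduceByKey": key by word, add 1 per occurrence;
                // a TreeMap keeps the output sorted by word
                .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Spark makes big data simple", "big data needs Spark");
        System.out.println(wordCount(lines));
    }
}
```

Running `main` prints `{big=2, data=2, makes=1, needs=1, simple=1, spark=2}`. On a cluster, `reduceByKey` would additionally combine partial counts within each partition before shuffling, which a single-machine sketch like this does not model.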

