Apache Spark Course Description
Apache Spark™ is a fast and general engine for large-scale data processing. Apache Spark has an advanced DAG execution engine that supports cyclic data flow and in-memory computing. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.
Spark runs on Hadoop YARN, on Apache Mesos, on EC2, in its standalone cluster mode, or in the cloud, and it can access diverse data sources including HDFS, Cassandra, HBase, Hive, Tachyon, S3, and any Hadoop data source.
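The deployment modes above are selected through the `--master` URL passed to `spark-submit`. A sketch of the four common forms follows; the application class, jar name, and host addresses are hypothetical placeholders, and a local Spark installation is assumed:

```shell
# Illustrative spark-submit invocations for each deployment mode
# (com.example.App, app.jar, and the hosts are placeholders).
spark-submit --master local[*]          --class com.example.App app.jar   # local mode, all cores
spark-submit --master spark://host:7077 --class com.example.App app.jar   # standalone cluster
spark-submit --master yarn              --class com.example.App app.jar   # Hadoop YARN
spark-submit --master mesos://host:5050 --class com.example.App app.jar   # Apache Mesos
```

The same application jar runs unchanged in every mode; only the `--master` URL changes, which is what makes Spark portable across cluster managers.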
The Pincorps Apache Spark course covers the challenges of Big Data processing and an approach to Big Data problems using Apache Spark, including its components, installation steps, RDDs, transformations, actions, lazy execution, and integration with HDFS.
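The lazy execution mentioned above is a core Spark idea: transformations only describe a computation, and nothing runs until an action forces it. A minimal sketch of that behavior follows, using plain Scala views (which defer evaluation the same way RDD transformations do) so it runs without a cluster:

```scala
// Demonstrating lazy execution with Scala views, as an analogy to
// Spark: map below is a "transformation" (deferred), sum is an
// "action" (forces the computation). No Spark cluster is needed.
var evaluated = 0

val transformed = (1 to 5).view.map { n =>
  evaluated += 1        // side effect lets us observe when work happens
  n * 2
}

assert(evaluated == 0)  // transformation declared, nothing computed yet

val result = transformed.sum   // "action": triggers the deferred maps

assert(evaluated == 5)  // all five elements evaluated only now
println(result)         // 30
```

In Spark the same split holds: `rdd.map(...)` returns instantly, while `rdd.count()` or `rdd.collect()` launches the job.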
Apache Spark Course Learning Outcomes
- Understand Big Data and its associated challenges.
- Find an approach to Big Data problems with Apache Spark.
- Implement Apache Spark Concepts.
- Apply Java/Scala for Spark.
- Follow the latest emerging trends built on Spark, such as MLlib and GraphX.
Apache Spark Training – Suggested Audience
This Apache Spark training is a must for anyone who aspires to enter the field of big data and keep abreast of Spark and related projects. Suggested attendees based on our past programs are:
- Big Data enthusiasts
- Software Architects
- Software Engineers
- Software Developers
- Data Scientists
- Analytics professionals
- Data Engineers
Apache Spark Training Prerequisites
- Fundamental knowledge of any programming language.
- Basic understanding of any database, SQL, and query language for databases.
- Working knowledge of Linux- or Unix-based systems (not mandatory).
Apache Spark In-house/Corporate Training
If you have a group of 5-6 participants, apply for in-house training. For commercial details, please send us an email with your group size to email@example.com
Apache Spark Training – Course Content
01. Introduction to Big Data, Challenges with Big Data, Batch vs. Real-Time Big Data Analytics
02. Batch Analytics – Hadoop Ecosystem Overview, Real-Time Analytics Options
03. Streaming Data – Storm; In-Memory Data – Spark
04. What is Spark?
05. Modes of Spark
06. Spark Installation Demo
07. Overview of Spark on a Cluster
08. Spark Standalone Cluster
09. Invoking the Spark Shell
10. Creating the SparkContext
11. Loading a File in the Shell
12. Performing Basic Operations on Files in the Spark Shell
13. Building a Spark Project with sbt
14. Running a Spark Project with sbt; Caching Overview
15. Distributed Persistence
16. Spark Streaming Overview
17. Example: Streaming Word Count
19. Transformations in RDDs
20. Actions in RDDs; Loading Data into RDDs
21. Saving Data through RDDs
22. Key-Value Pair RDDs
23. MapReduce and Pair RDD Operations
24. Java/Scala and Hadoop Integration Hands-on
25. Why Shark?
26. Installing Shark
27. Running Shark
28. Loading of Data
29. Hive Queries through Spark
30. Testing Tips in Scala
31. Performance Tuning Tips in Spark
32. Shared Variables: Broadcast Variables
33. Shared Variables: Accumulators
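Several of the modules above (word count, transformations, actions, and pair RDDs) come together in Spark's canonical word-count example. A minimal sketch follows; plain Scala collections are used because they share `flatMap` and `map` with the RDD API and run without a cluster, with `reduceByKey(_ + _)` approximated by `groupBy` plus a sum (the equivalent RDD pipeline is shown in the comment):

```scala
// Word count, the canonical Spark example. On an RDD the same
// pipeline would read:
//   sc.textFile("input.txt")
//     .flatMap(_.split("\\s+"))
//     .map(word => (word, 1))
//     .reduceByKey(_ + _)
// Plain Scala collections are used below so the sketch runs
// locally; "input.txt" is a placeholder, not a real file.
val lines = Seq(
  "spark makes big data simple",
  "big data needs spark"
)

val counts: Map[String, Int] = lines
  .flatMap(_.split("\\s+"))                                  // transformation: split lines into words
  .map(word => (word, 1))                                    // pair-RDD style (key, value) records
  .groupBy(_._1)                                             // local stand-in for reduceByKey
  .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum the 1s per word

println(counts)
```

On a real cluster, `reduceByKey` performs this grouping and summation in a distributed shuffle, which is why pair RDDs (module 22) are the workhorse of MapReduce-style jobs in Spark.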