Apache Hadoop & Big Data Training | Master Skills in HDFS, YARN, MapReduce, Hive, HBase, Flume, Oozie, Sqoop, and Pig

Apache Hadoop & Big Data Course Description

Big Data refers to datasets so large that they are hard to handle with traditional operational databases. Working with them requires processing the data in parallel across hundreds of machines.

Hadoop is a scalable, fault-tolerant grid operating system used for data storage and processing. Its key elements are as follows:
* Commodity hardware
* MapReduce
* Hive, Pig
* Open source, Apache license, etc.

Apache Hadoop is open source data management software that helps organizations analyze massive volumes of structured and unstructured data. This Apache Hadoop & Big Data course enables you to use this technology and become an industry-ready developer/architect who can use Apache Hadoop with full confidence.

Apache Hadoop & Big Data Course Learning Outcomes

  • Learn basic concepts of Big Data
  • Learn the concepts of Hadoop and why it is important
  • Hadoop Distributed File System (HDFS)
  • Hadoop Deployment
  • Hadoop Administration and Maintenance
  • MapReduce
  • Hive, HBase, Flume, Sqoop, Oozie and Pig

Apache Hadoop & Big Data Training - Suggested Audience

This Apache Hadoop & Big Data training is intended for technology professionals with a focus on Big Data & Hadoop within their organization. Suggested attendees based on our past programs are:
  • Architects
  • System Engineers
  • IT Managers
  • Database Administrators (DBAs)
  • BI Professionals

Apache Hadoop & Big Data Training Duration

  • Open-House F2F (Public): 4 to 5 days
  • In-House F2F (Private): 4 to 5 days; for commercial terms, please email hello@pincorps.com with your group size

Apache Hadoop & Big Data Training - Prerequisites

Some programming and database experience is highly preferred but not mandatory.

Apache Hadoop & Big Data training course outline includes:

1. Introduction to Big Data
  •  What data qualifies as Big Data
  •  Business use cases for Big Data
  •  Big Data requirements in the traditional data warehousing and BI space
  •  Big Data solutions

2. Introduction to Hadoop
  •  The amount of data processed in today's world
  •  What Hadoop is and why it is important
  •  Hadoop comparison with traditional systems
  •  Hadoop history
  •  Hadoop main components and architecture

3. Hadoop Distributed File System (HDFS)
  •  HDFS overview and design
  •  HDFS architecture
  •  HDFS file storage
  •  Component failures and recoveries
  •  Block placement
  •  Balancing the Hadoop cluster

4. Hadoop Deployment
  •  Different Hadoop deployment types
  •  Hadoop distribution options
  •  Hadoop competitors
  •  Hadoop installation procedure
  •  Distributed cluster architecture
  •  Lab: Hadoop Installation

5. Working with HDFS
  •  Ways of accessing data in HDFS
  •  Common HDFS operations and commands
  •  Different HDFS commands
  •  Internals of a file read in HDFS
  •  Data copying with 'distcp'
  •  Lab: Working with HDFS

6. Hadoop Cluster Configuration
  •  Hadoop configuration overview and important configuration files
  •  Configuration parameters and values
  •  HDFS parameters
  •  MapReduce parameters
  •  Hadoop environment setup
  •  'Include' and 'Exclude' configuration files
  •  Lab: MapReduce Performance Tuning

7. Hadoop Administration and Maintenance
  •  Namenode/Datanode directory structures and files
  •  Filesystem image and Edit log
  •  The Checkpoint Procedure
  •  Namenode failure and recovery procedure
  •  Safe Mode
  •  Metadata and Data backup
  •  Potential problems and solutions / What to look for
  •  Adding and removing nodes
  •  Lab: MapReduce Filesystem Recovery

8. Job Scheduling
  •  How to schedule Hadoop jobs on the same cluster
  •  Default Hadoop FIFO Scheduler
  •  Fair Scheduler and its configuration

9. MapReduce Abstraction
  •  What MapReduce is and why it is popular
  •  The big picture of MapReduce
  •  MapReduce process and terminology
  •  MapReduce component failures and recoveries
  •  Working with MapReduce
  •  Lab: Working with MapReduce

10. Programming MapReduce Jobs
  •  Java MapReduce implementation
  •  Map() and Reduce() methods
  •  Java MapReduce calling code
  •  Lab: Programming Word Count
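
To give a feel for what the Word Count lab covers, the dataflow can be sketched without a cluster. The snippet below is a plain Python illustration rather than the Java MapReduce code used in the course; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and only mimic the map, shuffle and reduce stages that Hadoop runs across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reduce step: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["big data needs hadoop", "hadoop stores big data"]
    print(word_count(sample))
```

On a real cluster the map and reduce functions run on different nodes and the shuffle moves data over the network; here everything stays in one process so the three stages are easy to follow.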

11. Input/Output Formats and Conversion Between Different Formats
  •  Default Input and Output formats
  •  Sequence File structure
  •  Sequence File Input and Output formats
  •  Sequence File access via Java API and HDFS
  •  MapFile
  •  Lab: Input Format
  •  Lab: Format Conversion

12. MapReduce Features
  •  Joining Data Sets in MapReduce Jobs
  •  How to write a Map-Side Join
  •  How to write a Reduce-Side Join
  •  MapReduce Counters
  •  Built-in and user-defined counters
  •  Retrieving MapReduce counters
  •  Lab: Map-Side Join
  •  Lab: Reduce-Side Join
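
As a rough illustration of the reduce-side join idea, the sketch below tags each record with its source in the map step, groups by the join key, and pairs the records in the reduce step. It is plain Python, not the Hadoop Java API, and the datasets and function names (map_customers, map_orders, reduce_join, reduce_side_join) are assumptions made up for this example.

```python
from collections import defaultdict


def map_customers(record):
    """Map step for the customers input: key by id, tag the value's source."""
    customer_id, name = record
    return customer_id, ("customer", name)


def map_orders(record):
    """Map step for the orders input: key by id, tag the value's source."""
    customer_id, amount = record
    return customer_id, ("order", amount)


def reduce_join(customer_id, tagged_values):
    """Reduce step: pair every customer name with every order for one key."""
    names = [v for tag, v in tagged_values if tag == "customer"]
    amounts = [v for tag, v in tagged_values if tag == "order"]
    return [(customer_id, name, amount) for name in names for amount in amounts]


def reduce_side_join(customers, orders):
    """Simulate the shuffle by grouping tagged values, then reduce each key."""
    grouped = defaultdict(list)
    for key, value in map(map_customers, customers):
        grouped[key].append(value)
    for key, value in map(map_orders, orders):
        grouped[key].append(value)
    joined = []
    for key in sorted(grouped):
        joined.extend(reduce_join(key, grouped[key]))
    return joined


if __name__ == "__main__":
    customers = [(1, "Ana"), (2, "Bo")]
    orders = [(1, 30), (1, 12), (3, 99)]
    print(reduce_side_join(customers, orders))
```

This behaves like an inner join: keys that appear in only one input produce no output rows. A map-side join avoids the shuffle by loading the smaller dataset into memory on every mapper, which is why it suits cases where one input is small.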

13. Introduction to Hive, HBase, Flume, Sqoop, Oozie and Pig
  •  Hive as a data warehouse infrastructure
  •  HBase as the Hadoop Database
  •  Using Pig as a scripting language for Hadoop

14. Hadoop Case studies
  •  How different organizations use Hadoop clusters in their infrastructure

Keny White


Keny White is a Professor in the Department of Computer Science at Boston University, where he has been since 2004. He also currently serves as Chief Scientist of Guavus, Inc. During 2003-2004 he was a Visiting Associate Professor at the Laboratoire d'Informatique de Paris VI (LIP6). He received a B.S. from Cornell University in 1992, an M.S. from the State University of New York at Buffalo, and a Ph.D. in Computer Science from the University of Rochester in 2001.

After working as a software developer and contractor for over 8 years at a number of companies in the US, including ABX, Proit, SACC and AT&T, he decided to work full-time as a private software trainer. "What I teach varies from beginner to advanced and from what I have seen, anybody can learn and grow from my courses."

