
Apache Hadoop & Big Data Course Description

Big Data refers to datasets so large that they are hard to handle with traditional operational databases; working with them requires parallel processing of data across hundreds of machines. Hadoop is a scalable, fault-tolerant grid operating system used for data storage and processing. Its main characteristics and components are as follows:

  1. Commodity hardware
  2. HDFS
  3. MapReduce
  4. Hive, Pig
  5. Open source (Apache License)

Apache Hadoop is open-source data management software that helps organizations analyze massive volumes of structured and unstructured data. This Apache Hadoop & Big Data course enables you to use this technology and become an industry-ready developer/architect who can work with Apache Hadoop with full confidence.

 

Apache Hadoop & Big Data Course Learning Outcomes

  • Learn the basic concepts of Big Data.
  • Learn the concepts of Hadoop and why it is important.
  • Hadoop Distributed File System (HDFS).
  • Hadoop Deployment.
  • Hadoop Administration and Maintenance.
  • Map-Reduce.
  • Hive, HBase, Flume, Sqoop, Oozie, and Pig

 

Apache Hadoop & Big Data Training – Suggested Audience

This Apache Hadoop Big Data training is intended for technology professionals with a focus on Big Data & Hadoop within their organization. Suggested attendees based on our past programs are:

  • Architects
  • System Engineers
  • IT Managers
  • Database Administrators (DBAs)
  • BI Professionals

 

Apache Hadoop Big Data Training – Prerequisites

Some programming and database experience is preferred, but not mandatory.

 

Apache Hadoop & Big Data In-house/Corporate Training

If you have a group of 5-6 participants, apply for in-house training. For commercial details, please send us an email with your group size to hello@pincorps.com.

Course Curriculum

1. Introduction to Big Data
What data is called Big Data
Business use cases for Big Data
Big Data requirements in the traditional data warehousing and BI space
Big Data solutions
2. Introduction to Hadoop
The amount of data processed in today's world
What Hadoop is and why it is important
Hadoop compared with traditional systems
Hadoop history
Hadoop main components and architecture
3. Hadoop Distributed File System (HDFS)
HDFS overview and design
HDFS architecture
HDFS file storage
Component failures and recoveries
Block placement
Balancing the Hadoop cluster
4. Hadoop Deployment
Different Hadoop deployment types
Hadoop distribution options
Hadoop competitors
Hadoop installation procedure
Distributed cluster architecture
Lab: Hadoop Installation
5. Working with HDFS
Ways of accessing data in HDFS
Common HDFS operations and commands
Different HDFS commands
Internals of a file read in HDFS
Data copying with 'distcp'
Lab: Working with HDFS
6. Hadoop Cluster Configuration
Hadoop configuration overview and important configuration files
Configuration parameters and values
HDFS parameters
MapReduce parameters
Hadoop environment setup
'Include' and 'Exclude' configuration files
Lab: MapReduce Performance Tuning
7. Hadoop Administration and Maintenance
Namenode/Datanode directory structures and files
Filesystem image and edit log
The checkpoint procedure
Namenode failure and recovery procedure
Safe mode
Metadata and data backup
Potential problems and solutions / what to look for
Adding and removing nodes
Lab: MapReduce Filesystem Recovery
8. Job Scheduling
How to schedule Hadoop jobs on the same cluster
The default Hadoop FIFO scheduler
The Fair Scheduler and its configuration
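As an illustration of the scheduler configuration topic above, switching a cluster from the default scheduler to the Fair Scheduler is a configuration change. The fragment below is a hedged sketch assuming a YARN-based (Hadoop 2.x or later) cluster; verify the property name and class against your distribution's documentation before use.

```xml
<!-- yarn-site.xml (sketch, YARN clusters): select the Fair Scheduler
     instead of the default scheduler for the ResourceManager -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
```

Pool/queue definitions (weights, minimum shares) then live in a separate fair-scheduler allocation file referenced from the same configuration.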
9. Map-Reduce Abstraction
What MapReduce is and why it is popular
The big picture of MapReduce
MapReduce process and terminology
MapReduce component failures and recoveries
Working with MapReduce
Lab: Working with MapReduce
10. Programming MapReduce Jobs
Java MapReduce implementation
Map() and Reduce() methods
Java MapReduce calling code
Lab: Programming Word Count
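The labs above use Hadoop's Java API, but the map/reduce contract itself is small. The sketch below is plain Python, not Hadoop code: `mapper`, `shuffle`, and `reducer` are simplified stand-ins for the Mapper, the framework's shuffle phase, and the Reducer, shown here only to illustrate the word-count flow.

```python
from collections import defaultdict

def mapper(line):
    # Like Mapper.map(): emit a (word, 1) pair for every word in the line
    for word in line.split():
        yield word.lower(), 1

def shuffle(pairs):
    # Like the framework's shuffle: group all values by key between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reducer(key, values):
    # Like Reducer.reduce(): sum the counts emitted for one word
    return key, sum(values)

lines = ["the quick brown fox", "the lazy dog"]
mapped = [pair for line in lines for pair in mapper(line)]
counts = dict(reducer(key, values) for key, values in shuffle(mapped))
print(counts["the"])  # → 2
```

In real Hadoop the same three roles are filled by a Mapper class, the distributed shuffle/sort, and a Reducer class, with the framework handling partitioning and fault tolerance.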
11. Input/Output Formats and Conversion Between Different Formats
Default input and output formats
Sequence File structure
Sequence File input and output formats
Sequence File access via the Java API and HDFS
MapFile
Lab: Input Format
Lab: Format Conversion
12. MapReduce Features
Joining data sets in MapReduce jobs
How to write a Map-Side Join
How to write a Reduce-Side Join
MapReduce counters
Built-in and user-defined counters
Retrieving MapReduce counters
Lab: Map-Side Join
Lab: Reduce-Side Join
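The idea behind the reduce-side join covered above: the map phase tags each record with its source table, the shuffle groups records by join key, and the reduce phase combines the tagged records for each key. A minimal sketch in plain Python under stated assumptions — the `users`/`orders` tables and their fields are hypothetical, and Hadoop's distributed shuffle is simulated with an in-memory dict.

```python
from collections import defaultdict

users = [(1, "alice"), (2, "bob")]               # hypothetical (user_id, name) table
orders = [(1, "book"), (1, "pen"), (2, "mug")]   # hypothetical (user_id, item) table

# Map phase: tag every record with its source so the reducer can tell them apart
mapped = [(uid, ("U", name)) for uid, name in users]
mapped += [(uid, ("O", item)) for uid, item in orders]

# Shuffle: group tagged records by join key (done by the framework in Hadoop)
groups = defaultdict(list)
for key, tagged in mapped:
    groups[key].append(tagged)

# Reduce phase: pair every user record with every order record sharing the key
joined = []
for uid, records in groups.items():
    names = [value for tag, value in records if tag == "U"]
    items = [value for tag, value in records if tag == "O"]
    for name in names:
        for item in items:
            joined.append((uid, name, item))

print(sorted(joined))
# → [(1, 'alice', 'book'), (1, 'alice', 'pen'), (2, 'bob', 'mug')]
```

A map-side join avoids the shuffle entirely by loading the smaller table into memory on every mapper, which is why it is faster but only practical when one side fits in memory.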
13. Introduction to Hive, HBase, Flume, Sqoop, Oozie and Pig
Hive as a data warehouse infrastructure
HBase as the Hadoop database
Using Pig as a scripting language for Hadoop
14. Hadoop Case Studies
How different organizations use Hadoop clusters in their infrastructure
