Hadoop Administration Training | Become an Expert Apache Hadoop Administrator

Hadoop Administration Course Description

Hadoop Administration training for system administrators is designed for technical operations personnel who install and maintain production Hadoop clusters in the real world. We cover Hadoop architecture and its components, the installation process, and the monitoring and troubleshooting of complex Hadoop issues. The training focuses on practical hands-on exercises and encourages open discussion of how enterprises use Hadoop to deal with large data sets.

Hadoop Administration Course Learning Outcomes

  • Understand Hadoop's main components and architecture
  • Be comfortable working with the Hadoop Distributed File System (HDFS)
  • Understand the MapReduce abstraction and how it works
  • Plan a Hadoop cluster
  • Deploy and administer a Hadoop cluster
  • Optimize a Hadoop cluster for the best performance based on specific job requirements
  • Monitor a Hadoop cluster and execute routine administration procedures
  • Deal with Hadoop component failures and recoveries
  • Get familiar with related Hadoop projects: HBase, Hive and Pig
  • Know best practices for using Hadoop in an enterprise environment

Hadoop Administration Training - Suggested Audience

This training is aimed at professionals who work with or support Hadoop. Based on our past programs, suggested attendees include:
  • System Administrators
  • Support Engineers
  • IT Managers
  • IT Administrators
  • IT Systems Engineers
  • Data Engineers
  • Database Administrators
  • Data Analytics Professionals
  • Cloud Systems Administrators
  • Web Engineers

Hadoop Administration Training Duration

  • Open-House F2F (Public): 3–4 days
  • In-House F2F (Private): 3–4 days; for commercial details, please send us an email with your group size at hello@pincorps.com

Hadoop Administration Training - Prerequisites

There are no mandatory prerequisites for attending this training.

The Hadoop Administration training course outline includes:

1. Introduction to Hadoop
  • The amount of data processed in today's world
  • What Hadoop is and why it matters
  • How Hadoop compares with traditional systems
  • Hadoop history
  • Hadoop main components and architecture

2. Hadoop Distributed File System (HDFS)
  • HDFS overview and design
  • HDFS architecture
  • HDFS file storage
  • Component failures and recoveries
  • Block placement
  • Balancing the Hadoop cluster (see the commands after this list)
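
To give a flavor of this module, capacity checks and rebalancing come down to a couple of commands (a minimal sketch; the threshold value is illustrative):

    # Report overall HDFS capacity and per-DataNode usage
    hdfs dfsadmin -report

    # Spread blocks across DataNodes until every node's disk usage is
    # within 10 percentage points of the cluster average
    hdfs balancer -threshold 10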

3. Planning your Hadoop cluster
  • Planning a Hadoop cluster and its capacity
  • Hadoop software and hardware configuration
  • HDFS block replication and rack awareness (sketch after this list)
  • Network topology for Hadoop cluster
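
As a taste of rack awareness: HDFS learns the cluster's network topology from an administrator-supplied script, referenced by the net.topology.script.file.name property in core-site.xml. A minimal sketch (the IP ranges and rack names are illustrative):

    #!/bin/bash
    # topology.sh - Hadoop calls this with one or more DataNode IPs or
    # hostnames and expects one rack path printed per argument
    for node in "$@"; do
      case "$node" in
        10.0.1.*) echo "/rack1" ;;
        10.0.2.*) echo "/rack2" ;;
        *)        echo "/default-rack" ;;
      esac
    done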

4. Hadoop Deployment
  • Different Hadoop deployment types
  • Hadoop distribution options
  • Hadoop competitors
  • Hadoop installation procedure (see the sketch after this list)
  • Distributed cluster architecture
  • Lab: Hadoop Installation
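
To preview the lab, a basic single-node installation from a release tarball goes roughly as follows (a sketch assuming Hadoop 3.x; version numbers and paths will differ on your system):

    # Unpack the release and set up the environment (paths illustrative)
    tar -xzf hadoop-3.3.6.tar.gz -C /opt
    export HADOOP_HOME=/opt/hadoop-3.3.6
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
    export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"

    # Format the NameNode metadata directory (first run only)
    hdfs namenode -format

    # Start the HDFS and YARN daemons, then verify they are running
    start-dfs.sh
    start-yarn.sh
    jps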

5. Ways of accessing data in HDFS
  • Common HDFS operations and commands (examples after this list)
  • A closer look at individual HDFS commands
  • Internals of a file read in HDFS
  • Data copying with ‘distcp’
  • Lab: Working with HDFS
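
Representative commands from this module (a sketch; the user name, paths and NameNode addresses are illustrative):

    # Create a directory in HDFS and copy a local file into it
    hdfs dfs -mkdir -p /user/alice
    hdfs dfs -put data.csv /user/alice/

    # List, inspect and retrieve files
    hdfs dfs -ls /user/alice
    hdfs dfs -cat /user/alice/data.csv
    hdfs dfs -get /user/alice/data.csv ./copy.csv

    # Bulk-copy a directory between clusters with distcp
    hadoop distcp hdfs://nn1:8020/user/alice hdfs://nn2:8020/backup/alice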

6. Map-Reduce Abstraction
  • What MapReduce is and why it is popular
  • The big picture of MapReduce
  • MapReduce process and terminology
  • MapReduce component failures and recoveries
  • Working with MapReduce (see the example after this list)
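
As a concrete example of working with MapReduce, the word-count job that ships with Hadoop can be run against HDFS like this (a sketch; the jar file name depends on the installed version):

    # Stage some text files as job input
    hdfs dfs -mkdir -p /user/alice/input
    hdfs dfs -put *.txt /user/alice/input/

    # Run the bundled word-count example on YARN
    yarn jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        wordcount /user/alice/input /user/alice/output

    # Inspect the reducer output
    hdfs dfs -cat /user/alice/output/part-r-00000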

7. Hadoop Cluster Configuration
  • Hadoop configuration overview and important configuration files
  • Configuration parameters and values
  • HDFS parameters and MapReduce parameters
  • Hadoop environment setup
  • ‘Include’ and ‘Exclude’ configuration files (sketch after this list)
  • Lab: MapReduce Performance Tuning
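
For instance, decommissioning a DataNode through the ‘exclude’ file looks roughly like this (a sketch; the file locations and hostname are illustrative):

    # hdfs-site.xml points at the two control files (shown as a comment):
    #   dfs.hosts         = /etc/hadoop/conf/dfs.hosts.include
    #   dfs.hosts.exclude = /etc/hadoop/conf/dfs.hosts.exclude

    # Add a host to the exclude file to start decommissioning it
    echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.hosts.exclude

    # Tell the NameNode to re-read both files, then watch the status
    hdfs dfsadmin -refreshNodes
    hdfs dfsadmin -report | grep -A 2 datanode07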

8. Hadoop Administration and Maintenance
  • NameNode/DataNode directory structures and files
  • File system image (fsimage) and edit log
  • The checkpoint procedure
  • NameNode failure and recovery procedure
  • Safe mode
  • Metadata and data backup (see the commands after this list)
  • Potential problems and solutions / what to look for
  • Adding and removing nodes
  • Lab: MapReduce File System Recovery
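
A few of the routine procedures from this module, expressed as commands (a sketch; the backup directory is illustrative):

    # Check and enter safe mode (HDFS becomes read-only)
    hdfs dfsadmin -safemode get
    hdfs dfsadmin -safemode enter

    # Force a checkpoint of the in-memory namespace to a new fsimage
    hdfs dfsadmin -saveNamespace

    # Download the latest fsimage for an off-cluster metadata backup
    hdfs dfsadmin -fetchImage /backup/namenode/

    # Return to normal operation
    hdfs dfsadmin -safemode leave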

9. Hadoop Monitoring and Troubleshooting
  • Best practices of monitoring a Hadoop cluster
  • Using logs and stack traces for monitoring and troubleshooting (commands after this list)
  • Using open-source tools to monitor a Hadoop cluster
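
Typical first steps when something looks wrong, for illustration (log file locations vary by distribution):

    # Confirm which Hadoop daemons are running on a node
    jps

    # Check HDFS health: corrupt, missing or under-replicated blocks
    hdfs fsck / -files -blocks

    # Follow the NameNode log and surface warnings and errors
    tail -f "$HADOOP_HOME"/logs/hadoop-*-namenode-*.log | grep -iE "warn|error"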

10. Job Scheduling
  • How to schedule Hadoop jobs on the same cluster
  • The default Hadoop FIFO scheduler
  • The Fair Scheduler and its configuration (sketch after this list)
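
On YARN, enabling the Fair Scheduler comes down to one property and an allocation file, roughly as follows (a sketch; the queue names and weights are illustrative):

    # yarn-site.xml - select the Fair Scheduler (shown as a comment):
    #   yarn.resourcemanager.scheduler.class =
    #     org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler

    # fair-scheduler.xml - two weighted queues (shown as a comment):
    #   <allocations>
    #     <queue name="production"><weight>2.0</weight></queue>
    #     <queue name="adhoc"><weight>1.0</weight></queue>
    #   </allocations>
    # The Fair Scheduler re-reads the allocation file periodically, so
    # queue changes take effect without restarting the ResourceManager.

    # Inspect a queue once the cluster is up
    yarn queue -status adhoc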

11. Hadoop Multi-Node Cluster Setup and Running MapReduce Jobs on Amazon EC2

12. Hadoop Multi-Node Cluster Setup using Amazon EC2 – Creating a 4-Node Cluster
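
As an outline of these two modules, standing up a 4-node cluster on EC2 reduces to launching the instances and telling the master node where its workers are (a sketch using the AWS CLI; the AMI ID, instance type, key pair and hostnames are all illustrative):

    # Launch four instances: one master, three workers
    aws ec2 run-instances --image-id ami-0123456789abcdef0 \
        --instance-type m5.xlarge --count 4 --key-name hadoop-key

    # On the master, list the worker hostnames (Hadoop 3.x uses the
    # 'workers' file; older releases call it 'slaves')
    printf '%s\n' worker1.example.internal worker2.example.internal \
        worker3.example.internal > "$HADOOP_HOME"/etc/hadoop/workers

    # Format HDFS once, then start the whole cluster from the master
    hdfs namenode -format
    start-dfs.sh
    start-yarn.sh
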
Keny White

Professor

Keny White is a Professor in the Department of Computer Science at Boston University, where he has been since 2004. He also currently serves as Chief Scientist of Guavus, Inc. During 2003-2004 he was a Visiting Associate Professor at the Laboratoire d'Informatique de Paris VI (LIP6). He received a B.S. from Cornell University in 1992 and an M.S. from the State University of New York at Buffalo.

After working as a software developer and contractor for over eight years for companies including ABX, Proit, SACC and AT&T in the US, he decided to work full-time as a private software trainer. He received his Ph.D. in Computer Science from the University of Rochester in 2001. "What I teach varies from beginner to advanced, and from what I have seen, anybody can learn and grow from my courses."
