Apache PIG Training | Learn Data Analysis with Apache PIG & Infrastructure for Evaluating these Programs

Apache PIG Course Description

Apache PIG is a scripting platform for processing and analyzing large data sets. It works with data stored in the cluster, with YARN as the architectural center of Apache Hadoop. Apache Pig allows Apache Hadoop users to write complex MapReduce transformations using a simple scripting language called Pig Latin. Pig translates each Pig Latin script into MapReduce jobs that run within YARN and access a single dataset stored in the Hadoop Distributed File System (HDFS).

At present, Pig's infrastructure layer consists of a compiler that produces sequences of MapReduce programs, for which large-scale parallel implementations already exist. Pig's language layer currently consists of a textual language called Pig Latin.
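
To make the translation step more concrete, here is a minimal Python sketch of the map, shuffle and reduce phases that a simple Pig Latin word count roughly corresponds to. It is an illustration only, not Pig's actual compiler output: the function names, the sample data and the whitespace-only tokenization are assumptions made for this example, and the Pig Latin statements appear only as comments.

```python
from collections import defaultdict

# Pig Latin being mirrored (the common word-count idiom; file name is hypothetical):
#   lines   = LOAD 'input.txt' AS (line:chararray);
#   words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
#   grouped = GROUP words BY word;
#   counts  = FOREACH grouped GENERATE group, COUNT(words);


def map_phase(lines):
    """Emit one (word, 1) pair per token, similar in spirit to FOREACH + TOKENIZE."""
    for line in lines:
        for word in line.split():  # assumption: split on whitespace only
            yield word, 1


def shuffle_phase(pairs):
    """Collect all values that share a key, similar in spirit to GROUP ... BY."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Aggregate each group into a single count, similar in spirit to COUNT."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Chain the three phases in the order a MapReduce job would run them."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["pig latin", "pig on hadoop"]
    print(word_count(sample))  # {'pig': 2, 'latin': 1, 'on': 1, 'hadoop': 1}
```

The point of the sketch is the division of labour: the four Pig Latin statements describe what to compute, while the three phases show the kind of work a generated MapReduce job performs on the cluster.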

Pig was designed for performing a long series of data operations, making it ideal for three categories of Big Data jobs:
  • Extract-transform-load (ETL) data pipelines.
  • Research on raw data.
  • Iterative data processing.
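
As a rough illustration of the first category, the following Python sketch walks through a tiny extract-transform-load pipeline as a series of data operations. The field names (user, url, ms), the 1000 ms threshold and the file names in the comments are hypothetical; the Pig Latin in the comments shows how a comparable script might look, under those same assumptions.

```python
import csv
import io

# A comparable Pig Latin script might look like this (names are hypothetical):
#   raw    = LOAD 'visits.csv' USING PigStorage(',')
#            AS (user:chararray, url:chararray, ms:int);
#   slow   = FILTER raw BY ms > 1000;
#   report = FOREACH slow GENERATE user, url, ms / 1000.0 AS seconds;
#   STORE report INTO 'slow_visits' USING PigStorage(',');


def extract(text):
    """Parse CSV text into typed rows (the LOAD step)."""
    for user, url, ms in csv.reader(io.StringIO(text)):
        yield {"user": user, "url": url, "ms": int(ms)}


def transform(rows, threshold_ms=1000):
    """Keep slow requests and convert milliseconds to seconds (FILTER + FOREACH)."""
    for row in rows:
        if row["ms"] > threshold_ms:
            yield (row["user"], row["url"], row["ms"] / 1000.0)


def load(records):
    """Serialize the records back to CSV text (the STORE step)."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(records)
    return out.getvalue()


def run_pipeline(text):
    """Run extract, transform and load in sequence."""
    return load(transform(extract(text)))


if __name__ == "__main__":
    sample = "alice,/home,250\nbob,/search,1800\ncarol,/cart,3200\n"
    print(run_pipeline(sample), end="")
```

With the sample input, only the two requests slower than one second are kept, and their durations are written out in seconds.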

With Pincorps' Apache PIG course, you will learn about Apache PIG in detail. You will learn PIG basics, installation, PIG scripts, loading and storage, debugging, the Grunt shell, and more.

Apache PIG Course Learning Outcomes

  • The fundamental concepts of Big Data and Hadoop
  • What is Apache PIG and its Use Cases
  • How to set up PIG in Local and MapReduce mode
  • What is PIG Latin Language
  • Working and implementation of PIG Latin Statements
  • How to create a directory and different ways of inserting the Data
  • PIG Latin operators and the supported data types
  • Concepts of PIG Streaming
  • How to write and execute PIG Scripts
  • Built-in Functions and User Defined Functions
  • The structuring of PIG scripts and how they are executed
  • How to write PIG Macros and perform Parameter substitution
  • The use of PIG's Shell and Utility Commands to run your programs
  • How to compress files (input, output, and intermediate results)
  • Testing and Diagnostics tools to examine and/or debug your programs

Apache PIG Training - Suggested Audience

This Apache PIG training is intended for developers who need to create applications for Hadoop 2.0. Suggested attendees, based on our past programs, are:
  • Software Developers
  • Hadoop Developers
  • Data Analysts

Apache PIG Training Duration

  • Open-House F2F (Public): 2 to 3 days
  • In-House F2F (Private): 2 to 3 days. For commercial terms, please email us your group size at hello@pincorps.com

Apache PIG Training Prerequisites

  • Basic familiarity with SQL and/or a scripting language
  • Essential knowledge of Linux will help in understanding the Linux commands used in the tutorial
  • No pre-existing knowledge of Hadoop is required

System Setup Requirements

  • A recent stable build of Hadoop (around version 1.0.3)
  • Hadoop must be installed, and the machine should have Java 1.6 installed.
  • The Pig tutorial assumes that users have Linux or Mac OS X. If Windows is used, Cygwin should be installed. Shell support is needed in addition to the required software.

Apache PIG training course syllabus includes:

Module1 - Introduction to Course
  • Introduction
  • Prerequisites for PIG
  • Use Cases of PIG

Module2 - Getting Started with Apache PIG
  • What is Big Data and Hadoop?
  • Hadoop MapReduce
  • What is Apache PIG?
  • PIG vs. MapReduce
  • Where to use PIG and where not to
  • PIG’s History

Module3 - Pig Latin Language and its Statements
  • PIG Latin Language
  • Running PIG in Different Modes
  • PIG Architecture
  • PIG Latin Statements

Module4 - PIG Model and Operators
  • PIG’s Data Model
  • Arithmetic and Boolean Operators
  • Cast and Comparison Operators
  • Relational Operators
  • PIG Streaming

Module5 - PIG Built-in Functions
  • Eval Functions
  • Load and Store Functions
  • Tuple and Bag Functions

Module6 - PIG Scripts and UDFs
  • Create and Run PIG Scripts
  • Writing Java UDFs

Module7 - Control Structures
  • Embedded PIG in Java
  • PIG Macros
  • Parameter Substitution

Module8 - Shell and Utility Commands
  • Shell Commands
  • Utility Commands

Module9 - Compression with PIG
  • Compressed Files
  • Compress the Results of Intermediate Jobs

Module10 - Testing and Diagnostics
  • Diagnostic Operators
  • PIGUnit

Keny White

Professor

Keny White is a Professor in the Department of Computer Science at Boston University, where he has been since 2004. He also currently serves as Chief Scientist of Guavus, Inc. During 2003-2004 he was a Visiting Associate Professor at the Laboratoire d'Informatique de Paris VI (LIP6). He received a B.S. from Cornell University in 1992, and an M.S. from the State University of New York at Buffalo.

Bachelor

After working as a software developer and contractor for over 8 years for a number of companies, including ABX, Proit, SACC and AT&T in the US, he decided to work full-time as a private software trainer. He received his Ph.D. in Computer Science from the University of Rochester in 2001. "What I teach varies from beginner to advanced, and from what I have seen, anybody can learn and grow from my courses."
