Pivotal Data Science in Practice Training | Learn Fundamentals & Concepts Related to Data Science

Data Science in Practice Course Description

Managing large volumes of data and extracting useful insights from structured and unstructured data now drives much of how businesses operate, and its importance is expected to keep growing.

This course is designed to give candidates hands-on experience with the Pivotal products used in Pivotal Data Science projects. Because customer implementations vary widely, the course focuses on the main components of a Data Science project on the Pivotal stack: Pivotal Greenplum, pSQL, MADlib, GPText, Pivotal HD, PivotalR, and pyMADlib, with extra units covering Alpine Chorus and visualization. Participants will also practice applying Data Science problem-solving techniques to their own work.

This course introduces and uses pSQL, R, and Python, but does not provide extensive training in them.

Data Science in Practice Course Learning Outcomes

Participants will be able to:

  • Summarize the distinguishing characteristics of each Pivotal product and tool, and describe its most beneficial aspects from a Data Science perspective;
  • Demonstrate practical, hands-on skills with each product and tool;
  • Investigate and assess practical data science problems and apply their knowledge to solve them;
  • Apply Data Science problem-solving techniques to their own projects.

Data Science in Practice Training - Suggested Audience

  • Experienced data analysts and data engineers who want to build advanced Data Science skills on the Pivotal platform
  • Individuals who want to learn about data science using the Pivotal product stack

Data Science in Practice Training Duration

  • Open-House F2F (Public): 5 days
  • In-House F2F (Private): 5 or 6 days. For pricing, please email hello@pincorps.com with your group size.

Data Science in Practice Training - Prerequisites

  • Familiarity with data analytics technologies (statistics, mathematics, machine learning, SQL, R, Python) is a plus
  • A basic understanding of virtualization and massively parallel processing (MPP) concepts

The Data Science in Practice course outline includes:

1. Introduction

2. Data science overview
  •  Data Science: The Big Picture
  •  Driving Forces
  •  What Does a Data Scientist Do
  •  The Process of Data Science
  •  What Does Pivotal bring to the Story

3. Pivotal overview
  •  Pivotal Corporate Overview
  •  The Pivotal Big Data Suite
  •  - Pivotal Greenplum DB
  •  - Pivotal GPText
  •  - MADlib
  •  - Pivotal HD
  •  - Pivotal on Virtualized Hardware
  •  - Pivotal HAWQ
  •  - Pivotal eXtension Framework (PXF)
  •  - Pivotal Analytics Workbench
  •  - Pivotal GemFire
  •  - Pivotal GemFireXD
  •  - Spring by Pivotal
  •  - Spring XD
  •  - Pivotal Labs and Pivotal Data Labs

4. Pivotal Greenplum DB review including inline labs
  •  Essentials
  •  Getting Started and Inline Lab Exercise
  •  Intro to pSQL and Inline Lab Exercises
  •  - Creating Tables
  •  - Distributions and Partitioning
  •  - Indexes
  •  - External Tables and Loading Data
  •  Unloading Data
  •  Analyze
  •  Explain and Analyze
  •  Vacuum
  •  Monitoring

5. Advanced SQL
  •  Explore and Inline Lab Exercise
  •  Joins and Inline Lab Exercise
  •  Arrays and Array Aggregates and Inline Lab Exercise
  •  Window Functions and Inline Lab Exercise
  •  Other Functions and Inline Lab Exercise
  •  User-Defined Functions (UDFs)
  •  User-Defined Aggregates (UDAs)
  •  Data Science Exercise

6. MADlib including inline labs
  •  MADlib Basics
  •  Advanced MADlib
  •  Data Science Exercise

7. Text analytics including inline labs
  •  NLP: Practical Examples
  •  NLP: Practical Examples with NLTK
  •  Putting it all together
  •  Data Science Exercise

8. Apache Hadoop and the Hadoop ecosystem including inline labs
  •  Apache Hadoop Overview
  •  - Core Component: HDFS
  •  - Core Component: MapReduce
  •  - MapReduce: Writing a Job
  •  Hadoop Ecosystem
  •  - Hadoop Streaming
  •  - Pig

9. Pivotal HD and HAWQ including inline labs
  •  Intro to Pivotal HD and HAWQ
  •  Getting Started with HAWQ
  •  Working with HAWQ
  •  External Tables: file, gpfdist, web
  •  External Tables: PXF
  •  Loading and Unloading Data and Inline Lab Exercises
  •  - Loading and Unloading using Copy
  •  - Loading and Unloading using Insert
  •  - Loading and Unloading using gpfdist / gpload / external tables
  •  Data Science Exercise

10. R and Python
  •  PivotalR
  •  PL/R
  •  pyMADlib
  •  PL/Python
  •  Data Science Exercise

11. Visualization
  •  Tableau
  •  R
  •  Python
  •  Exercises

12. HAWQ text analytics, airline price optimization, and gene sequencing exercises
  •  HAWQ Text Analytics Exercise
  •  Airline Price Optimization Exercise
  •  Gene Sequencing Exercise

Keny White

Keny White is a Professor in the Department of Computer Science at Boston University, where he has been since 2004. He also currently serves as Chief Scientist of Guavus, Inc. From 2003 to 2004 he was a Visiting Associate Professor at the Laboratoire d'Informatique de Paris VI (LIP6). He received a B.S. from Cornell University in 1992, an M.S. from the State University of New York at Buffalo, and a Ph.D. in Computer Science from the University of Rochester in 2001.

After working as a software developer and contractor for more than 8 years for companies including ABX, Proit, SACC, and AT&T in the US, he decided to work full-time as a private software trainer. "What I teach ranges from beginner to advanced, and from what I have seen, anybody can learn and grow from my courses."