Greenplum Database Tutorials

For use with the Greenplum Database Sandbox

Greenplum Database: Introduction and Tutorials using the Greenplum Database Sandbox VM

These tutorials showcase how Greenplum Database can address day-to-day tasks performed in typical data warehouse (DW), business intelligence (BI), and data science environments. They are designed to be used with the Greenplum Database Sandbox VM, which is available for download from the Pivotal Network. Both a VirtualBox version and a VMware version are available. The VirtualBox VM is in OVA format and can be imported into VirtualBox, while the VMware VM is a ZIP file that can be opened directly.

The scripts and data for these tutorials are in the gpdb-sandbox virtual machine at /home/gpadmin. The repository is pre-cloned and updates as the VM boots, so that you always have the most recent version of these instructions.

Interacting with the Sandbox via a new terminal is preferable, as it makes many of the operations simpler.

To introduce Greenplum Database, we use a public data set, the Airline On-Time Statistics and Delay Causes data set, published by the United States Department of Transportation at http://www.transtats.bts.gov/. The On-Time Performance data set records flights by date, airline, originating airport, destination airport, and many other flight details. Data is available for flights since 1987. The exercises in this guide use data for about a million flights in 2009 and 2010. The FAA uses the data to calculate statistics such as the percentage of flights that depart or arrive on time by origin, destination, and airline.
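
To make the on-time statistic concrete, here is a minimal Python sketch of the per-airline calculation. It is an illustration only and is not one of the tutorial scripts. The sample rows, the function name `on_time_percent_by_carrier`, and the 15-minute on-time threshold are all assumptions made for this example; the tutorials themselves compute such figures with SQL against the loaded tables.

```python
from collections import defaultdict

# Hypothetical sample rows: (carrier code, departure delay in minutes).
# These values are invented for illustration, not taken from the data set.
flights = [
    ("AA", 0),
    ("AA", 25),
    ("AA", -3),
    ("DL", 5),
    ("DL", 40),
]

# Assumption: a flight counts as on time if it is less than 15 minutes late.
ON_TIME_THRESHOLD_MINUTES = 15


def on_time_percent_by_carrier(rows, threshold=ON_TIME_THRESHOLD_MINUTES):
    """Return {carrier: percentage of flights whose delay is below threshold}."""
    totals = defaultdict(int)
    on_time = defaultdict(int)
    for carrier, delay in rows:
        totals[carrier] += 1
        if delay < threshold:
            on_time[carrier] += 1
    return {carrier: 100.0 * on_time[carrier] / totals[carrier] for carrier in totals}


result = on_time_percent_by_carrier(flights)
for carrier, pct in sorted(result.items()):
    print(f"{carrier}: {pct:.1f}% on time")
```

The same grouping could be done by origin or destination airport by changing which field is used as the key.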

You are encouraged to review the SQL scripts in the faa directory as you work through this introduction. You can run most of the exercises by entering the commands yourself or by executing a script in the faa directory.

Tutorials

Introduction
Introduction to the Greenplum Database Architecture

Get Started
Create Users and Roles
Create and Prepare Database
Create Tables

Data Loading & Unloading
Data Loading

Performance Tuning
Queries and Performance Tuning

Analytics
Introduction to Greenplum In-Database Analytics

Administration
Backup and Recovery Operations
Importing into VMware Fusion

Project maintained by greenplum-db