The World’s First Open Source Massively Parallel Data Warehouse

Greenplum Database® is an advanced, fully featured, open source data warehouse. It provides powerful and rapid analytics on petabyte-scale data volumes. Uniquely geared toward big data analytics, Greenplum Database is powered by the world’s most advanced cost-based query optimizer, delivering high analytical query performance on large data volumes.

The Greenplum Database® project is released under the Apache 2 license. We want to thank all our current community contributors and are interested in all new potential contributions. For the Greenplum Database community, no contribution is too small; we encourage all types of contributions. To ensure that the use of the Greenplum Database® trademarks and graphics marks will not lead to confusion, please follow the Greenplum Database trademark guidelines.

Originally based on PostgreSQL, Greenplum Database has added a significant number of data warehouse innovations.

Massively Parallel Processing Architecture

The Greenplum Database architecture provides automatic parallelization of all data and queries.
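As a rough illustration of the idea, the sketch below scatters rows across a fixed number of segments by hashing a distribution key, computes a partial sum on each segment concurrently, and combines the partial results. It is a conceptual model only, not Greenplum's implementation: the segment count, the CRC32 hash, and every function name here are assumptions made for the example.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_SEGMENTS = 4  # hypothetical segment count, chosen for the example


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Pick a segment from a deterministic hash of the distribution key."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, num_segments: int = NUM_SEGMENTS):
    """Scatter (key, amount) rows across segments by hashed key."""
    segments = [[] for _ in range(num_segments)]
    for key, amount in rows:
        segments[segment_for(key, num_segments)].append((key, amount))
    return segments


def partial_sum(segment_rows):
    """Work done independently on the rows held by one segment."""
    return sum(amount for _, amount in segment_rows)


def parallel_total(rows, num_segments: int = NUM_SEGMENTS):
    """Run the per-segment sums concurrently, then combine them."""
    segments = distribute(rows, num_segments)
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        partials = list(pool.map(partial_sum, segments))
    return sum(partials)


rows = [(f"customer-{i}", i * 10) for i in range(100)]
total = parallel_total(rows)
```

Because each row lands on exactly one segment, the combined result matches a single-node sum over the same rows, while each worker only touches its own share of the data.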

Petabyte-Scale Loading

High-performance loading uses MPP technology. Loading speeds scale with each additional node to greater than 10 terabytes per hour, per rack.

Innovative Query Optimizer

The query optimizer available in Greenplum Database is the industry’s first cost-based query optimizer for big data workloads. It can scale interactive and batch mode analytics to large datasets in the petabytes without degrading query performance and throughput.

Polymorphic Data Storage and Execution

The table or partition storage, execution, and compression settings can be configured to suit the way data is accessed. Users have the choice of row or column-oriented storage and processing for any table or partition.
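To make the row versus column distinction concrete, here is a small sketch in plain Python. It does not use any Greenplum API; the sample records and the `to_columns` helper are hypothetical and exist only to show why an aggregate over one attribute reads less data in a column-oriented layout.

```python
# Row-oriented layout: each record keeps all of its attributes together.
rows = [
    {"id": 1, "region": "east", "amount": 120},
    {"id": 2, "region": "west", "amount": 80},
    {"id": 3, "region": "east", "amount": 200},
]


def to_columns(records):
    """Pivot row-oriented records into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: each attribute is stored contiguously.
columns = to_columns(rows)

# Summing over rows visits every full record to read one field.
row_total = sum(record["amount"] for record in rows)

# Summing over columns reads only the "amount" values.
col_total = sum(columns["amount"])
```

Both layouts give the same answer; the difference is how much data has to be scanned, which is why the choice is left to match how a table or partition is actually accessed.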

Advanced Machine Learning

Advanced machine learning is provided by Apache MADlib, a library for scalable in-database analytics that extends the SQL capabilities of Greenplum Database through user-defined functions.

Greenplum Database Continuous Integration

We are committed to keeping the Greenplum code base clean and defect-free, especially when those defects can be caught early on via automated Continuous Integration tools. There are two main CI servers you should be aware of when working on Greenplum:

Travis CI
Concourse CI

Travis CI runs a simple build on all your GitHub pull requests and doesn’t really do much else. It provides a minimum level of assurance that a change submitted to the Greenplum codebase won’t break the build, but it tells you nothing about different platforms and testing. That’s where Concourse pipelines come in. They provide a much more sophisticated level of testing and validation. In fact, the remaining verification for each PR comes from Concourse CI. Note that each PR generates a separate build, and you can find which PR is associated with which build here.

While not strictly required, knowing the internals of both Travis CI and Concourse CI can sometimes come in handy.

DBA Tools:

Memory Calculator
Plan Checker