The World’s First Open-Source & Massively Parallel Data Platform

Greenplum Database® is an advanced, fully featured, open source data platform. It provides powerful and rapid analytics on petabyte-scale data volumes. Uniquely geared toward big data analytics, Greenplum Database is powered by the world’s most advanced cost-based query optimizer, delivering high analytical query performance on large data volumes.

The Greenplum Database® project is released under the Apache 2 license. We want to thank all our current community contributors and all who are interested in new contributions. For the Greenplum Database community, no contribution is too small; we encourage all types of contributions. To ensure that the use of the Greenplum Database® trademarks and graphics marks will not lead to confusion, please follow the Greenplum Database trademark guidelines.

Based on PostgreSQL, Greenplum Database adds a significant number of parallel analytic innovations:

MASSIVELY PARALLEL PROCESSING ARCHITECTURE
The Greenplum Database architecture automatically parallelizes all data and queries in a scale-out, shared-nothing architecture.
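As a rough illustration of the shared-nothing idea, the sketch below spreads rows across a handful of segments by hashing a distribution key, lets each segment compute a partial aggregate on its own rows, and then combines the partials. It is a toy model only: the segment count, the field names, and the use of CRC32 as the hash are assumptions made for the example and do not reflect Greenplum Database internals.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical cluster size, chosen for illustration


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Map a distribution key to a segment (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, key_field):
    """Place each row on exactly one segment; segments share no data."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(str(row[key_field]))].append(row)
    return segments


def parallel_sum(segments, value_field):
    """Each segment sums its own rows; a coordinator adds the partial sums."""
    partials = {
        seg: sum(row[value_field] for row in seg_rows)
        for seg, seg_rows in segments.items()
    }
    return partials, sum(partials.values())


# Hypothetical sample data: 100 orders keyed by customer_id.
rows = [{"customer_id": i, "amount": i * 10} for i in range(1, 101)]
segments = distribute(rows, "customer_id")
partials, total = parallel_sum(segments, "amount")
print(total)  # 50500
```

Because every row lives on exactly one segment, the partial sums can be computed independently and the combined total matches a single-node sum over the same rows.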

PETABYTE-SCALE LOADING
High-performance loading uses MPP technology. Loading speeds scale with each additional node, exceeding 10 terabytes per hour per rack.

INNOVATIVE QUERY OPTIMIZER
The query optimizer available in Greenplum Database is the industry’s first cost-based query optimizer for big data workloads. It can scale interactive and batch-mode analytics to petabyte-scale datasets without degrading query performance and throughput.

POLYMORPHIC DATA STORAGE AND EXECUTION
Storage, execution, and compression settings can be configured per table or partition to suit the way the data is accessed. Users have the choice of row-oriented or column-oriented storage and processing for any table or partition.
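To make the row versus column trade-off concrete, here is a minimal sketch that stores the same hypothetical records both ways and counts how many values an aggregate over one field has to touch. The table contents, field names, and the "values read" counter are illustrative assumptions, not a model of how Greenplum Database lays out data on disk.

```python
# Hypothetical sample table, stored row by row.
rows = [
    {"id": 1, "region": "east", "amount": 120.0},
    {"id": 2, "region": "west", "amount": 80.0},
    {"id": 3, "region": "east", "amount": 50.0},
]


def to_columns(rows):
    """Pivot row-oriented records into one list of values per column."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def sum_row_store(rows, field):
    """Sum one field from row storage; every value in each row is read."""
    values_read = 0
    total = 0.0
    for row in rows:
        values_read += len(row)
        total += row[field]
    return total, values_read


def sum_column_store(columns, field):
    """Sum one field from column storage; only that column is read."""
    values = columns[field]
    return sum(values), len(values)


columns = to_columns(rows)
print(sum_row_store(rows, "amount"))        # (250.0, 9)
print(sum_column_store(columns, "amount"))  # (250.0, 3)
```

In this toy model the columnar layout reads a third of the values for a single-column aggregate, while the row layout keeps all fields of a record together, which tends to suit lookups and updates of whole rows.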

ADVANCED MACHINE LEARNING
Machine learning is provided by Apache MADlib, a library for scalable in-database analytics that extends the SQL capabilities of Greenplum Database through user-defined functions.

EXTERNAL DATA ACCESSIBILITY
Access and query all your data through the external table syntax. Traditional on-premises and next-generation public data lakes are supported.