Greenplum Database® is an advanced, fully featured, open source data warehouse. It provides powerful and rapid analytics on petabyte-scale data volumes. Uniquely geared toward big data analytics, Greenplum Database is powered by the world’s most advanced cost-based query optimizer, delivering high analytical query performance on large data volumes.
The Greenplum Database® project is released under the Apache 2 license. We want to thank all our current community contributors and are interested in all new potential contributions. For the Greenplum Database community no contribution is too small; we encourage all types of contributions. To ensure that the use of the Greenplum Database® trademarks and graphics marks does not lead to confusion, please follow the Greenplum Database trademark guidelines.
The Greenplum Database architecture provides automatic parallelization of all data and queries.
High-performance loading uses massively parallel processing (MPP) technology. Loading speeds scale with each additional node to greater than 10 terabytes per hour, per rack.
The query optimizer available in Greenplum Database is the industry’s first cost-based query optimizer for big data workloads. It can scale interactive and batch mode analytics to large datasets in the petabytes without degrading query performance and throughput.
The table or partition storage, execution, and compression settings can be configured to suit the way data is accessed. Users have the choice of row or column-oriented storage and processing for any table or partition.
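As a rough illustration of choosing storage per table, the sketch below builds a `CREATE TABLE` statement with explicit orientation and compression settings. The helper `build_create_table`, the `sales` table, and its columns are hypothetical. The `WITH (appendonly=..., orientation=..., compresstype=..., compresslevel=...)` options follow the append-optimized table syntax used by Greenplum 5 and 6, so check them against the documentation for your version.

```python
def build_create_table(name, columns, distributed_by, orientation="row",
                       compresstype=None, compresslevel=None):
    """Return a Greenplum-style CREATE TABLE statement as a string.

    Hypothetical helper for illustration only; it does not connect to a database.
    """
    if orientation not in ("row", "column"):
        raise ValueError("orientation must be 'row' or 'column'")
    cols = ",\n    ".join(f"{col} {typ}" for col, typ in columns)
    # Column orientation and compression apply to append-optimized tables,
    # so this sketch always sets appendonly=true.
    options = ["appendonly=true", f"orientation={orientation}"]
    if compresstype is not None:
        options.append(f"compresstype={compresstype}")
    if compresslevel is not None:
        options.append(f"compresslevel={compresslevel}")
    option_list = ", ".join(options)
    return (
        f"CREATE TABLE {name} (\n    {cols}\n)\n"
        f"WITH ({option_list})\n"
        f"DISTRIBUTED BY ({distributed_by});"
    )


# Assumed example: a column-oriented, zlib-compressed fact table.
ddl = build_create_table(
    "sales",
    [("id", "bigint"), ("sold_on", "date"), ("amount", "numeric")],
    distributed_by="id",
    orientation="column",
    compresstype="zlib",
    compresslevel=5,
)
print(ddl)
```

Column orientation tends to suit analytical scans that read a few columns of many rows, while row orientation tends to suit workloads that read or update whole rows.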
In-database machine learning is provided by Apache MADlib, a library for scalable in-database analytics that extends the SQL capabilities of Greenplum Database through user-defined functions.
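Because MADlib is exposed as user-defined functions, a model is trained with an ordinary SQL `SELECT`. The sketch below only assembles such a statement as a string. `madlib.linregr_train` is MADlib's linear regression training function, while the helper names and the `houses` table with its columns are assumptions made for illustration.

```python
def sql_literal(value):
    """Quote a Python string as a SQL string literal, doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def linregr_train_call(source_table, out_table, dependent, independent):
    """Build a call to madlib.linregr_train with its four required arguments."""
    args = ", ".join(
        sql_literal(arg)
        for arg in (source_table, out_table, dependent, independent)
    )
    return f"SELECT madlib.linregr_train({args});"


# Assumed example data: predict price from tax, bath, and size.
query = linregr_train_call(
    "houses", "houses_linregr", "price", "ARRAY[1, tax, bath, size]"
)
print(query)
```

The resulting statement can be run from any SQL client connected to a database where MADlib is installed; the fitted coefficients are written to the output table named in the second argument.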
There are a number of distributions to choose from to get started with Greenplum Database.
Pivotal offers a fully supported commercial distribution of Greenplum Database with some proprietary add-ons, available both as a local download and as a cloud-based offering.
Bigtop is an Apache project providing packages for data science and big data tools. Greenplum Database is supported by Bigtop starting with version 5.0.0-alpha.
There are also pre-built Docker images with Greenplum Database.
Each community release is available as a source code tarball which can be compiled on a number of platforms.
Do you know of any other distributions? Let us know!
Greenplum Database is the first massively parallel open source data warehouse. It is forever changing the data warehouse market, and we welcome all contributors who want to be part of this change. Below are the ways you can get involved with Greenplum. Development contributions are encouraged, but you don't have to be a developer to take part.
Use the email@example.com mailing list to ask any question related to installation, configuration, usage, product documentation, or any other area where you might need help. Feel free to send us links to your blogs and presentations so we can highlight them on greenplum.org. You can also take part in the Greenplum discussions on Stack Overflow.
Do you have an idea for a new feature or bug fix for Greenplum? Please discuss it on the firstname.lastname@example.org mailing list or open a pull request on GitHub.
Are you a Greenplum expert? Want to share your knowledge with others? We are a collaborative community that shares best practices.
Write an email to email@example.com.
Apache MADlib is a SQL-based advanced analytics and machine learning library that works with Greenplum Database.
We are committed to keeping the Greenplum codebase clean and defect-free, especially when defects can be caught early by automated Continuous Integration tools. There are two main CI servers you should be aware of when working on Greenplum:
Travis CI runs a simple build on all your GitHub pull requests and little else. It provides a minimum level of assurance that a change submitted to the Greenplum codebase won't break the build, but it tells you nothing about other platforms or test results. That's where Concourse pipelines come in: they provide a much more sophisticated level of testing and validation. In fact, the other verification for each PR comes from Concourse CI. Note that each PR generates a separate build, and you can find which PR is associated with which build here.
Greenplum Database has an active community that discusses the usage and development of Greenplum on the official mailing lists. If you have a question, you can search the archives to see whether it has already been discussed.
Mailing list for all major Greenplum product announcements. This includes every new release and any other critical announcements.
Mailing list for the Greenplum user community.
Mailing list for discussions about the development of Greenplum Database. There is also a lot of discussion in the GitHub issue and pull request threads. Watching the repository is a good way to follow all the discussions.
Mailing list to receive Greenplum and GPOrca commit notifications.
Mailing list for all Greenplum-related jobs. The list is open to anyone who wants to announce available positions.
Mailing list for our modular query optimizer for big data.