Greenplum 6, Development Updates, Jan 2018

Greenplum v5 launched in September 2017, and the Greenplum developers have been hard at work since then on the next major version, v6 (code name Mars), which is slated for release in September 2018. In this post I will provide some high-level updates on new developments in the v6 code line.

    1. The PostgreSQL 8.4 merge has been completed. Greenplum v5 was based on PostgreSQL 8.3, and v6 now has the complete 8.4 base. This is a great milestone, but not the last one before the GP6 release, as we expect to reach 9.x in this cycle. My favorite 8.4 feature is Column-Level Permissions.
    2. WAL Replication replaces File Replication. This has been completed in the 6.0 branch and is a HUGE milestone. File Replication came into Greenplum in 4.0.0.0 in 2010 and brought what was, at the time, state-of-the-art High Availability to Greenplum. File Replication was a massive feature with 100,000 person-years put into its development. WAL Replication has matured and improved in PostgreSQL in the meantime, and I expect uptime (the number of 9s) for large mission-critical clusters to go UP with GP6, because the WAL Replication infrastructure is more robust and more capable. It is also the foundation for future features around disaster recovery and snapshotting. Finally, replacing File Replication with WAL Replication will accelerate our future PostgreSQL code merges and help us stay in sync with PostgreSQL, because the differences between PG and GP will be vastly reduced.
    3. Zstandard compression for Append Optimized tables was contributed by the Arena Data team in Russia. Zstandard is a newer compression algorithm that uses dramatically less CPU and delivers better compression performance. Improved compression is like money in the bank, because you can do more data processing and storage with the same amount of hardware. Really happy to see this improvement.
    4. GIN indexes have been enabled. Previous versions of Greenplum Database did not enable GIN indexes because of complications in mirroring them. Now that GP6 uses standard PostgreSQL mirroring, we can enable them, and the change was merged into 6 this week. Here are some selected blog posts highlighting what can be done with GIN: 1, 2, 3.
    5. gpbackup replaces gpcrondump. gpbackup improves on gpcrondump in many respects, the most popular being reduced lock contention. Lock contention is reduced because gpbackup acts as a regular read-only SQL user of the database and uses a transaction to get a consistent point-in-time view, so no heavy-handed system locking is required during the job.
    6. A pull request that improves concurrency by reducing lock contention has been submitted and appears to have the approvals needed to merge, but it has not been merged yet. There is quite a bit of concurrency performance work in development, and GP6 should post the best concurrency benchmark results we have ever had.
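To give a feel for what item 4 enables: a GIN (Generalized Inverted Index) maps each element of a composite value, such as an array element or a text token, to the set of rows that contain it, so a containment query only has to intersect a few posting lists instead of scanning every row. The Python below is a toy in-memory sketch of that idea, assuming rows are identified by integer ids; `ToyInvertedIndex` and its methods are hypothetical names, and this is not PostgreSQL's actual on-disk GIN structure.

```python
from collections import defaultdict


class ToyInvertedIndex:
    """Hypothetical toy model of the idea behind GIN: element -> row ids.

    Not PostgreSQL's GIN implementation; just the inverted-index concept.
    """

    def __init__(self):
        self.postings = defaultdict(set)  # element -> set of row ids ("posting list")
        self.rows = set()                 # every indexed row id

    def insert(self, row_id, elements):
        """Index each element of a composite value (array, token list, ...)."""
        self.rows.add(row_id)
        for element in elements:
            self.postings[element].add(row_id)

    def contains_all(self, query):
        """Row ids whose value contains every query element (like array @>)."""
        query = set(query)
        if not query:
            # An empty array is contained in every array.
            return set(self.rows)
        # Intersect posting lists, smallest first, so the result never grows.
        lists = sorted((self.postings.get(e, set()) for e in query), key=len)
        result = set(lists[0])  # copy so the index itself is not modified
        for plist in lists[1:]:
            result &= plist
        return result


idx = ToyInvertedIndex()
idx.insert(1, ["postgres", "greenplum", "mpp"])
idx.insert(2, ["postgres", "gin"])
idx.insert(3, ["greenplum", "gin", "mpp"])
print(sorted(idx.contains_all(["greenplum", "mpp"])))  # [1, 3]
print(sorted(idx.contains_all(["gin"])))               # [2, 3]
```

Here `contains_all` plays the role of the array containment operator `@>`; in the database itself you would get the real structure with `CREATE INDEX name ON table USING gin (column)`.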
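Why a transaction snapshot (item 5) avoids heavy-handed locking can be shown with a toy multi-version store: writers add new versions instead of overwriting data in place, and a reader holding a snapshot keeps seeing the data exactly as of the moment the snapshot was taken. This is a simplified Python model under that assumption, with hypothetical names (`ToyMVCCStore`, `snapshot`, `read`); it is not how Greenplum or gpbackup implement MVCC internally.

```python
class ToyMVCCStore:
    """Hypothetical multi-version key/value store illustrating snapshot reads.

    Writes append a new version tagged with a commit number instead of
    overwriting in place, so a reader holding a snapshot never has to block
    writers. Not Greenplum's or PostgreSQL's actual MVCC implementation.
    """

    def __init__(self):
        self._versions = {}   # key -> list of (commit_no, value), oldest first
        self._last_commit = 0

    def write(self, key, value):
        """Commit a new version of `key` and return its commit number."""
        self._last_commit += 1
        self._versions.setdefault(key, []).append((self._last_commit, value))
        return self._last_commit

    def snapshot(self):
        """A snapshot here is simply the newest commit number visible right now."""
        return self._last_commit

    def read(self, key, snapshot):
        """Return the newest value of `key` committed at or before `snapshot`."""
        visible = None
        for commit_no, value in self._versions.get(key, []):
            if commit_no > snapshot:
                break  # committed after the snapshot: invisible to this reader
            visible = value
        return visible


store = ToyMVCCStore()
store.write("orders", 100)
backup_snap = store.snapshot()  # the "backup" transaction starts here
store.write("orders", 250)      # a concurrent writer is not blocked
print(store.read("orders", backup_snap))       # 100: consistent point-in-time view
print(store.read("orders", store.snapshot()))  # 250: a newer snapshot sees the update
```

The trade-off the sketch illustrates is that the reader pays with extra stored versions rather than with locks; real systems also have to reclaim old versions eventually (VACUUM, in PostgreSQL).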

There is still quite a bit of time before we cut 6.0, so it's great to see so much work already completed for the next version! I will provide another update as more work gets merged.