Greenplum Database Upgrade

Early last year, the Greenplum Database team began building gpupgrade, an in-place major version upgrade tool. The driving force behind making upgrades less time-consuming and less space-consuming was to offer customers an easy upgrade path. The tool enables customers to upgrade quickly and confidently to the latest version of Greenplum Database. gpupgrade will also help Greenplum move to faster release cycles with faster customer feedback.

The team gathered upgrade requirements, synthesized them, and grouped them into clusters by severity. After understanding the need for the tool, we built a gpupgrade prototype and ran usability tests with customers. The usability feedback validated the minimum viable product. To get early and frequent feedback on the usability and functionality of the utility, we planned a series of beta releases of the upgrade tool.

gpupgrade

gpupgrade upgrades the data stored in Greenplum Database data files to a later Greenplum Database major version without requiring customers to provision additional hardware, such as twice the necessary storage capacity. The cluster being upgraded from is called the source cluster, and the cluster being upgraded to is called the target cluster.

gpupgrade supports upgrades from Greenplum Database 5 to Greenplum Database 6. The minimum required versions are 5.28.0 for the source cluster and 6.9.0 for the target cluster. gpupgrade is a framework built on the PostgreSQL pg_upgrade utility, which upgrades the data of a single Postgres instance in place. gpupgrade runs pg_upgrade on the Greenplum master segment instance and, in parallel, on the primary segments.
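The version requirement above is easy to check before planning an upgrade. The following is a minimal sketch, not part of gpupgrade itself: the helper names parse_version and meets_minimums are hypothetical, and the sketch assumes plain "X.Y.Z" version strings. Only the two minimum versions come from the text.

```python
# Minimum versions taken from the text above; everything else is illustrative.
MIN_SOURCE = (5, 28, 0)
MIN_TARGET = (6, 9, 0)


def parse_version(text):
    """Turn a plain 'X.Y.Z' string into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimums(source_version, target_version):
    """Return True when both clusters satisfy the documented minimums."""
    source = parse_version(source_version)
    target = parse_version(target_version)
    # Only 5 -> 6 upgrades are described, so the major versions are pinned.
    return (
        source[0] == 5
        and target[0] == 6
        and source >= MIN_SOURCE
        and target >= MIN_TARGET
    )


if __name__ == "__main__":
    print(meets_minimums("5.28.0", "6.9.0"))
    print(meets_minimums("5.27.3", "6.9.0"))
```

Comparing tuples of integers rather than raw strings avoids the usual pitfall where "5.9.0" sorts after "5.28.0" lexicographically.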

gpupgrade Phases

The gpupgrade utility has five phases in its lifecycle. Refer to Figure 1.0.

1. Pre-upgrade: The customer prepares the source cluster and installs gpupgrade and the latest gpdb6 binaries on all hosts. Customers are advised to start upgrade planning a few weeks ahead of the downtime window to set the stage for the upgrade. Customers should also review the catalog changes and server configuration changes between the two major versions to speed up the upgrade process. Data migration starts: At this point users should run the migration generator and executor to resolve catalog inconsistencies detected between the two major versions of gpdb.

2. Initialize: The upgrade utility performs sanity checks against the source cluster to validate its state for upgrade, and initializes the target cluster. During initialize, a disk space check and catalog consistency checks are performed on the source cluster. At the end of the phase the source cluster is available and the target cluster is stopped.

3. Execute: In the execute phase the utility upgrades the primary segment instances and the master segment instance. gpupgrade also copies the data and configuration files to the target cluster. At the end of the phase the source cluster is stopped and the target cluster is available for testing.

4. Finalize: This phase upgrades the standby master instance and the mirror segment instances. It also updates the master port and data directory to reflect the target cluster. At the end of the phase the source cluster is stopped and the target cluster is available for validation. Data migration ends: At this point users should rebuild the constraints that were dropped in the pre-initialize and initialize phases, either by running the migration executor or manually.

5. Post-upgrade: In the last phase, users are advised to run validation scripts to verify the performance of the upgraded cluster. Customers must stop the gpupgrade hub and agent processes, and remove the gpupgrade control directory and the saved source data directories. At the end of the phase, the target cluster is up and ready for use.

gpupgrade Downtime

The activities performed in the Initialize, Execute and Finalize phases require a downtime window. Pre-upgrade and Post-upgrade activities can be performed outside of the downtime window. Refer to Figure 1.0.

Figure 1.0: gpupgrade Phases
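As a textual companion to the figure, the phases and the downtime window described above can be summarised in a small data structure. This is only a sketch for planning purposes: the Phase class and the helper functions are hypothetical names, not part of gpupgrade, and the cluster states are copied from the phase descriptions (None marks a state the text does not spell out).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Phase:
    name: str
    needs_downtime: bool
    source_at_end: Optional[str]  # None where the text does not say
    target_at_end: Optional[str]


# Order and end-of-phase states follow the phase descriptions in the text.
PHASES = [
    Phase("pre-upgrade", False, None, None),
    Phase("initialize", True, "available", "stopped"),
    Phase("execute", True, "stopped", "available"),
    Phase("finalize", True, "stopped", "available"),
    Phase("post-upgrade", False, None, "available"),
]


def downtime_phases():
    """Names of the phases that must run inside the downtime window."""
    return [phase.name for phase in PHASES if phase.needs_downtime]


def next_phase(current):
    """Name of the phase that follows `current`, or None after the last one."""
    names = [phase.name for phase in PHASES]
    index = names.index(current)  # raises ValueError for an unknown phase
    return names[index + 1] if index + 1 < len(names) else None


if __name__ == "__main__":
    print(downtime_phases())
    print(next_phase("execute"))
```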

Reverting an upgrade  

gpupgrade provides the ability to roll back to the source cluster if needed. Rollback is possible until the standby master and mirror segments have been upgraded. Users can revert during or after Initialize, or during or after Execute. Once the user has committed to Finalize, the rollback functionality can no longer be used. Refer to Figure 1.0.
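The revert rule above can be expressed as a simple guard, for example in a runbook script that decides whether rolling back is still an option. This is a sketch of the rule as stated in the text, not of gpupgrade's internals. The function name can_revert is hypothetical, and treating pre-upgrade as "nothing to revert yet" is an assumption.

```python
# Phase names in lifecycle order, as listed in the text.
PHASE_ORDER = ["pre-upgrade", "initialize", "execute", "finalize", "post-upgrade"]

# Per the text: revert is available during/after initialize or execute.
REVERTIBLE = {"initialize", "execute"}


def can_revert(latest_phase_started):
    """Return True if rolling back to the source cluster is still possible.

    `latest_phase_started` is the most recent phase the user has begun.
    Assumption: before initialize there is nothing to revert, so the
    answer is False for "pre-upgrade".
    """
    if latest_phase_started not in PHASE_ORDER:
        raise ValueError(f"unknown phase: {latest_phase_started}")
    return latest_phase_started in REVERTIBLE


if __name__ == "__main__":
    for phase in PHASE_ORDER:
        print(phase, can_revert(phase))
```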

gpupgrade Modes

gpupgrade can be run in two modes, copy and link. Refer to Figure 2.0 to understand the difference between the two modes.

Figure 2.0: Copy vs Link
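To make the distinction concrete, here is a small file-level illustration. It assumes that gpupgrade's modes mirror the two transfer modes of the underlying pg_upgrade utility, where copy duplicates each data file and link creates a hard link to the existing file instead. The function names and file names are hypothetical, and the sketch operates on a throwaway temporary directory, not on a real cluster.

```python
import os
import shutil
import tempfile


def place_file(src, dst, mode):
    """Make `src` available at `dst` by copying bytes or by hard-linking."""
    if mode == "copy":
        shutil.copy2(src, dst)  # new file: the data exists twice on disk
    elif mode == "link":
        os.link(src, dst)  # new directory entry pointing at the same data
    else:
        raise ValueError(f"unknown mode: {mode}")


def demo():
    """Return (copy shares storage with source, link shares storage with source)."""
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "source_datafile")
        with open(src, "wb") as handle:
            handle.write(b"x" * 1024)

        copied = os.path.join(root, "target_copy")
        linked = os.path.join(root, "target_link")
        place_file(src, copied, "copy")
        place_file(src, linked, "link")

        # samefile() is True only when both paths refer to the same file on disk.
        return os.path.samefile(src, copied), os.path.samefile(src, linked)


if __name__ == "__main__":
    print(demo())
```

The trade-off this illustrates: a copy needs room for a second set of data files but leaves the source files untouched, while a hard link needs almost no extra space and is fast, but both paths then refer to the same underlying data.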

gpupgrade availability

The upgrade utility is generally available:
Enterprise offering: https://network.pivotal.io/products/greenplum-upgrade/
Open Source offering: https://github.com/greenplum-db/gpupgrade/releases

To learn more about the utility, please refer to the release notes and documentation: https://gpdb.docs.pivotal.io/upgrade/1-0/index.html

We encourage you to try the tool and share your feedback so that we can iterate and make the upgrade tool successful.