Earlier this year the Greenplum team set out to create the next-generation backup and restore tooling for the Greenplum Database. After conducting dozens of customer interviews and reviewing a long list of enhancement requests, we saw two overarching themes emerge:
- Performance
- User Experience
Following the core practices of Pivotal (agile, test-driven development, and pair programming), we began developing our Minimum Viable Product with performance and UX top of mind. The result is a set of utilities that addresses feedback submitted by our users and provides a solid code base, enabling us to ship new features at a rapid pace.
Performance
- Optimized catalog queries for faster metadata backup
For complex customer schemas, the metadata portion of a backup alone may take hours. We knew there was an opportunity to optimize the logic used to regenerate customer DDL from the catalog tables. The result is performance gains upwards of 7x for complex schemas. The example below is from a customer-supplied schema with 21,000+ heap & AO tables, 3,000 views, 1,800 functions, 200 sequences and over 30,000 constraints.
- Concurrent backups to decrease overall duration
Many customers have a limited maintenance window in which to perform backups. We wanted to provide the ability to run concurrent backups at the database, schema, or table level, using as much of the system's resources as possible. By removing exclusive pg_class locks (discussed in the next section), we can accommodate multiple instances of gpbackup running at the same time. The example below is for a 2.5TB database containing 11 schemas of 715 tables each. Running multiple concurrent backups provided a 2x reduction in duration compared with a single, sequential backup.
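As a rough illustration of the schema-level case, the sketch below launches one gpbackup process per schema and limits how many run at once. It is a minimal sketch, not our recommended tooling: the helper names, the database and schema names, and the worker count are hypothetical, and the `--dbname` and `--include-schema` flags are assumed to be available in your gpbackup version, so check `gpbackup --help` before relying on them.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_backup_command(dbname, schema):
    """Return the argument list for one schema-level gpbackup run.

    The flags used here are assumptions; verify them against your release.
    """
    return ["gpbackup", "--dbname", dbname, "--include-schema", schema]


def run_concurrent_backups(dbname, schemas, max_workers=4, runner=subprocess.call):
    """Run one backup per schema, with at most max_workers running at a time.

    runner takes an argument list and returns an exit code. It defaults to
    subprocess.call and can be swapped for a stub when testing.
    Returns a dict mapping each schema name to the exit code of its backup.
    """
    commands = [build_backup_command(dbname, schema) for schema in schemas]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so exit codes line up with schemas.
        exit_codes = list(pool.map(runner, commands))
    return dict(zip(schemas, exit_codes))
```

Calling `run_concurrent_backups("sales", ["schema_a", "schema_b"], max_workers=2)` would start two gpbackup processes side by side and report each exit code. Threads are enough here because the work happens in the child processes. The right degree of concurrency depends on how much I/O and CPU headroom the cluster has.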
User Experience
- No EXCLUSIVE locking of the Greenplum catalog
By wrapping the backup process in a transaction, we leverage the inherent MVCC of the Greenplum Database without needing to take exclusive locks on the catalog. This allows business processes such as ETL jobs to execute alongside gpbackup without conflicting with locks taken on the pg_class catalog table.
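To make the idea concrete, the sketch below builds the kind of statement sequence a snapshot-style backup might issue, similar in spirit to what pg_dump does: open a transaction, fix the snapshot, and take only ACCESS SHARE locks on the tables being read. This is an illustration under assumptions, not the exact statements gpbackup runs. The function name is hypothetical, and the table names are assumed to be already schema-qualified and safely quoted.

```python
def snapshot_backup_statements(tables):
    """Return illustrative SQL for a backup that reads from one snapshot.

    ACCESS SHARE conflicts only with ACCESS EXCLUSIVE, so ordinary reads
    and writes (for example ETL jobs) can proceed while the backup runs.
    """
    statements = [
        "BEGIN",
        # All later reads in this transaction see the same snapshot (MVCC).
        "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE",
    ]
    for table in tables:
        # Blocks concurrent DROP or ALTER on the table, but not DML.
        statements.append("LOCK TABLE {} IN ACCESS SHARE MODE".format(table))
    statements.append("-- read catalog metadata and table data here")
    statements.append("COMMIT")
    return statements
```

For example, `snapshot_backup_statements(["public.orders"])` yields a five-statement sequence starting with `BEGIN` and ending with `COMMIT`. No lock on pg_class appears anywhere in it, because consistency comes from the snapshot rather than from blocking other sessions.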
- Improved logging and monitoring
We are providing enhanced interactive monitoring, detailed object-level logging, and reporting of the overall backup and restore status (as seen in the short clip below). This gives the end user a better understanding of how far the backup or restore has progressed and, more importantly, how much processing remains.
Current Status
gpbackup & gprestore are shipping with the Pivotal Greenplum 5.3 and 4.3.19.0 releases as Experimental Features. We encourage users to try out the new utilities and send us feedback as we drive towards GA in early 2018.
gpbackup & gprestore are Open Source, and can be found in the Greenplum git repo:
We welcome contributions from the Greenplum development community, including our Third Party Partners.
Additional user documentation can be found in the Greenplum Management Utility Reference Guide: