Recently I had the opportunity to demonstrate the ability to use Amazon S3 as the landing spot for backups of Greenplum. I thought that the steps involved in creating incremental backups of Greenplum to S3 would be of interest to many. Greenplum backups can be performed to any folder on a mount / filesystem as well as other places such as S3 or a DellEMC Data Domain device using out of the box plug-ins. The steps below assume that the Greenplum master hostname is MDW. All the work is carried from that master host (MDW).
Step 1 – Create New Database: dev
$ createdb dev
Step 2: Download Test Data
Download the test data tarfile from: https://sppde-gpbackup-test.s3.amazonaws.com/testdata.tgz and untar it in /home/gpadmin on Masternode (MDW)
Step 3: Create & Load Demo Schema
Let’s now create and load a demo schema with some data. For this demonstration I use a retail demo with a size of 256GB.
$ cd /home/gpadmin/testdata/retail_demo/
$ psql -d deev -f create_retail_demo_schema.sql
$ ./loaddata_to_gp.sh
[gpadmin@mdw ~]$ psql -c "select * from gp_toolkit.gp_size_of_schema_disk ;"
sosdnsp | sosdschematablesize | sosdschemaidxsize
-------------+---------------------+------------------
retail_demo | 372926784 | 0
public | 3833856 | 819200
madlib | 32768 | 0
Step 4: Install gpbackup Plugin
Download and install pivotal_greenplum_backup_restore-1.17.0-1-gp6-rhel-x86_64.gppkg from https://network.pivotal.io/
Step 5: Configure S3 Plugin
We can use either S3 or DellEMC Data Domain plugins or we can use a file mount to store our backups. For this demo we are using the S3 plug-in.
Create a new file /home/gpadmin/s3-test-config.yaml with the below configuration. Please make sure the indentation follow yaml syntax:
executablepath: $GPHOME/bin/gpbackup_s3_plugin
options:
region: us-east-1
aws_access_key_id: my_s3_access_key
aws_secret_access_key: my_s3_secret_key
bucket: sppde-gpbackup-test
folder: dev/backup1
Step 6: Run First Full Backup
$ gpbackup --dbname dev --leaf-partition-data --plugin-config /home/gpadmin/s3-test-config.yaml
20200418:11:01:11 gpbackup:gpadmin:mdw:003259-[INFO]:-Starting backup of database dev
20200418:11:01:11 gpbackup:gpadmin:mdw:003259-[INFO]:-Plugin config path: /tmp/20200418110111_s3-test-config.yaml
20200418:11:01:12 gpbackup:gpadmin:mdw:003259-[INFO]:-Backup Timestamp = 20200418110111
20200418:11:01:12 gpbackup:gpadmin:mdw:003259-[INFO]:-Backup Database = dev
20200418:11:01:12 gpbackup:gpadmin:mdw:003259-[INFO]:-Gathering table state information
20200418:11:01:12 gpbackup:gpadmin:mdw:003259-[INFO]:-Acquiring ACCESS SHARE locks on tables
Locks acquired: 11 / 11 [==============================================================] 100.00% 0s
20200418:11:01:12 gpbackup:gpadmin:mdw:003259-[INFO]:-Gathering additional table metadata
20200418:11:01:12 gpbackup:gpadmin:mdw:003259-[INFO]:-Getting partition definitions
20200418:11:01:12 gpbackup:gpadmin:mdw:003259-[INFO]:-Getting storage information
20200418:11:01:12 gpbackup:gpadmin:mdw:003259-[INFO]:-Getting child partitions with altered schema
20200418:11:01:12 gpbackup:gpadmin:mdw:003259-[INFO]:-Metadata will be written to /data1/master/gpseg-1/backups/20200418/20200418110111/gpbackup_20200418110111_metadata.sql
20200418:11:01:12 gpbackup:gpadmin:mdw:003259-[INFO]:-Writing global database metadata
20200418:11:01:12 gpbackup:gpadmin:mdw:003259-[INFO]:-Global database metadata backup complete
20200418:11:01:12 gpbackup:gpadmin:mdw:003259-[INFO]:-Writing pre-data metadata
20200418:11:01:13 gpbackup:gpadmin:mdw:003259-[INFO]:-Pre-data metadata backup complete
20200418:11:01:13 gpbackup:gpadmin:mdw:003259-[INFO]:-Writing post-data metadata
20200418:11:01:13 gpbackup:gpadmin:mdw:003259-[INFO]:-Post-data metadata backup complete
20200418:11:01:13 gpbackup:gpadmin:mdw:003259-[INFO]:-Writing data to file
Tables backed up: 11 / 11 [============================================================] 100.00% 5s
20200418:11:01:18 gpbackup:gpadmin:mdw:003259-[INFO]:-Data backup complete
20200418:11:01:20 gpbackup:gpadmin:mdw:003259-[INFO]:-Found neither /usr/local/greenplum-db/./bin/gp_email_contacts.yaml nor /home/gpadmin/gp_email_contacts.yaml
20200418:11:01:20 gpbackup:gpadmin:mdw:003259-[INFO]:-Email containing gpbackup report /data1/master/gpseg-1/backups/20200418/20200418110111/gpbackup_20200418110111_report will not be sent
20200418:11:01:20 gpbackup:gpadmin:mdw:003259-[INFO]:-Backup completed successfully
Step 7: Verify Bucket Size and Files
Under the dev folder in the S3 bucket the below structure is created. The backup folder is in the format ‘../dev/backups/ddmmyyy/ddmmyyyyhhmmss/’
$ aws s3 ls sppde-gpbackup-test/dev/
PRE backups/
$ aws s3 ls sppde-gpbackup-test/dev/backups/
PRE 20200418/
$ aws s3 ls sppde-gpbackup-test/dev/backups/20200418/
PRE 20200418112456/
$ aws s3 ls sppde-gpbackup-test/dev/ --recursive --human-readable --summarize
you should see the below output summary at the end:
Total Objects: 137
Total Size: 255.5 MiB
Step 8: Add New Schema and Load Data
On the master node, cd /home/gpadmin/testdata/faa and execute the script run.sh.
Once this is finished, you will now have added a new schema and loaded data.
Verify the sizes:
[gpadmin@mdw ~]$ psql -c "select * from gp_toolkit.gp_size_of_schema_disk;"
sosdnsp | sosdschematablesize | sosdschemaidxsize
-------------+---------------------+---------------------------
faa | 822264480 | 0
retail_demo | 372926784 | 0
public | 3833856 | 819200
madlib | 32768 | 0
Step 9: Run Incremental Backup
$gpbackup --dbname dev --incremental --leaf-partition-data --plugin-config /home/gpadmin/s3-test-config.yaml
20200418:11:51:05 gpbackup:gpadmin:mdw:005876-[INFO]:-Starting backup of database dev
20200418:11:51:05 gpbackup:gpadmin:mdw:005876-[INFO]:-Plugin config path: /tmp/20200418115105_s3-test-config.yaml
20200418:11:51:06 gpbackup:gpadmin:mdw:005876-[INFO]:-Backup Timestamp = 20200418115105
20200418:11:51:06 gpbackup:gpadmin:mdw:005876-[INFO]:-Backup Database = dev
20200418:11:51:07 gpbackup:gpadmin:mdw:005876-[INFO]:-Gathering table state information
20200418:11:51:07 gpbackup:gpadmin:mdw:005876-[INFO]:-Acquiring ACCESS SHARE locks on tables
Locks acquired: 40 / 40 [==============================================================] 100.00% 0s
20200418:11:51:07 gpbackup:gpadmin:mdw:005876-[INFO]:-Gathering additional table metadata
20200418:11:51:07 gpbackup:gpadmin:mdw:005876-[INFO]:-Getting partition definitions
20200418:11:51:07 gpbackup:gpadmin:mdw:005876-[INFO]:-Getting storage information
20200418:11:51:07 gpbackup:gpadmin:mdw:005876-[INFO]:-Getting child partitions with altered schema
20200418:11:51:07 gpbackup:gpadmin:mdw:005876-[INFO]:-Metadata will be written to /data1/master/gpseg-1/backups/20200418/20200418115105/gpbackup_20200418115105_metadata.sql
20200418:11:51:07 gpbackup:gpadmin:mdw:005876-[INFO]:-Writing global database metadata
20200418:11:51:07 gpbackup:gpadmin:mdw:005876-[INFO]:-Global database metadata backup complete
20200418:11:51:07 gpbackup:gpadmin:mdw:005876-[INFO]:-Writing pre-data metadata
20200418:11:51:07 gpbackup:gpadmin:mdw:005876-[INFO]:-Pre-data metadata backup complete
20200418:11:51:07 gpbackup:gpadmin:mdw:005876-[INFO]:-Writing post-data metadata
20200418:11:51:07 gpbackup:gpadmin:mdw:005876-[INFO]:-Post-data metadata backup complete
20200418:11:51:07 gpbackup:gpadmin:mdw:005876-[INFO]:-Basing incremental backup off of backup with timestamp = 20200418112456
20200418:11:51:08 gpbackup:gpadmin:mdw:005876-[INFO]:-Writing data to file
Tables backed up: 28 / 28 [============================================================] 100.00% 8s
20200418:11:51:16 gpbackup:gpadmin:mdw:005876-[INFO]:-Skipped data backup of 2 external/foreign table(s).
20200418:11:51:16 gpbackup:gpadmin:mdw:005876-[INFO]:-See /home/gpadmin/gpAdminLogs/gpbackup_20200418.log for a complete list of skipped tables.
20200418:11:51:16 gpbackup:gpadmin:mdw:005876-[INFO]:-Data backup complete
20200418:11:51:18 gpbackup:gpadmin:mdw:005876-[INFO]:-Found neither /usr/local/greenplum-db/./bin/gp_email_contacts.yaml nor /home/gpadmin/gp_email_contacts.yaml
20200418:11:51:18 gpbackup:gpadmin:mdw:005876-[INFO]:-Email containing gpbackup report /data1/master/gpseg-1/backups/20200418/20200418115105/gpbackup_20200418115105_report will not be sent
20200418:11:51:18 gpbackup:gpadmin:mdw:005876-[INFO]:-Backup completed successfully
Step 10: Verify Bucket Size and Files
under the dev folder in s3 bucket the below structure is creeated. The backup folder is in the format ‘../dev/backups/ddmmyyy/ddmmyyyyhhmmss/’.
$ aws s3 ls sppde-gpbackup-test/dev/
PRE backups/
$ aws s3 ls sppde-gpbackup-test/dev/backups/
PRE 20200418/
$ aws s3 ls sppde-gpbackup-test/dev/backups/20200418/
PRE 20200418112456/
PRE 20200418115105/ ----- This is the new incremental backup
$ aws s3 ls sppde-gpbackup-test/dev/ --recursive --human-readable --summarize
you should see below output summary at the end
Total Objects: 478
Total Size: 418.2 MiB
Verify each backup size – You see the incremental backup size is very small in this case. Only the deltas from the parent backup are stored to S3.
$ aws s3 ls sppde-gpbackup-test/dev/backups/20200418/20200418112456/ --recursive --human-readable --summarize
Total Objects: 137
Total Size: 255.5 MiB
$ aws s3 ls sppde-gpbackup-test/dev/backups/20200418/20200418115105/ --recursive --human-readable --summarize
Total Objects: 341
Total Size: 162.7 MiB