backup-and-recovery-operations

Backup and Recovery Operations

The Greenplum Database parallel dump utility gpcrondump backs up the Greenplum master instance and each active segment instance at the same time.

By default, gpcrondump creates dump files in the gp_dump subdirectory.

Several dump files are created for the master, containing database information such as DDL statements, the Greenplum system catalog tables, and metadata files. gpcrondump creates one dump file for each segment, which contains commands to recreate the data on that segment.

You can perform full or incremental backups. To restore a database to its state when an incremental backup was made, you will need to restore the previous full backup and all subsequent incremental backups.

Backing up a Greenplum Database

Each file created for a backup begins with a 14-digit timestamp key that identifies the backup set the file belongs to.

gpcrondump can be run directly in a terminal on the master host, or you can add it to crontab on the master host to schedule regular backups.

The Greenplum Database parallel restore utility gpdbrestore takes the timestamp key generated by gpcrondump, validates the backup set, and restores the database objects and data into a distributed database in parallel. Parallel restore operations require a complete backup set created by gpcrondump, a full backup and any required incremental backups.

Restoring a Greenplum Database backup

The Greenplum Database gpdbrestore utility provides flexibility and verification options for use with the automated backup files produced by gpcrondump or with backup files moved from the Greenplum array to an alternate location.

Exercises

These exercises will walk through how to create a full backup of your database and then restore a table.

  1. To run a full backup, use “gpcrondump -x database -u /path/for/backup -a”. This will backup the entire database to the directory given without prompting the user.

    $ gpcrondump -x tutorial -u /tmp -a -r

    20151021:18:15:08:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Starting gpcrondump with args: -x tutorial -u /tmp -a -r
    20151021:18:15:09:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Directory /tmp/db_dumps/20151021 exists
    20151021:18:15:09:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Checked /tmp on master
    20151021:18:15:10:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Configuring for single database dump
    20151021:18:15:10:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Validating disk space
    20151021:18:15:10:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Adding compression parameter
    20151021:18:15:10:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Adding --no-expand-children
    20151021:18:15:10:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump process command line gp_dump -p 5432 -U gpadmin --gp-d=/tmp/db_dumps/20151021 --gp-r=/tmp/db_dumps/20151021 --gp-s=p --gp-k=20151021181509 --no-lock --gp-c --no-expand-children tutorial
    20151021:18:15:10:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Starting Dump process
    20151021:18:15:14:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Releasing pg_class lock
    20151021:18:15:30:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump process returned exit code 0
    20151021:18:15:30:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Timestamp key = 20151021181509
    20151021:18:15:30:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Checked master status file and master dump file.
    20151021:18:15:30:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump status report
    20151021:18:15:30:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:----------------------------------------------------
    20151021:18:15:30:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Target database                          = tutorial
    20151021:18:15:30:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump subdirectory                        = 20151021
    20151021:18:15:30:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump type                                = Full database
    20151021:18:15:30:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Clear old dump directories               = Off
    20151021:18:15:30:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump start time                          = 18:15:09
    20151021:18:15:30:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump end time                            = 18:15:30
    20151021:18:15:30:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Status                                   = COMPLETED
    20151021:18:15:30:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump key                                 = 20151021181509
    20151021:18:15:30:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump file compression                    = On
    20151021:18:15:30:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Vacuum mode type                         = Off
    20151021:18:15:30:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Exit code zero, no warnings generated
    20151021:18:15:30:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:----------------------------------------------------
    20151021:18:15:30:068072 gpcrondump:gpdb-sandbox:gpadmin-[WARNING]:-Found neither /usr/local/greenplum-db/./bin/mail_contacts nor /home/gpadmin/mail_contacts
    20151021:18:15:30:068072 gpcrondump:gpdb-sandbox:gpadmin-[WARNING]:-Unable to send dump email notification
    20151021:18:15:30:068072 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-To enable email notification, create /usr/local/greenplum-db/./bin/mail_contacts or /home/gpadmin/mail_contacts containing required email addresses
    

    This runs a full backup of the database created during the previous exercises.

  2. To view the backups:

    ls -al /tmp/db_dumps

  3. Now, that we have a full backup let’s remove data from a table to simulate a failure.

    $ psql -U gpadmin tutorial
    tutorial=# select count(*) from otp_r;

    This should return 2049104 rows in the table. Let’s truncate the table and then check the row count:

    tutorial=# truncate table otp_r;
    tutorial=# select count(*) from otp_r;
    

    The report should now show 0 rows in the table.

  4. Let’s restore the data that was lost. First, exit from the psql shell by typing \q then issue the gpdbrestore command:

    $ gpdbrestore -T faa.otp_r -s tutorial -u /tmp -a

    20151021:18:21:14:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-Starting gpdbrestore with args: -T faa.otp_r -s tutorial -u /tmp -a
20151021:18:21:14:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-Scanning Master host for latest dump file set for database tutorial
20151021:18:21:14:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-Located dump file gp_cdatabase_1_1_20151021181325 for database tutorial, adding to list
20151021:18:21:14:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-Located dump file gp_cdatabase_1_1_20151021181509 for database tutorial, adding to list
20151021:18:21:14:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-Located dump file gp_cdatabase_1_1_20151021181404 for database tutorial, adding to list
20151021:18:21:14:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-Located dump file gp_cdatabase_1_1_20151021181423 for database tutorial, adding to list
20151021:18:21:14:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-Identified latest dump timestamp for tutorial as 20151021181509
20151021:18:21:14:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-------------------------------------------
20151021:18:21:14:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-Greenplum database restore parameters
20151021:18:21:14:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-------------------------------------------
20151021:18:21:14:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-Restore type               = Table Restore
20151021:18:21:14:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-Database name              = tutorial
20151021:18:21:14:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-------------------------------------------
20151021:18:21:14:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-Table restore list
20151021:18:21:14:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-------------------------------------------
20151021:18:21:14:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-Table                      = faa.otp_r
20151021:18:21:14:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-Restore method             = Specific table restore
20151021:18:21:14:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-Restore timestamp          = 20151021181509
20151021:18:21:14:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-Restore compressed dump    = On
20151021:18:21:14:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-Restore global objects     = Off
20151021:18:21:14:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-Array fault tolerance      = n
20151021:18:21:14:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-------------------------------------------
20151021:18:21:14:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-Running metadata restore
20151021:18:21:14:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-Invoking commandline: gp_restore -i -h gpdb-sandbox.localdomain -p 5432 -U gpadmin --gp-i --gp-k=20151021181509 --gp-l=p -s /tmp/db_dumps/20151021/gp_dump_1_1_20151021181509.gz -P --gp-r=/tmp/db_dumps/20151021 --status=/tmp/db_dumps/20151021 --gp-d=/tmp/db_dumps/20151021 --gp-f=/tmp/table_list_wXAV1W --gp-c -d tutorial
20151021:18:21:16:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-Expanding parent partitions if any in table filter
20151021:18:21:17:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-gp_restore commandline: gp_restore -i -h gpdb-sandbox.localdomain -p 5432 -U gpadmin --gp-i --gp-k=20151021181509 --gp-l=p --gp-d=/tmp/db_dumps/20151021 --gp-r=/tmp/db_dumps/20151021 --status=/tmp/db_dumps/20151021 --gp-f=/tmp/table_list_BsvLQg --gp-c -d tutorial -a:
20151021:18:21:27:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-Running post data restore
20151021:18:21:27:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-gp_restore commandline: gp_restore -i -h gpdb-sandbox.localdomain -p 5432 -U gpadmin --gp-d=/tmp/db_dumps/20151021 --gp-i --gp-k=20151021181509 --gp-l=p -P --gp-r=/tmp/db_dumps/20151021 --status=/tmp/db_dumps/20151021 --gp-f=/tmp/table_list_BsvLQg --gp-c -d tutorial:
20151021:18:21:29:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-Commencing analyze of restored tables in 'tutorial' database, please wait
20151021:18:21:39:069135 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-'Analyze' of restored tables in 'tutorial' database completed without error
  1. Finally, verify the row count >$ psql -U gpadmin tutorial
    >tutorial=# select count(*) from otp_r;

The table should show 2049104 rows again as it was prior to the truncate call.