Pivotal Greenplum: Life in a Vacuum by Howard Goldberg
Vacuuming your home is a laborious task that you would rather not do. However, vacuuming your home is an essential chore that must be done. The same is true for vacuuming the catalog in a Pivotal Greenplum database (“Greenplum”). The proper maintenance and care is required for the Greenplum catalog to keep the database functioning at its peak efficiency.
Pivotal Greenplum is a massively parallel, share nothing, open source analytical data warehouse built on top of PostgreSQL for petabyte scale big data analytics use cases. When Greenplum was originally created about 10 years ago, it was forked off of PostgreSQL version 8.2 (now updated to version 8.3 with Greenplum v.5).
What is maintained in the Greenplum catalog?
The Greenplum catalog is the brains of the database. Information about all objects and their associated dependencies are stored in the catalog and every database command will query the catalog before executing. There are 75 tables in the catalog and a separate catalog per user database. All catalog tables are important, but the two most heavily accessed tables in the catalog are the pg_class (relation list) and pg_attribute (columns per relation) tables. When catalog tables become bloated (actual pages greater than expected pages) catalog queries can become sluggish and slow down database processing.
Why does a catalog need to be vacuumed?
Both Greenplum and PostgreSQL use a locking/concurrency model called Multi Version Concurrency Control (MVCC). With this model, writers do not block readers during update and delete operations. New versions of rows are created during data manipulation operations rather than physically deleting or updating in-place the original version of the row. The old versions of rows will continue to occupy physical space until a vacuum command (such as “vacuum pg_class”) is issued to free up the physical slots used by the old versioned rows. If a catalog vacuum is run at the appropriate time then newly inserted rows will reuse the space freed up by the last vacuum. If there are no free slots to be reused then new rows will be inserted at the end of the catalog table. The append process, which increases the size of the catalog table, is called bloating. When bloating becomes severe, a vacuum full is required. This type of vacuum operation will be discussed in more detail later in this article.
A vacuum recovers space used by updated or deleted rows. A vacuum knows what rows to remove by referring to a free space map (FSM), which tracks the locations of rows whose space can be released back to the OS. It is a best practice to create a script to vacuum/analyze the Greenplum catalog and regularly schedule it using either cron or some other job scheduler such as autosys. The frequency that this script is scheduled depends on your application’s usage pattern. At a minimum, it is recommended that you vacuum/analyze the catalog at least twice a day, seven days a week. A typical schedule is before the start of business processing and after your major ETL processing occurs. Additional iterations of the vacuum/analyze script should be added if excessive bloating is detected in the catalog. In addition, this script must be run for every database in the Greenplum instance including the Greenplum gpperfmon database.
Be aware that a vacuum can block update, delete, analyze and other vacuum operations while it’s running, but a catalog vacuum/analyze elapsed time is normally quite short in duration and should not impact ETL operations.
Furthermore, the most recent versions of Postgres 9.x have an automatic vacuum feature that will autonomously vacuum the catalog based on predefined thresholds. However, this option is not available in Greenplum versions 4.3.x or 5.0.x. It is on Pivotal’s radar and may be added in future Greenplum releases.
Catalog history table
A best practice in managing your Greenplum environment is to store catalog table information such as schema, table name, number of rows, number of pages, and size of table in a historical table. This will provide key metrics that can be used to determine when a catalog table increased in size or became bloated. Listed in figure 1 is a query and its associated output from a sample catalog_history table that shows the pg_atribute table jumping in size from 196GB to 247GB, potentially bloating the catalog. Knowing the database name and approximate timestamp when this table size change occurred can provide invaluable information to help optimize ETL operations and minimize bloating.
Figure 1 - Catalog history query example with abc as ( select database ,tstmp::date as dt ,extract(hour from tstmp) as hh ,to_char(tstmp::date, 'day') as day ,extract(hour from tstmp) ,relname ,reltuples/1024 as tuples_in_k ,sz_bytes ,bloat_diag_pages ,bloat_expected_pages from msgp.catalog_history where 1=1 and relname in ( 'pg_attribute') and database in ('xxxdb') and extract(month from tstmp::date ) = 8) select database, ,dt,relname,count(*) as cnt_hist_entries,pg_size_pretty(max(sz_bytes)::bigint) as max_sz_mb,max(bloat_diag_pages) as max_bloat_pages from abc group by 1,2,3 order by 2 desc, 3,4; Some rows have been deleted for display purposes. database | dt | relname | cnt_hist_entries | max_sz_mb | max_bloat_pages ----------+------------+--------------+------------------+-----------+----------------- xxxdb | 2016-08-25 | pg_attribute | 4 | 247 GB | 26447 xxxdb | 2016-08-24 | pg_attribute | 4 | 247 GB | 26447 xxxdb | 2016-08-23 | pg_attribute | 1 | 247 GB | 26447 xxxdb | 2016-08-16 | pg_attribute | 3 | 196 GB | 21011 xxxdb | 2016-08-15 | pg_attribute | 4 | 196 GB | 21011 xxxdb | 2016-08-14 | pg_attribute | 3 | 196 GB | 21011
What contributes to catalog bloat in GP?
Catalog bloat is caused when there are no free slots in the pages to be reused for new data rows. When this occurs, new rows are written to the end of the table, which increases the size of the table. A major contributor to catalog bloat comes from the creation and dropping of database DDL objects. Some examples of these catalog bloat use cases are listed below.
Gpload is a Greenplum fast ingest utility that loads data into a Greenplum database. The gpload utility creates an external table, a mapping of the file input to the relational table, and is defined using a YAML configuration file. Listed in figure 2 is an example of a YAML file.
By default, every time you load a table using the gpload command (gpload -U hjg -h hostname -p port_num -d xxxdb -f yaml_sample_hjg) an external table is created and dropped per gpload execution. This object creation and destruction can create catalog bloat in the pg_class and pg_attributes tables, when a gpload utility is executed. In addition, small, frequent loads in a micro-batching use case can further exacerbate the bloating situation due to the increased usage of external tables.
Figure 2 – Gpload YAML example VERSION: 18.104.22.168 DATABASE: xxxdb USER: hjg GPLOAD: INPUT: - ENCODING: LATIN9 - SOURCE: FILE: - /ms/user/h/hjg/tests/greenplum/GPLOAD/yaml.sample.data_hjg - COLUMNS: - data_source_dim_key: smallint - data_source: text - FORMAT: CSV - DELIMITER: E'\x1c' - NULL_AS: '' - QUOTE: '"' - ERROR_LIMIT: 20 - ERROR_TABLE: capo_staging.load_err_hjg SQL: - AFTER: grant all on capo_staging.load_err_hjg to public EXTERNAL: - SCHEMA: PUBLIC OUTPUT: - TABLE: capo_staging.ETL_data_source_dim_hjg - MODE: INSERT - MAPPING: data_source_dim_key: data_source_dim_key data_source: data_source
How bloat caused by Gpload be mitigated?
There are numerous ways to reduce catalog bloat related to external table creation and deletion. When loading the same table many times, use the “REUSE_TABLES: true” option in the YAML file. For example,
PRELOAD: REUSE_TABLES: true
For more details on gpload and its configuration file please refer to the following link: https://gpdb.docs.pivotal.io/4390/client_tool_guides/load/unix/gpload.html. To reduce external table usage in a micro-batching use case, decrease the frequency and increase the volume size of the gpload operations. Finally, to eliminate external table creation and deletion, gpfdist should be used instead of gpload so that the external table will be reused multiple times. With gpfdist, the external table is created once and used with an insert command such as “insert into target_table select * from source_file_ext”.
How do you find large external table usage?
To identify frequent creation of external tables, the pg_logs can be scanned using the grep command or the pg_logs can be queried via the Greenplum gp_toolkit.__gp_log_master_ext external table. For example,
$ egrep -ic 'create external' $MASTER_DATA_DIRECTORY/pg_log/gpdb-2017-04* | grep -v ':0' /var/gpdb/cluster1/datamaster/gpseg-1/pg_log/gpdb-2017-04-13_160544.csv:16538 > select logtime::date,count(*) from gp_toolkit.__gp_log_master_ext where logmessage ilike '%create external%' group by 1 order by 1;
How do you identify catalog bloat?
Greenplum provides a view named gp_toolkit.gp_bloat_diag to monitor table bloat and it is described in figure 3. Based on the value of the bddiag column, you can determine the degree of the bloating. If the table bloat is in the moderate to significant level, then a vacuum full is recommended. The issue with this view is that it does not consider the size of the table and its difficult to quantify what significant bloat actually means.
The query listed in figure 4 allows a user to define their own bloating thresholds and avoid false positives by reporting on small catalog tables. The accuracy of the view is dependent on up to date statistics, so frequent analysis of the entire catalog is required.
Figure 3 – gp_toolkit.gp_bloat_diag view bdirelid - Object ID of the table (pg_class.oid) bdinspname - table schema name bdirelname - table name bdirelpages – actual pages in the table data files bdiexppages – expected pages bdidiag - diagnosis of bloat ratio 1 to 3 -> no bloat 4 to 10 -> moderate bloat > 10 -> significant bloat
Figure 4 – gp_toolkit.gp_bloat_diag view Select bdinspname as schema ,bdirelname as table ,Bdirelpages as actual_pages ,Bdiexppages as expected pages ,(Bdirelpages / Bdiexppages)::int as act_exp_ratio From gp_toolkit.gp_bloat_diag Where bdinspname = ‘pg_catalog’ And pg_relation_size(bdirelname) > 1000 (1GB) And Bdirelpages / Bdiexppages > 3
The catalog tables have numerous B-tree indexes to optimize queries. Since the data in the catalog is constantly evolving, these indexes can also get bloated. A bloated index can adversely affect the performance of critical catalog queries when the query plan uses a table scan rather than an index lookup. The query listed in figure 5 can be used to identify bloated catalog indexes.
Reindexing the catalog’s indexes on a regular basis is a best practice. For example, “reindex pg_catalog.pg_class” will drop and recreate all of the indexes associated with the pg_calss table. This is a relatively quick operation, but an exclusive lock on the catalog table is required during the reindex. Therefore, a maintenance window for the reindex operation is recommended. A reindex does not alleviate the need for a vacuum full on the catalog, but it can delay the need for a vacuum full until a larger maintenance window can be arranged.
Figure 5 –index bloat query WITH i AS ( SELECT c.oid , c.relname , i.indrelid , pg_relation_size(c.oid) AS relsize FROM pg_catalog.pg_class AS c INNER JOIN pg_catalog.pg_namespace AS n ON c.relnamespace = n.oid AND n.nspname = 'pg_catalog' AND c.relkind = 'i' INNER JOIN pg_catalog.pg_index AS I ON c.oid = i.indexrelid ), c AS (SELECT c.oid , c.relname , pg_relation_size(c.oid) AS relsize FROM pg_catalog.pg_class AS c INNER JOIN pg_catalog.pg_namespace AS n ON c.relnamespace = n.oid AND n.nspname = 'pg_catalog' AND c.relkind = 'r' AND c.relisshared IS FALSE ) SELECT c.relname AS table_name , i.relname AS index_name , pg_size_pretty(c.relsize) AS table_size , pg_size_pretty(i.relsize) AS index_size FROM c INNER JOIN I ON c.oid = i.indrelid WHERE c.relsize != 0 AND i.relsize != 0 AND ((i.relsize/c.relsize::NUMERIC) > 2 AND i.relsize > 1073741824) OR (i.relsize > c.relsize AND i.relsize > 107374182400); table_name | index_name | table_size | index_size --------------+---------------------------------+------------+------------ pg_attribute | pg_attribute_relid_attnam_index | 9538 MB | 19 GB
What are the symptoms of bloat?
Catalog bloat does not stop queries from running or crash the Greenplum instance. Its effect is noticed by a gradual degradation in performance which is hard to detect unless you have baseline performance metrics to measure response times. The performance slowdown can be observed in the following areas:
- Slow catalog commands while in a psql session such as \dt, \dv or \d+ table name;
- Query prepare or compile time can elongate;
- Analyze utility elapsed times can elongate; and/or
- In a severe catalog bloating scenario, high CPU may be observed on the master host as catalog query access plans substituted a table scan for a quick index lookup. This scenario can be easily observed using the Greenplum Command Center.
It is a best practice to monitor and alert when catalog bloat crosses a predefined threshold such as actual / expected pages is greater than 2 and/or catalog index size is greater than 25% of the size of its base catalog table. The SQL in figures 4 and 5 can be used to collect the metrics required to create this alerting mechanism. The alerting thresholds should be managed via metadata or parameters so that the thresholds can be easily adjusted. To simplify this alerting process, focus on the most commonly bloated tables which are pg_class and pg_attribute.
No matter how good your catalog maintenance is, bloating is inevitable! The only way to remediate catalog bloat is by running a vacuum full on the catalog. A vacuum full has to be performed with extreme caution. It is recommended that the database be placed in restricted mode (gpstop – M fast/ gpstart –R) to avoid any user interaction while the vacuum full is running. Any interruption of the vacuum full process while it is running may cause catalog corruption and will require Pivotal to be contacted to verify or repair the catalog.
As a best practice, a regular maintenance window should be scheduled quarterly to vacuum full the catalog. A small to medium (<200GB) vacuum full should take approximately 15–30 minutes and a large (300GB+) vacuum full can take 45–90 minutes. Times may vary depending on how bloated and fragmented the catalog is. Based on past experience, the pg_attribute table typically takes the bulk of the time during the vacuum.
Vacuuming is certainly a chore whether it’s for your home or a Greenplum database. Being proactive in monitoring and maintaining the Greenplum catalog will keep your Greenplum environment operating at peak performance levels. Any catalog maintenance required can be scheduled within normal scheduled work windows rather than as unexpected emergencies. Hopefully, when Greenplum implements the autovacuum feature, vacuuming the catalog will be as autonomous as a Roomba in your living room.
Howard Goldberg – Executive Director
Head of Greenplum Engineering