Author | Wen Lin
Introduction
Greenplum is an open-source, massively parallel data warehouse designed for analytics and AI applications. Efficient data compression is vital in Greenplum to reduce storage space and improve query performance. Greenplum offers several techniques for compressing data, reducing storage costs, and improving query performance. In this blog post, we will explore some compression techniques in Greenplum with SQL examples.
Compression can be used for append-optimized tables in Greenplum database. There are two types of in-database compression available. Table-level compression is applied to an entire table. Column-level compression is applied to a specific column. You can apply different column-level compression algorithms to different columns.
Different table orientation support different compression algorithms:
Table Orientation | Available Compression Types | Supported Algorithms |
---|---|---|
Row | Table | ZLIB and ZSTD |
Column | Column and Table | RLE_TYPE, ZLIB and ZSTD |
Let’s use some examples to present it.
1. Create a table with row-oriented and table-level compression.
CREATE TABLE orders (order_id INT, quantity INT) WITH (APPENDOPTIMIZED=TRUE, COMPRESSTYPE=zlib);
This command creates a new ‘orders’ table where each Greenplum segment compresses appended rows using zlib compression method.
2. Create a table with column-oriented and table-level compression.
CREATE TABLE employees (id INT, name TEXT) WITH (APPENDOPTIMIZED=TRUE, orientation=column, COMPRESSTYPE=zstd);
3. Set compression level.
Greenplum supports different levels for different compression algorithms. The higher level has more compression, but the speed of compressing and uncompressing is not faster than lower level. ZTSD compression level can be set to values between 1 and 19. Compression level of ZLIB can be set to values from 1 to 9. Compression level of RLE can be set to values from 1 to 4.
CREATE TABLE countries (id INT, name TEXT) WITH (APPENDOPTIMIZED=TRUE, COMPRESSTYPE=zstd, COMPRESSLEVEL=5);
Here, we are applying ‘zstd’ compression on the ‘countries’ table with a compression level of 5.
4. Enabling compression on existing tables
ALTER TABLE orders SET (COMPRESSTYPE=zstd);
This command changes compression type on an existing table, the compression type of ‘orders’ is changed to ‘zstd’.
5. Disable compression on on existing tables
ALTER TABLE orders SET (COMPRESSTYPE=none);
This command disables compression on the table ‘orders’ by specifying ‘none’ for the COMPRESSTYPE option.
6. Create a column-oriented table with different compression types on columns.
CREATE TABLE T1 (c1 int ENCODING (compresstype=zstd), c2 char ENCODING (compresstype=zlib, blocksize=65536), c3 char) WITH (appendoptimized=true, orientation=column);
In this example, column c1 is compressed using zstd and uses the block size defined by the system. Column c2 is compressed with zlib, and uses a block size of 65536. Column c3 is not compressed and uses the block size defined by the system.
7. Verifying compression
Greenplum provides built-in functions to check the compression ratio of an append-optimized table. The functions take either the object ID or a table name.
SELECT get_ao_compression_ratio(’employees’);
This function helps to check the compression ratio for the specified table.
8. Compression in partitioned tables
Compression can also be used in partitioned tables.
CREATE TABLE sales (id int, date date, amt decimal(10,2))WITH (appendoptimized=true, orientation=column) DISTRIBUTED BY (id)PARTITION BY RANGE (date)(PARTITION s1 START (date ‘2016-01-01’) INCLUSIVE column date encoding(compresstype=zlib),PARTITION s2 START (date ‘2016-04-01’) INCLUSIVE column date encoding(compresstype=zstd),PARTITION s3 START (date ‘2016-7-01’) INCLUSIVE column date encoding(compresstype=RLE_TYPE),PARTITION s4 START (date ‘2016-10-01’) INCLUSIVE END (date ‘2017-01-01’) EXCLUSIVE column date encoding(compresstype=none));
This example creates an append-optimized, column-oriented table ‘sales’. ‘sales’ has four partitions, s1 ~ s4. The compression type of ‘date’ column of partition s1 ~ s4 are different.
We can also modify the compression type of a partition table.
ALTER TABLE sales_1_prt_s4 ALTER COLUMN date SET ENCODING (compresslevel=7, compresstype=zstd);
This example changes the compression type of the ‘date’ column in the partition table, which is s4 in sales, from none to zstd, the compress level is set to 7.
9. Utilities using compression
Some utilities of Greenplum Database support compression in order to accelerate data transfering or processing, like gpbackup, gprestore, gpcopy, gpload, and gplogfilter. For example, gpcopy command compresses data during transfer from the source to the destination database when copying data to a different host. If specified ‘–no-compression’ with gpcopy, data is transferred without compression.
10 .Data Loading with Compression
COPY products FROM ‘/home/gpadmin/products.csv’ WITH DELIMITER ‘,’ NULL as ‘NULL’;
While loading bulk data from a CSV file into a compressed table, it automatically follows the set compression method.
Above is some examples of using compression in Greenplum database, for more information about this technique, please refer to Enabling Compression