OLTP workload performance improvement in Greenplum 6

Greenplum 6 contains multiple optimizations for OLTP scenarios, greatly improving the performance of simple query, insert, delete, and update operations under high concurrency. These improvements include:

  • Updating the PostgreSQL kernel version to 9.4. This update brings a set of new features and improves overall system performance. For example, lock optimizations such as fast-path locking reduce lock-contention overhead under high concurrency (see the short example after this list).
  • Greenplum 6 provides global deadlock detection to support concurrent update/delete operations on the same heap table.
  • Optimizing global transaction management and even avoiding holding locks in read-only scenarios, which reduces the latency of starting and ending transactions.
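As a small illustration of the fast-path locking mentioned above (a sketch; the database name gpbench is an assumption), the lightweight locks taken by an ordinary query are flagged in the fastpath column of pg_locks, which exists in Greenplum 6's PostgreSQL 9.4 kernel but not in Greenplum 5:

# locks acquired via the fast path are marked in pg_locks.fastpath
psql gpbench -c "SELECT relation::regclass, mode, fastpath
FROM pg_locks WHERE pid = pg_backend_pid();"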

Based on these optimizations, Greenplum 6's TPC-B performance in our test environment is 60 times that of Greenplum 5, with single-update performance up to 70 times, peak single-insert performance up to 3.6 times, and single-query performance up to 3.5 times. For the single-query scenario in particular, we eliminated most of the lock contention in Greenplum 6, so master CPU usage can exceed 90%; query TPS can therefore be improved further by upgrading the master node's hardware. In a 192-core test environment (1 master + 18 segments), single-query TPS can reach 220,000.

1 Test environment and method

1.1 Test environment

Our test environment is based on Google Cloud Platform (GCP). It is a cluster of five virtual hosts: one master host and four segment hosts. The configuration of the master and segment virtual hosts is as follows:

Each segment host runs one segment, and the cluster is configured without mirrors or a standby master. In addition, a separate virtual host is needed to run the test tool pgbench; it does not need to be powerful, and in our test it had 4 cores and 5 GB of memory.

1.2 Greenplum Information

Our tests use self-compiled Greenplum builds; the version information is as follows:

Greenplum 6:
  Version: 6.0.0-beta.3
  Commit: d2ebd40835e481a8724f4e3169c60cade3fed6d8
  Configure options:
    ./configure \
      --prefix=$HOME/opt/gpdb \
      --disable-orca \
      --disable-gpfdist \
      --disable-pxf \
      CFLAGS='-g -O3 -march=native' \
      CXXFLAGS='-g -O3 -march=native'

Greenplum 5:
  Version: 5.18.0
  Commit: 6aec9959d367d46c6b4391eb9ffc82c735d20102
  Configure options:
    ./configure \
      --prefix=$HOME/opt/gpdb5 \
      --disable-orca \
      --disable-gpfdist \
      --disable-pxf \
      CFLAGS='-g -O3 -march=native' \
      CXXFLAGS='-g -O3 -march=native'

1.3 Cluster Configuration

  • gpconfig -c gp_enable_global_deadlock_detector -v on
    • This GUC controls whether global deadlock detection is enabled. It is off by default in Greenplum 6 and must be turned on to support concurrent update/delete operations; Greenplum 5 does not support this GUC.
  • gpconfig -c log_statement -v none
    • Setting this GUC to none reduces unnecessary logging and prevents log output from interfering with I/O performance.
  • gpconfig -c checkpoint_segments -v 2 --skipvalidation
    • This GUC affects how often checkpoints flush data to disk. The default value of 8 lowers the flush frequency, but each flush then writes a large amount of data, causing a transient performance drop across the whole cluster. Lowering the value appropriately for an OLTP workload increases the flush frequency, but because each flush writes less data, average performance improves significantly. Greenplum 5 supports this GUC but shows no obvious effect, because its performance bottleneck is not I/O but the serialization caused by table locks. The sketch after this list shows how these settings are applied and verified.
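These settings only take effect once they are distributed and the cluster is restarted; gp_enable_global_deadlock_detector in particular requires a restart. A minimal sketch of applying and verifying them (the database name gpbench is an assumption):

# after running the gpconfig commands above, restart the cluster so that
# gp_enable_global_deadlock_detector takes effect
gpstop -ar

# confirm the values on the master and the segments
gpconfig -s gp_enable_global_deadlock_detector
gpconfig -s checkpoint_segments
psql gpbench -c "SHOW gp_enable_global_deadlock_detector;"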

1.4 Test Method

Our test was carried out using pgbench. The data scale factor was 1000, and no additional adjustments were made to the data. There are four test categories: the standard multi-statement TPC-B transaction, a single select statement, a single update statement, and a single insert statement. For each category, we record the TPS as the number of concurrent clients increases from 10 to 200.
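As a minimal sketch of such a run (the benchmark database name gpbench and the exact client-count steps are assumptions; update-only.sql and insert-only.sql are the custom scripts listed in sections 2.3 and 2.4):

# load the pgbench tables at scale factor 1000
pgbench -i -s 1000 gpbench

# sweep the number of concurrent clients for each of the four categories
for N in 10 20 50 100 150 200; do
    pgbench -c $N -j $N -r -T 60 -P 1 -s 1000 gpbench                     # TPC-B
    pgbench -c $N -j $N -r -T 60 -P 1 -s 1000 -S gpbench                  # single select
    pgbench -c $N -j $N -r -T 60 -P 1 -s 1000 -f update-only.sql gpbench  # single update
    pgbench -c $N -j $N -r -T 60 -P 1 -s 1000 -f insert-only.sql gpbench  # single insert
done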

2 Test Results

2.1 TPC-B

pgbench's TPC-B test mixes insert, update, and query operations on large and small tables. Greenplum 6 reaches a TPS peak of around 4,300, while Greenplum 5 peaks at around 70, a performance improvement of 60 times. One key to this huge difference is that Greenplum 6 introduces global deadlock detection to support concurrent updates to heap tables, whereas updates to the same table in Greenplum 5 must be serialized; a way to observe this is sketched after the script below.

-- pgbench -c $N -j $N -r -T 60 -P 1 -s 1000
\set scale 1000
\set nbranches 1 * :scale
\set ntellers 10 * :scale
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
\setrandom bid 1 :nbranches
\setrandom tid 1 :ntellers
\setrandom delta -5000 5000

BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;
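As a rough way to see this difference (a sketch, not part of the benchmark; the database name gpbench is an assumption), hold an update open in one session and inspect the table-level lock it takes on the master:

# session 1: keep an update open for a while
psql gpbench -c "BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = 1;
SELECT pg_sleep(30);
COMMIT;" &

sleep 2  # give session 1 time to acquire its locks

# session 2: check the table-level lock mode held on pgbench_accounts
psql gpbench -c "SELECT mode, granted FROM pg_locks
WHERE relation = 'pgbench_accounts'::regclass;"

# With gp_enable_global_deadlock_detector=on (Greenplum 6) the UPDATE holds
# RowExclusiveLock, so a concurrent update of a different row can proceed;
# with it off, or in Greenplum 5, the UPDATE takes ExclusiveLock on the table
# and concurrent updates serialize.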

2.2 Single Select

The TPS peak of Greenplum 6 can reach more than 79,000, while the TPS of Greenplum 5 is only 23,000, an improvement of about 3.5 times.

-- pgbench -c $N -j $N -r -T 60 -P 1 -s 1000 -S
\set scale 1000
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts

SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

2.3 Single Update

In addition to pgbench's built-in tests, we also tested a single-update scenario. The peak TPS of Greenplum 6 is 7,300 and that of Greenplum 5 is about 100, an improvement of more than 70 times.

-- pgbench -c $N -j $N -r -T 60 -P 1 -s 1000 -f update-only.sql
\set scale 1000
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
\setrandom delta -5000 5000

UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;

2.4 Single insert

In the single-insert test, Greenplum 6 has a TPS peak of more than 18,000 while Greenplum 5 peaks at 5,000, an improvement of about 3.6 times.

-- pgbench -c $N -j $N -r -T 60 -P 1 -s 1000 -f insert-only.sql
\set scale 1000
\set nbranches 1 * :scale
\set ntellers 10 * :scale
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
\setrandom bid 1 :nbranches
\setrandom tid 1 :ntellers
\setrandom delta -5000 5000

INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);

3 Future work

During testing we noticed that some compile-time options have a large impact on the performance and stability of update operations. wal_segment_size is a read-only GUC that specifies the size of a single WAL segment file; a forced disk flush is triggered after a certain number of segment files have been written. Obviously, the larger the value, the lower the flush frequency; however, each flush then writes more data, the I/O of other processes on the host is heavily disturbed, and the performance of the entire cluster drops momentarily. In Greenplum the default value of this GUC is 64MB. We noticed that the TPS of the update test fluctuated greatly, and when the value was changed to the PostgreSQL default of 16MB the fluctuation range was significantly reduced and the TPS value also improved. However, this value can currently only be set at compile time via "configure --with-wal-segsize=SEGSIZE", so supporting runtime tuning and fine-tuning strategies will be part of our future work.
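For reference, a sketch of how the segment size can be changed today at build time (the value 16 matches the PostgreSQL default mentioned above; the other options mirror the configure flags in section 1.2):

# rebuild with 16MB WAL segments instead of Greenplum's 64MB default
./configure --prefix=$HOME/opt/gpdb --with-wal-segsize=16 \
    --disable-orca --disable-gpfdist --disable-pxf \
    CFLAGS='-g -O3 -march=native' CXXFLAGS='-g -O3 -march=native'
make -j && make install

# the new size only applies to a freshly initialized cluster; afterwards,
# confirm the read-only GUC
psql -c "SHOW wal_segment_size;"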

We also noticed that in the single-insert category, once the number of concurrent clients exceeds the peak point, Greenplum 6's performance declines to some degree. According to our preliminary analysis this is related to internal lock contention; with further optimization, higher insert performance can be expected in a future version of Greenplum.