Conquering Your Database Workloads
Howard Goldberg – Executive Director, Morgan Stanley, Head of Greenplum engineering
1 Introduction
Everyone has been in some type of traffic delay, usually at the worst possible time. These traffic jams result from an unexpected accident, volume on the roadway, or lane closures forcing a merge from multiple lanes into a single lane. These congestion events lead to unpredictable travel times and frustrated motorists.
Databases also have traffic jams or periods when database activity outpaces the resources (CPU/Disk IO/Network) supporting it. These database logjams cause a cascade of events leading to poor response times and unhappy clients. To manage a database’s workload, Greenplum (4.3+) utilizes resource queues and the Greenplum Workload Manager (1.8+). Together these capabilities control the use of the critical database resources and allow databases to operate at maximum efficiency. This article will describe these workload manager capabilities and offer best practices where applicable.
2 Why is Workload Management important?
Without some form of resource management, a database processes all work requests with the same urgency. To classify and prioritize incoming activity, the database needs knowledge of each application’s workload and performance service levels. Often, many application workloads run concurrently with conflicting service-level agreements (SLAs). Left unmanaged, the database’s resources will be used inefficiently, and both overall system performance and the user experience will be poor and unpredictable.
With the Big Data revolution in full swing, a single Greenplum database can be a petabyte (after compression) in size and require up to a hundred-plus hosts to support the projected analytic workloads. To optimize the total cost of ownership (TCO) of an environment of this size, multiple diverse use cases will execute simultaneously. For example, some users within the same Greenplum database could be executing long-running machine learning algorithms that leverage Greenplum’s in-database analytics capabilities, using R and MADlib. At the same time, other users could be submitting focused SQL queries with SLA response times in the seconds-to-minutes range. These use cases have different resource utilization characteristics and response time expectations. The consolidation of these use cases, along with multi-tenancy and user concurrency requirements, necessitates that a workload manager be used to optimize the resources and ensure that all users’ SLAs are satisfied.
3 What are the building blocks of a WLM strategy?
Similar to building a database schema using Data Definition Language (DDL), a workload management strategy or policy is created using database commands. The building blocks of a WLM strategy in Greenplum require an administrator to use a combination of resource queues and WLM rules. A resource queue groups a set of users into a resource management group, providing some governing controls, and the Greenplum WLM rules define granular resource controls at a host, segment, or pid scope. Together these WLM constructs create a sophisticated and comprehensive resource governing and monitoring strategy.
3.1 Resource queues
Resource queues (RQ) are built into the core Greenplum database engine and are used to control the number of concurrent queries, memory used, and the CPU used by a query. By default, all users are assigned to the default pg_default resource queue, which has active statements set to 20 and its priority set to medium. It is a best practice to create new resource queues rather than change the properties of pg_default. This will provide more control and flexibility as the database users and activities change over time. Please note that the default max_resource_queues is nine and typically has to be increased to handle the array of resource queues required to manage most database workloads. Refer to Figure 1 for an example of a custom resource queue and how to assign it to a role.
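In the spirit of the Figure 1 example, a minimal sketch of creating a custom resource queue and assigning it to a role (the queue and role names here are illustrative, not from the original figure):

```sql
-- Create a queue that allows 10 concurrent statements at HIGH priority,
-- with a 2 GB memory cap shared by all active statements in the queue.
CREATE RESOURCE QUEUE adhoc_rq WITH (ACTIVE_STATEMENTS=10,
                                     PRIORITY=HIGH,
                                     MEMORY_LIMIT='2000MB');

-- Assign an existing role to the queue; its queries are now governed by it.
ALTER ROLE analyst1 RESOURCE QUEUE adhoc_rq;
```

Because a role can belong to only one resource queue at a time, re-running the ALTER ROLE with a different queue name is also how users are moved between queues as workloads change.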
3.2 What SQL is managed by a resource queue?
Not all SQL statements are controlled by resource queues. By default, only SELECT, SELECT INTO, CREATE TABLE AS SELECT, and DECLARE CURSOR statements are evaluated. If the server configuration parameter resource_select_only is set to off, then INSERT, UPDATE, and DELETE statements will be evaluated as well. In addition, a SQL statement that runs during the execution of an EXPLAIN ANALYZE command is excluded from resource queue controls. Finally, any user with the superuser privilege, such as “gpadmin”, will not be governed by RQ restrictions.
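If writes should also be governed, the resource_select_only setting can be inspected and changed with the gpconfig utility; a sketch (this parameter typically requires a database restart to take effect):

```shell
# Show the current value of the parameter across the cluster.
gpconfig -s resource_select_only

# Evaluate INSERT, UPDATE, and DELETE against resource queues as well
# (takes effect after the database is restarted).
gpconfig -c resource_select_only -v off
```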
The table (Figure 2) below lists the core properties associated with a resource queue.
| Property name | Type | Description |
| --- | --- | --- |
| active_statements | Integer | Limits the number of queries that can execute concurrently for roles assigned to the resource queue. Either the active_statements or the max_cost property must be set on each resource queue. Please note that in the latest releases of Greenplum 4.3+, Pivotal recommends avoiding the max_cost property. |
| priority | Enumeration | Sets the relative priority of queries executed by roles assigned to the resource queue. The allowed values, in order of increasing priority, are MIN, LOW, MEDIUM, HIGH, and MAX. Resource queues implement prioritization by leveraging the OS “usleep” function to add sleep time to sessions based on their priority settings; the database engine does not stop any query from executing but rather slows down low-priority queries. The priority control is only active when multiple queries are executing concurrently. GP does not use “nice” to control RQ priorities. |
| max_cost | Float | Sets a maximum limit on the total cost of queries that can be executed by roles assigned to the resource queue. The cost of a query is estimated by the query planner and is measured in units of disk page fetches. The max_cost limit is cumulative across the queue, not just per query. This property will be deprecated by Pivotal in future releases. |
| min_cost | Float | Sets the minimum estimated query cost for a query to be managed by the resource queue. Queries with estimated costs below this threshold are executed immediately. This property will be deprecated by Pivotal in future releases. |
| cost_overcommit | Boolean | If a resource queue is limited based on max_cost, a query that exceeds the max_cost limit is allowed to execute if the system is idle and cost_overcommit is true. If cost_overcommit is set to false, queries that exceed max_cost are always rejected. |
| memory_limit | Integer (kilobytes) | Sets the total amount of memory that all active statements submitted to the queue may consume. The minimum is 10240KB. There is no maximum, but when a query executes it is limited by the segment host’s physical memory. Set the parameter to -1 for no limit. |
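Once queues are in place, their limits and current utilization can be watched through the gp_toolkit schema that ships with Greenplum; a minimal sketch:

```sql
-- Per-queue limits and current usage (active statement counts, cost, memory).
-- Column names vary slightly between releases; SELECT * keeps this portable.
SELECT * FROM gp_toolkit.gp_resqueue_status;
```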
3.3 Greenplum Workload Manager (WLM)
The Greenplum WLM is a set of four services (agent, cfgmon, rabbitmq, rulesengine) that run on each Greenplum master and segment server to manage database resources based on WLM rules. These services are started using the INSTALLDIR/gp-wlm/bin/svc-mgr.sh command; the startup of the Greenplum WLM services is not included as part of the core database engine startup or shutdown (gpstart/gpstop). To simplify the management of the Greenplum WLM and ensure that it is always up when the database is up, a wrapper shell script can be created to start and stop both the database and the WLM services together.
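A sketch of such a wrapper is shown below. The INSTALLDIR path is an assumption for your environment, and the svc-mgr.sh options shown are illustrative; consult the gp-wlm documentation for the exact flags in your release:

```shell
#!/bin/bash
# gp-cluster.sh -- start/stop the database and the WLM services together.
# INSTALLDIR is environment-specific; point it at your gp-wlm installation.
INSTALLDIR=/usr/local/greenplum-wlm

case "$1" in
  start)
    gpstart -a
    "$INSTALLDIR"/gp-wlm/bin/svc-mgr.sh --action=start --service=all
    ;;
  stop)
    "$INSTALLDIR"/gp-wlm/bin/svc-mgr.sh --action=stop --service=all
    gpstop -a
    ;;
  *)
    echo "usage: $0 {start|stop}" >&2
    exit 1
    ;;
esac
```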
The resources managed by WLM rules are a superset of those managed by resource queues. The list of GP WLM threshold metrics (formally known as datums) is large and continues to expand with each new release. The complete list of threshold metrics can be viewed at https://gpcc.docs.pivotal.io/latest/gp-wlm/topics/metric-ref.html. The table below compares and suggests which resource management technique is optimal to manage a particular resource.
Resource queue to GP WLM resource management comparison (Figure 3)
| Resource name | Description | Resource queues | GP WLM |
| --- | --- | --- | --- |
| Active statements | The number of active statements allowed to execute | Yes | No. Active statements is not a threshold metric controllable via a GP WLM rule. |
| Query cost | The estimated cost of a query | Yes | No |
| Query priority | The priority with which a query receives CPU resources | Yes. CPU throttling is implemented using a sleep mechanism similar to GP WLM’s, but RQ priority is easier to conceptualize and implement. | No. Enhancements to the GP WLM CPU controls will be introduced in GP 5.0 using resource groups. |
| Memory | Memory utilization for processes and queries | Yes | Yes. The memory controls in GP WLM are far superior to the RQ’s, and it is advisable to use GP WLM to manage this resource. |
| Connections | Number of backend connections and connections to the master | No | Yes |
| Identification | Names of users, hosts, databases, ports, and processes | No | Yes. Since users and roles can be associated with an RQ, this overlaps with GP WLM, but GP WLM is more robust and easier to implement. |
| Transactions | Information about the current transaction, queries within transactions, and numbers of transactions committed and rolled back in the database | No | Yes |
| Date/Time | Date and time metrics for a host | No | Yes |
| CPU | CPU utilization for hosts, processes, and sessions | No | Yes |
| Spill | Number of spill files created and total spill file size for a query | No | Yes |
| I/O | Disk read/write statistics for databases, processes, and queries | No | Yes |
| Skew | Disk read/write skew and memory skew for queries | No | Yes |
A rule is the core construct used by GP WLM to manage database resources. A rule has five components and threshold metric components can be combined using Boolean operators to create compound rules.
Rule component table (Figure 4)
| Component | Description | Rule example |
| --- | --- | --- |
| 1. Rule name | The name of the rule | rule add kill_long pg_terminate_backend() when session_id:host:pid:runtime > 120 |
| 2. Threshold action | Three actions are available: throttle_gpdb_query specifies a CPU utilization percentage limit for a pid or session_id associated with a query; pg_terminate_backend terminates a query and records its information in the gp_wlm_events* table; gpdb_record records an event about a query in the gp_wlm_records* table. | rule add kill_long pg_terminate_backend() when session_id:host:pid:runtime > 120 |
| 3. Scope | The scope level, which can be defined at the host, segment, or pid level | rule add kill_long pg_terminate_backend() when session_id:host:pid:runtime > 120 |
| 4. Threshold (metric) name | Refer to the detailed metric reference at https://gpcc.docs.pivotal.io/latest/gp-wlm/topics/metric-ref.html | rule add kill_long pg_terminate_backend() when session_id:host:pid:runtime > 120 |
| 5. Threshold value | Threshold values can be compared using Boolean operators (=, !=, >, <, >=, <=) or regular-expression pattern matching with =~ (for example, =~ /Select.*/). | rule add kill_long pg_terminate_backend() when session_id:host:pid:runtime > 120 (the 120 is in seconds) |

*Please note that the gp_wlm_records and gp_wlm_events tables are created in the postgres database by default. It is advisable to create these tables, at WLM installation, in the gpperfmon database, since most administrators grant their sophisticated developers access to that database. Merging the gp_wlm_records and gp_wlm_events tables into a single table to simplify WLM information analysis is on the Pivotal requirement list.
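To illustrate how the components combine, a compound rule can throttle rather than kill a heavy consumer. The rule name, thresholds, and the max_cpu parameter below are illustrative; check the throttle_gpdb_query documentation for the exact syntax in your release:

```
rule add throttle_cpu_hogs
throttle_gpdb_query(max_cpu=30)
when host:pid:cpu_util > 85
and session_id:host:pid:usename != 'gpadmin'
```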
As a best practice, it is advisable to initially build rules using the “gpdb_record” threshold action to monitor the system and determine the best resource-governing threshold values over time. Be aware that when the kill action fires, GP WLM does not send a message to the user that the query was killed; however, the information is stored in, and can be accessed from, the gp_wlm_events external table. Notifying users when their activity is killed by GP WLM is on the GP WLM requirements list.
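Once record or kill rules have been running for a while, the captured history can be reviewed with ordinary SQL. A minimal sketch (the table lives in the postgres database by default, and its column layout varies by release, so a broad SELECT is shown):

```sql
-- Inspect recently recorded WLM events, e.g. queries terminated by a rule.
-- Run this in the database holding the gp_wlm_* external tables
-- (postgres by default, or gpperfmon if relocated at installation).
SELECT * FROM gp_wlm_events LIMIT 20;
```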
Listed below is a sample rules template to begin with and refine as your database utilization develops and advances. Kill actions will eventually have to be added to stop high resource consumers, and adjustments to rules can be made dynamically via a rule delete followed by a rule add. Please note that the “gpadmin” ID has been explicitly excluded from these rules, since most administrators want unconstrained access to the database.
| Rule purpose | Rule definition |
| --- | --- |
| Idle | rule add idle_record gpdb_record(message="Record idle pgAdmin3 sessions") when (session_id:host:pid:current_query = '<IDLE>' and session_id:host:pid:runtime > 14400 and session_id:host:pid:usename != 'gpadmin' and datid:numbackends > 75 and session_id:host:pid:application_name =~ /pgAdmin III/) |
| Idle in transaction | rule add idle_intrans_record gpdb_record(message="Query status is <IDLE> in transaction for > 1 hour") when session_id:host:pid:runtime > 3600 and session_id:host:pid:current_query = '<IDLE> in transaction' |
| Workfile usage | rule add workfile_record gpdb_record(message="10K or 500GB work files") when session_id:host:pid:usename != 'gpadmin' and (session_id:host:pid:spillfile_count_across_cluster > 10000 or session_id:host:pid:spillfile_size_across_cluster > 500000000000) |
| Segment-level memory | rule add vmemseg_record gpdb_record(message="Query running at > 50% of the segment vmem_protect_limit") when session_id:host:segment_id:vmem_size_pct > 50 and session_id:host:pid:runtime > 0 |
| Find select * queries running for > 10 minutes | rule add query_sel_star_record gpdb_record(message="select * query running over 10 minutes") when session_id:host:pid:runtime > 600 and session_id:host:pid:usename != 'gpadmin' and session_id:host:pid:current_query =~ /select \*/ |
| Find select queries running for > 4 hours | rule add query_4hour_record gpdb_record(message="Query running over 4 hours") when session_id:host:pid:runtime > 14400 and (session_id:host:pid:current_query != '<IDLE>' and session_id:host:pid:current_query != '<IDLE> in transaction') and session_id:host:pid:usename != 'gpadmin' |
| Find processes using > 85% of CPU | rule add cpu_record gpdb_record(message='CPU util > 85% for this process') when host:pid:cpu_util > 85 and session_id:host:pid:runtime > 0 |
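As noted above, a rule is adjusted dynamically by deleting and re-adding it. For example, loosening the workfile rule from the template above (the new thresholds are illustrative):

```
rule delete workfile_record
rule add workfile_record
gpdb_record(message="20K or 1TB work files")
when session_id:host:pid:spillfile_count_across_cluster > 20000
or session_id:host:pid:spillfile_size_across_cluster > 1000000000000
```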
A very helpful threshold when building a WLM rule is the date/time threshold. Used in conjunction with other thresholds, a unique rule template can be deployed for a particular time of day, differentiating between online and batch workloads that have different workload signatures. For example:

Online users (midnight to 5 PM) – kill any query running for more than 120 seconds:

rule add kill_long_online
pg_terminate_backend()
when session_id:host:pid:runtime > 120
and host:hour >= 0 and host:hour < 17

Batch processing (5 PM to midnight) – kill any query running for more than 600 seconds:

rule add kill_long_batch
pg_terminate_backend()
when session_id:host:pid:runtime > 600
and host:hour >= 17 and host:hour < 24
WLM is an essential tool for managing the diverse, high-volume use cases of a Greenplum database. All users of the database feel that their jobs, reports, or SQL are mission critical. GP WLM provides the capability and discipline required to organize these activities and control the critical database resources so that they can be leveraged most effectively. In Greenplum 5.0, Pivotal sets the foundation for its next generation of WLM features, which will further enhance Greenplum’s workload management capabilities.