Faster Optimization of Join Queries in ORCA

author:Hans Zeller

Optimizing joins is the core part of any query optimizer. It consists of picking a good join order, the right join algorithms (hash join, nested loop join, etc.) and various other things. The number of possible options grows extremely fast and requires a method called Dynamic Programming to keep the search effort for a good join plan in reasonable limits. However, even with Dynamic Programming, the optimization time grows exponentially with the number of joined tables.

For that reason, GPDB 6.13 with ORCA uses dynamic programming for up to 10-way joins. Beyond that, it employs a “greedy” optimization method, which is much faster but gives less optimal results.

In GPDB 6.14 we are introducing an updated method that should lead to faster optimization times and/or better execution plans for large joins.

Speeding up Enumeration

With an exponential growth in choices to consider, it is important that we spend as little time as possible on each choice, and this is one of the improvements in GPDB 6.14. (Please note that if you are using other versions of GPDB, you might still be able to benefit from the feature we are discussing, see details below).

ORCA is what’s called a top-down driven optimizer, unlike the more conventional bottom-up optimizers. For just the task of enumerating joins, however, we borrowed a bottom-up method that is more efficient for that part of query optimization. We still keep all the advantages of the top-down approach, like being able to integrate other operators like group by and window function into the optimization process.

Including Outer Joins

You might have noticed that large queries with outer joins took a bit longer to optimize. That’s because they use a different approach that is a bit more compute-intensive than that of inner joins. Starting in GPDB 6.14, inner and outer joins share the new, more efficient code path. This also means that the limit of 10 joined tables for exhaustive enumeration now applies to tables in inner and outer joins combined. We’ll discuss this in an example below.

Dynamic Partition Elimination

Many use cases rely heavily on dynamic partition elimination, where a join with another table is used to eliminate unneeded partitions.  For that to happen, we need to generate a join order that allows dynamic partition elimination (DPE) to take place. While the optimizer has always produced such plans, we now place additional emphasis on generating join orders that are suitable for DPE. Here is an example of a plan with DPE, you can see the partition selection is done on the other side of the hash join:

# explain (costs off)
# select *
# from sales join date_dim on sales.sale_date_id = date_dim.date_id
# where date_dim.d_year = 2020 and date_dim.d_month = 12;
                                  QUERY PLAN
--------------------------------------------------------------------
 Gather Motion 3:1
 ->  Hash Join
     Hash Cond: (sales.sale_date_id = date_dim.date_id)
   ->  Dynamic Seq Scan on sales (dynamic scan id: 1)
   ->  Hash
     ->  Partition Selector for sales (dynamic scan id: 1)
       ->  Broadcast Motion 3:3
         ->  Seq Scan on date_dim
             Filter: ((d_year = 2020) AND (d_month = 12))

Examples

Example 1: For smaller queries, let’s say 5-way joins, you should see little difference in behavior. But, let’s say we have two queries, a 10-way inner join and an 11-way inner join. In 6.13, the optimization time for the 11-way join will be significantly shorter than that of the 10-way join, but the plan quality will drop as well. In 6.14, this transition from an exhaustive enumeration to a greedy approach has been spread out and made smoother. Instead of the big jump at 10 tables you should see a gradual increase in optimization time and a gradual decrease in plan quality. Note that this decrease in plan quality is expected with any query optimizer.

Example 2: Consider a query with 7 inner join and 7 outer join operators (a total of 15 tables). In GPDB 6.13, the 8-way inner join lies below the threshold of 10 tables, so we perform an exhaustive enumeration, but the additional 7 tables with outer joins will take significant additional time, so this query will be relatively slow to optimize. In GPDB 6.14, we look at the combined inner and outer joins (a 15-way join) and don’t do a full enumeration. Therefore, this query should take significantly less time to optimize, but its plan quality might degrade slightly, very similar to what you would see for a 15-way inner join. In most cases, the gained planning time should outweigh the increase in execution time that a less optimal plan might cause.

Influencing Join Enumeration

While the default join enumeration algorithm should work fine in the vast majority of cases, there might be situations where you require a different trade-off between optimization and execution time. For that situation, you have some configuration parameters available.

The optimizer_join_order parameter allows you to specify the join enumeration algorithm. See the documentation for details, we’ll just discuss three possible values of this parameter briefly:

  • “greedy” means a greedy search that is fast, but leads to suboptimal plans.
  • “exhaustive”, the default in 6.13 and earlier, is the slower exhaustive method for up to 10-way inner joins and a greedy method beyond that.
  • “exhaustive2”, the default in 6.14, is the new method we describe in this blog. Please note that all that really changes in 6.14 is the default value. You can try this new method starting in versions 6.11.1 and in GPDB 5 starting with 5.28.2. Earlier versions might have the “exhaustive2” parameter setting as well but should be considered experimental.

So far we said that the threshold for exhaustive enumeration is a 10-way join, but that threshold can also be changed, by setting the optimizer_join_order_threshold parameter to a value other than 10. This parameter applies to the hard limit for “exhaustive” as well as to the gradual transition for “exhaustive2”.

Performance Measurement

We took the example above and tried a series of queries with approximately the same number of inner and outer joins. The chart below shows how long it took to optimize these queries in ORCA, using the old (exhaustive) and new (exhaustive2) algorithm.

Note the exponential rise in time for the old algorithm, up to 10 inner joins (20-way join total), followed by a drop when we switch to the greedy algorithm. The new algorithm also rises exponentially at first, but is overall more efficient.

This is a somewhat extreme example. When we use only inner joins the difference between the two approaches is much smaller, but you can see another difference: The new algorithm explores a greater search space for larger joins, leading to longer optimization but hopefully to faster query execution times. Note that the vertical scale of this second diagram is 1/10th of the first one.