Wednesday, 14 February 2018

Hash Join Overflow Costing #4 - Temporary Space + PGA Use

Now I want to look at the value reported for "Temporary Space" (TempSpc) and how to check whether the hash table is close to fitting in memory within the PGA or not i.e. just a bit too big or way too big.

Temporary Space

It seems that this is simply the size of the hash table itself, which is in turn the size of the Build data set plus an additional 12 bytes per row. Which means that we can use part of the cost formula posted earlier for this to estimate the size in KiloBytes:
  • Build KBytes + (12 * Build Rows / 1000)
The evidence I have for this is:
  • The "TempSpc" value only changes when the Build size changes - never when the Probe size changes
  • When additional columns are added to Build data set in the SELECT the "TempSpc" value increases by the same amount as the Build size increases (for the same number of rows)
  • When the number of rows in the Build data set increases, the "TempSpc" value increases by an additional 11.76 bytes per additional row
    • Which is almost 12 bytes per row, as mentioned before as overhead in the hash table
You could of course "refine" the above formula and use "11.76" instead of "12" and that would give you a value for the hash table that is much closer to what is reported for the "TempSpc" in the execution plan. But I think that using 12 is good enough for only an estimate.

Hash Table Size vs PGA Size

I mentioned in the first post that the Optimizer uses the value of the internal initialization parameter "_smm_min_size" for the memory it can use as the work area for the hash table within the PGA. Thus if the size of the hash table is larger than this the Optimizer will cost the Hash Join as an overflow to disk operation. Using the formula posted above for the Temporary Space we can now estimate the size of the hash table needed and compare it to this minimum work area size value.

The notes I have on "_smm_min_size" say that it will be 0.1% of the PGA when the size of the PGA is greater than 128 MB, otherwise it will be 128 KB as a minimum. Assuming your PGA is larger than 128 MB then you can calculate what this is yourself. Note also that its value is reported in 10053 trace files along with all other initialization parameters, so you can directly verify its value if you want to.

At run time, though, the PGA may not be heavily used and may have a lot of unused memory available. An executing query can allocate and use more than the minimum work area value at run time for execution plan operations, up to a limit specified by the "_smm_max_size" initialization parameter. For PGAs larger than 1 GB this is set to be 10% of the PGA size. For smaller PGAs it is 100 MB, or 20% when the PGA is less than 500 MB.

Knowing how large the hash table needed will be, lets you compare it to the maximum possible work area size in your PGA and see what the likelihood is of the hash table being able to fit within the PGA at run time. As the ratio between the minimum and maximum possible work area sizes is 100 (0.1% to 10%) this is a large range and a hash table may have a reasonable chance of fitting in the PGA at run time, depending on how active your system is and the other queries running.

Equally it lets you consider what the benefit might be from increasing the size of the PGA significantly if you have a critical query involving a large hash join operation that is overflowing to disk. That said, there are some ways of rewriting queries that can minimise the size of the data being pulled through into the hash table used by the hash join, and so potentially get a smaller hash table to fit into memory in the PGA. But this depends on multiple factors, mainly being the number of columns of data coming through in the Build data set. If this is only a few columns then there won't anything that can be saved by such a query rewrite.

Tuesday, 6 February 2018

Hash Join Overflow Costing #3 - Simpler Formula

So far I have offered a formula for how a Hash Join that overflows to disk is costed and confirmed that this is only costed in terms of single block disk operations. While that formula produces very accurate results (less than 1% difference to the reported cost for the Hash Join operation) it requires you to obtain size information on each individual column being retrieved from both the Build and Probe data sets. And this could be quite tedious when there are many columns involved, or you don't have an easy way to work out the source columns involved in each data set. There is a simplification we can make to eliminate this detailed per column information and just use the information reported in the execution plan itself.

All of the columns being retrieved for the Build and Probe data sets are actually included within the "Bytes" value reported in the execution plan for each data set. And this value includes both the individual column data and any per-column overhead as well. What it does not include is the extra 12 bytes per row overhead needed in the hash table itself. We can approximate the size of this second part of the data by using the "Rows" value reported in the execution plan.

Examining the "Bytes" and "Rows" values in the execution plans for my test queries (see first post for details of these queries), I can see that the increase in the Hash Join cost is about 0.0485 per KB of increase in either data set (Build or Probe) for the same number of rows. Previously I determined that there was a 12 byte overhead within the hash table per data row expected.

This produces a revised Hash Join cost formula using only values from the execution plan of:
  • ( (Build KBytes + (12 * Build Rows / 1000) ) + (Probe KBytes + (12 * Probe Rows / 1000) ) ) * 0.0485
Note that the "Bytes" values used need to be in "K" units, whereas the "Rows" is not and so is divided by 1000. Checking this formula against the actual Hash Join costs of the test queries I ran I can see that it has an error of about 1% i.e. it is not as accurate as the previous formula, but is still accurate enough I believe.

Lets check how close this from one of the test queries. Here is the execution plan produced:
| Id  | Operation     | Name   | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT    |      |      |      |      | 1266 (100)|          |
|   1 |  SORT AGGREGATE     |      |    1 |   30 |      |           |          |
|*  2 |   HASH JOIN     |      | 50000 | 1464K| 1712K| 1266   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| HJ50K  | 50000 | 1123K|      |  365   (1)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| HJ100K |  100K|  683K|      |  728   (1)| 00:00:01 |
We can see:
  • The Build and Probe data access cost is 365 + 728 = 1093
  • The cost of the Hash Join operation itself is 1266 - 1093 = 173
  • The calculated approximate cost is (1123 + (12 * 50) + 683 + (12 * 100)) * 0.0485
    • = (1123 + 600 + 683 + 1200) * 0.0485
    • = 3606 * 0.0485 
    • = 174.891
  • The difference is +1.891, which is 1.1% of the actual cost
This formula can therefore be a useful check against the cost reported when a Hash Join operation overflows to disk, and for determining which is the biggest cost factor i.e. the Build or the Probe data sets.