Wednesday, 28 March 2018

Hash Join Overflow to Disk Summary

[I'm a bit busy at the moment working with a new client on a large and complex database, so I'm afraid I've only got time to do a quick post. I have a new post on a different topic planned, but am struggling with the time needed to write it up properly, so a short post is all I can squeeze in]

This is just a summary of the key results from my findings on how Oracle costs a Hash Join when it overflows to disk, pulling all of the conclusions together into one place.

In the first post I arrived at the following formula:
  • ( ((Build Columns Size + 12) * Build Row Count) + ((Probe Columns Size + 12) * Probe Row Count) ) * 0.0475
Where "Columns Size" is the sum of the hash table storage needed for each column, i.e. the column data storage plus 2 bytes per column, and the "Row Count" values are expressed in units of 1,000 rows.
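
As a quick illustration, here is a minimal sketch of that formula in Python, using made-up figures (a 50,000 row build side with 30 bytes of hash table column storage per row, and a 200,000 row probe side with 20 bytes per row); the function and variable names are mine, purely for illustration:

  def hash_join_overflow_cost(build_cols_size, build_rows,
                              probe_cols_size, probe_rows):
      # Columns Size values are per-row hash table storage in bytes
      # (column data storage + 2 bytes per column); the row counts are
      # actual counts, converted here into units of 1,000 rows.
      build_term = (build_cols_size + 12) * (build_rows / 1000)
      probe_term = (probe_cols_size + 12) * (probe_rows / 1000)
      return (build_term + probe_term) * 0.0475

  cost = hash_join_overflow_cost(build_cols_size=30, build_rows=50000,
                                 probe_cols_size=20, probe_rows=200000)
  print(round(cost))  # (30+12)*50 + (20+12)*200 = 8500, * 0.0475 is about 404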

In the second post I showed that the costing is unaffected by the value of the Multi-Block Read Count parameter, i.e. the costing is done using only single-block disk I/Os.

In the third post I proposed a simpler formula, using only values reported in an execution plan:
  • ( (Build KBytes + (12 * Build Rows / 1000) ) + (Probe KBytes + (12 * Probe Rows / 1000) ) ) * 0.0485
This is only slightly less accurate than the original formula, but it is much simpler to use because it needs only values reported in the execution plan, with no other calculations or derived values on your part.
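
The same kind of sketch works for the simplified formula, this time using only the Bytes (in KB) and Rows values reported in the plan for the build and probe row sources. Again, the figures and names below are hypothetical:

  def hash_join_overflow_cost_from_plan(build_kbytes, build_rows,
                                        probe_kbytes, probe_rows):
      # build_kbytes / probe_kbytes are the plan's Bytes values in KB;
      # build_rows / probe_rows are the plan's Rows values.
      build_term = build_kbytes + (12 * build_rows / 1000)
      probe_term = probe_kbytes + (12 * probe_rows / 1000)
      return (build_term + probe_term) * 0.0485

  # e.g. build side reported as 1,500 KB / 50,000 rows,
  #      probe side reported as 4,000 KB / 200,000 rows
  cost = hash_join_overflow_cost_from_plan(1500, 50000, 4000, 200000)
  print(round(cost))  # (1500 + 600) + (4000 + 2400) = 8500, * 0.0485 is about 412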

In the fourth post I showed that the size of the Temporary Space (TempSpc) reported in the execution plan is actually the size of the hash table itself, which depends only on the size of the Build data set. Knowing this hash table size helps you work out how much larger it is than the memory available in the PGA to store such hash tables.
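
To make that comparison concrete, here is a rough sketch of how you might use the TempSpc figure. The 5% of PGA_AGGREGATE_TARGET ceiling assumed here for a single serial workarea is only an illustrative assumption; the real limit depends on your version and configuration:

  def overflow_ratio(tempspc_kb, pga_aggregate_target_mb,
                     workarea_fraction=0.05):
      # How many times larger the hash table (TempSpc) is than the memory
      # a single serial workarea is assumed to be allowed to use.
      # workarea_fraction = 0.05 is an assumption, not a documented value.
      workarea_limit_kb = pga_aggregate_target_mb * 1024 * workarea_fraction
      return tempspc_kb / workarea_limit_kb

  # e.g. a plan reporting TempSpc of 113M with a 1 GB PGA_AGGREGATE_TARGET
  ratio = overflow_ratio(tempspc_kb=113 * 1024, pga_aggregate_target_mb=1024)
  print("hash table is about %.1fx the assumed workarea limit" % ratio)  # ~2.2x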