Databases and Performance. All about performance of Oracle and other relational databases on Linux and UNIX based systems. How to measure it, analyse it, tune it, and manage it over time. By John Brady.<br /><br /><h2>Time SQL Execution with Python</h2>(19 January 2019)<br /><a class="externalLink" href="https://databaseperformance.blogspot.com/2012/07/python-for-dba-1.html" target="_blank" title="External link to https://databaseperformance.blogspot.com/2012/07/python-for-dba-1.html">I've said before in this blog</a> how I find <a class="externalLink" href="https://www.python.org/" target="_blank" title="External link to https://www.python.org/">Python</a>
to be very useful for doing various things, including processing data
to or from an Oracle database. Here is another example where a
relatively simple and straightforward piece of Python code can deliver
something that is very useful - in this case measuring the elapsed time
of SQL queries executed on an Oracle database. We often need to execute
a given SQL query and see how long it takes to complete as part of a
tuning or monitoring exercise, but running it via SQL*Plus or something
else can tie up a session for an extended period of time and produce
large amounts of output that are spooled to the screen. Conversely we
don't want to run the SQL query and stop when only the first data row is
returned, as that is not a "full" execution of the query. We can use
Python to achieve this - execute a query to completion, consuming all
the data it produces but not outputting any of it, and measure the
elapsed time. And we can execute this outside of any SQL*Plus or SQL
Developer or other window we might be using.<br /><br />The key requirements for this utility are:<br />
<ul>
<li>Connect to an Oracle database as a specified user</li>
<li>Execute
the given SQL query, unmodified, so the execution plan used should
match that when the same SQL is submitted by the real application</li>
<li>Consume all the rows of data produced, for a true time to complete measurement</li>
</ul>
This covers most scenarios but does leave a few out - no bind variables can be used, for instance.<br /><br />There are really three parts to achieving this:<br /><ul>
<li>Connect to the Oracle database using the supplied credentials</li>
<li>Read in the SQL query text from a file</li>
<li>Execute against an Oracle database connection and consume the results noting the start and end time of this</li>
</ul>
Each of these parts involves relatively few lines of Python code. Part of the reason for this is Python's dynamic typing of variables - you don't have to declare the data types of the data the SQL query produces in advance of retrieving it, which means it can handle almost any query producing any number of values per output row.<br /><br />The code I will now present is not
"perfect" in an absolute sense - there is very limited error handling
for instance - but it is complete and does work. I've used it many
times to execute a SQL query pulled out from an application that is
performing poorly, to verify its execution and the impact of any changes
made to try and improve performance. <br /><br /><h3>
Python Code</h3>
<strong>Connecting to an Oracle database</strong><br />I use <a class="externalLink" href="https://cx-oracle.readthedocs.io/en/latest/" target="_blank" title="External link to https://cx-oracle.readthedocs.io/en/latest/">cx_Oracle</a>
as the database access module for Oracle, and connection is relatively
simple. This assumes that the database credentials are in variables
named "<code>user_name</code>", "<code>user_password</code>" and "<code>database_name</code>".<br /><pre>import cx_Oracle
# Other code here, such as parsing command line arguments to get values into necessary variables
db_conn = cx_Oracle.connect (user_name, user_password, database_name)
</pre>
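The third argument can be a TNS alias from your tnsnames.ora or an EZConnect string of the form host:port/service. As a small sketch (the host and service names here are made up), such a connect string can be built up before passing it to "<code>cx_Oracle.connect</code>":<br /><pre>
```python
# Build an EZConnect-style connect string (hypothetical host and service names)
host, port, service = "dbhost.example.com", 1521, "orclpdb"
database_name = "{0}:{1}/{2}".format (host, port, service)
print (database_name)  # -> dbhost.example.com:1521/orclpdb
```
</pre>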
<br /><strong>Reading in the SQL query from a file</strong><br />Assume that the name of the file containing the query is in the string variable "<code>SQL_File_Name</code>".<br /><pre>sql_query = open (SQL_File_Name, "r").read ().rstrip (' ;\n') # Strip any semi-colon off from the end
</pre>
This does several things in one combined statement:<br /><ul>
<li>Opens the file for reading</li>
<li>Reads in the content of the file returning it as a single string</li>
<li>Strips off certain trailing characters at the end of the string - space, semi-colon or new line character.</li>
</ul>
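As a quick illustration, with a made-up one-line query file content, the "<code>rstrip</code>" call removes any mix of those trailing characters:<br /><pre>
```python
# What the file read plus rstrip produces for a typical SQL file ending "; newline"
file_content = "select name from v$database ;\n"
sql_query = file_content.rstrip (' ;\n')
print (sql_query)  # -> select name from v$database
```
</pre>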
We now have in the "<code>sql_query</code>"
variable the text string of the SQL query with no trailing semi-colon
(if there was one). This is now ready for use to execute against an
Oracle database.<br /><br /><strong>Executing the query and timing it</strong><br />For now assume that a database connection has been made and is referenced by the object named "<code>db_conn</code>".<br /><br />To
execute the SQL query we first need to get a cursor from the database
connection, then execute the SQL using it, retrieve all the rows of
data, and report the elapsed time:<br /><pre>import logging
import time
try :
    curs = db_conn.cursor ()
    start_time = time.time ()
    curs.execute (sql_query)
    # Now retrieve the rows of data produced by the query
    row_count = 0
    rows = curs.fetchmany (1000)
    # An empty list is False so can test it directly
    while rows :
        row_count += len (rows)
        rows = curs.fetchmany (1000)
    end_time = time.time ()
    print ("Query Complete - Elapsed {0:4.1f}s for {1} rows".format ((end_time - start_time), row_count))
except Exception as ex :
    logging.critical ("ERROR - Exception raised during processing")
    logging.critical ("Exception - %s", ex)
</pre>
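As an aside, the fetch loop can be exercised without any database at all by stubbing a cursor whose "<code>fetchmany</code>" returns batches until the rows run out - a hypothetical stand-in for the cx_Oracle cursor, useful for checking the row counting logic:<br /><pre>
```python
class FakeCursor :
    """Hypothetical stand-in for a cx_Oracle cursor, serving rows in batches."""
    def __init__ (self, total_rows) :
        self.remaining = total_rows
    def fetchmany (self, batch_size) :
        batch = min (batch_size, self.remaining)
        self.remaining -= batch
        return [(i,) for i in range (batch)]

curs = FakeCursor (2500)
row_count = 0
rows = curs.fetchmany (1000)
while rows :
    row_count += len (rows)
    rows = curs.fetchmany (1000)
print (row_count)  # -> 2500
```
</pre>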
I've included exception handling in case there is something wrong
with the SQL query or its execution, but there is not a lot we can do
about it so it just prints out some information about the exception it
caught.<br /><br /><strong>More details</strong><br />I actually allow all the necessary values to be passed in from the command line to this utility program, and I use the Python "<a class="externalLink" href="https://docs.python.org/3/library/argparse.html" target="_blank" title="External link to https://docs.python.org/3/library/argparse.html">argparse</a>" module to process these and map them to corresponding variables. So you can either assign the results from "<code>argparse</code>" to the discrete variables I have used in the code examples, or instead replace those references by the results produced by "<code>argparse</code>". The results will be the same either way.<br /><br />The following code can be placed before the code given so far in the final Python program:<br /><pre>import argparse
import sys
# Parse command line arguments
parser = argparse.ArgumentParser (description="Time execution of a SQL query")
# Add command line argument definitions one by one into the parser
parser.add_argument ("-u", "--user", help="User name", required=False, default=None)
parser.add_argument ("-p", "--password", help="Password", required=False, default=None)
parser.add_argument ("-d", "--database", help="Database name", required=False, default=None)
parser.add_argument ("SQL_File", help="Name of SQL File to execute", default=None)
args = parser.parse_args ()
# First check whether we did get a user name, password etc on the command line
# If not, prompt the user for what we are missing
if args.database is None :
    print ("Enter database", end=": ", flush=True)
    database_name = sys.stdin.readline ().rstrip ()
else :
    database_name = args.database
if args.user is None :
    print ("Enter username", end=": ", flush=True)
    user_name = sys.stdin.readline ().rstrip ()
else :
    user_name = args.user
if args.password is None :
    print ("Enter password", end=": ", flush=True)
    user_password = sys.stdin.readline ().rstrip ()
else :
    user_password = args.password
SQL_File_Name = args.SQL_File
# Get Database Connection ...
# Execute SQL and consume results ...
</pre>
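To check the argument handling without running the whole program, "<code>parse_args</code>" can be given an explicit list in place of the real command line (the user, database and file names here are made up):<br /><pre>
```python
import argparse

parser = argparse.ArgumentParser (description="Time execution of a SQL query")
parser.add_argument ("-u", "--user", help="User name", required=False, default=None)
parser.add_argument ("-p", "--password", help="Password", required=False, default=None)
parser.add_argument ("-d", "--database", help="Database name", required=False, default=None)
parser.add_argument ("SQL_File", help="Name of SQL File to execute", default=None)
# Parse a hypothetical command line instead of the real sys.argv
args = parser.parse_args (["-u", "scott", "-d", "orcl", "query.sql"])
print (args.user, args.database, args.SQL_File)  # -> scott orcl query.sql
print (args.password)  # -> None
```
</pre>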
<h2>
Summary</h2>
This relatively short and straightforward Python
program gives me a utility to time the execution of any SQL query
against an Oracle database, including consuming all of the data rows the
query produces. For me this has many advantages over other ways of
executing SQL queries when investigating performance problems, and is
something I use whenever needed. One advantage is that because it is
executing the original SQL query with no modifications, it will be using
the same execution plan as when that SQL query is executed from an
application. Which in turn means that if I do modify the SQL in some
way and manage to "improve" its performance, the new execution plan
should be the same one used when the same improved SQL is executed from
the application code.<br /><br /><h2>Interactive Complex SQL Scripts - A Solution</h2>(25 June 2018)<br />Often you will find that you need to run a relatively simple SQL query
and see the results from it. Normally your choice of how to run this
SQL query in Oracle is a straight choice between either using a plain
SQL script executed through SQL*Plus, or using PL/SQL (probably in an
anonymous block). But often things are not quite that simple or
straightforward for you, and maybe there is an extra little bit of
complication in how that SQL needs to be executed. Maybe you need to
execute it repeatedly with a series of different values to use in a
filter. Maybe you need to ask the user to enter a value, or choose a
filter to use. Maybe you want to loop through the results of one query
and run a second query for each of these. Maybe you need some simple
conditional logic on the results and use that to choose which second
query to run. Maybe you need to provide progress feedback to the user
as a series of queries are executed, or as the results are retrieved of
each query.<br /><br />Unfortunately both SQL*Plus and PL/SQL each have
drawbacks that stop them from being the "right" solution for all
possible SQL query scenarios. I believe that I have identified a better
solution for such cases where the query is straightforward enough and
the conditional logic is also straightforward - use Python to execute
the necessary SQL queries instead. It will do what you want, it can do
much more than either SQL*Plus or PL/SQL can do themselves, and lets you
be consistent in how you run all such non-simple SQL queries. I'll
explain more of the details why I believe it to be a better fit for
running ad-hoc complex SQL queries in a moment.<br /><br />SQL*Plus or
PL/SQL each have their own advantages for running different types of SQL
query, but each also has a set of clear disadvantages that make them
unsuitable for some types of SQL query.<br /><br />SQL*Plus has its own set
of features that make it quite flexible for interactively running SQL
queries and seeing the results. These include things like "substitution
variables" and prompting the user to enter values, and controlling
formatting of the results output. It can execute any valid SQL
statement, including DDL to do things like create new database objects. I
find SQL*Plus very capable for writing an interactive script that can
show the user some data, prompt them to enter a value, run other queries
using that value as a filter, and display each result of each query as
it gets them. <br /><br />However, SQL*Plus has no direct "if then else"
conditional logic support. People may find one way or another to
achieve this, but it is not intrinsically a native feature of SQL*Plus,
and such workarounds can look like kludges and be difficult to read and
understand. It also has very limited error detection and handling
abilities - basically it can stop if anything goes wrong, but that's
about it. So there are clear limits to the abilities of SQL*Plus for
executing SQL queries with any kind of conditional logic between them.<br /><br />PL/SQL
is Oracle's procedural language extensions to wrap such conditional
logic and more around the execution of SQL queries, with good error
detection and handling capabilities. It is definitely the language to
use for things like stored procedures and triggers stored in the
database itself. However as a general purpose language for executing
SQL queries it has a number of severe limitations. The main one you
keep running into is that it has little to no real input or output
abilities for user interaction. It has no user input capabilities
because it is designed to run <em><strong>within</strong></em> the
database instance itself on the database server system, for which the
concept of a real user at a screen with a keyboard does not make sense.
So no real fault on PL/SQL's part for that. It has a very limited user
output capability (<code>DBMS_OUTPUT.PUT_LINE</code>), because none of
the output is visible to the user while the PL/SQL block is running.
Instead all the output appears all together at once at the end when the
PL/SQL has completed. Which is of course too late by then for progress
and status reports.<br /><br />PL/SQL also only supports DML SQL statements
as native SQL statements in its code, partially because all native
inline SQL statements are validated for correctness when the PL/SQL
block is initially compiled i.e. when it is all parsed when submitted
for creation or execution, and DDL is generally impossible to properly
validate for various reasons. The workaround is to put such DDL SQL
statements into local string variables in PL/SQL and then perform an "<code>EXECUTE IMMEDIATE</code>" on that string. While this does work, there are two major disadvantages:<br />
<ol>
<li>You
lose the validation of the SQL at compile time when the PL/SQL is
submitted, and instead can get a run time error when that particular SQL
DDL is executed</li>
<li>Your SQL DDL statements now become quoted
strings rather than plain, inline SQL statements, making reading them
more difficult (for two related reasons)<ul>
<li>Recognising such SQL statements when embedded within string variables when reading the PL/SQL code</li>
<li>Complications
when the SQL is very long and gets split over many separate lines, and
has "variable" parts within it dependent on the values in other
variables</li>
</ul>
</li>
</ol>
A solution that I think deals with all the
negatives of both SQL*Plus and PL/SQL while also supporting execution of
any SQL statement is to use Python. I believe that it does not suffer
from any of the problems that SQL*Plus or PL/SQL suffer from
individually (i.e. it successfully deals with those particular issues
and they are no longer problems in Python), while being a modern and
relatively cleanly designed programming language. Thus you can use one
programming language all the time for such one off complex SQL query
scenarios, providing more consistency for you when doing these.<br /><br />About
the only negative I can think of for using Python is that SQL
statements have to be put into string variables in the Python code and
then passed to an "execute" method (a library function) to run it
against the connected database. Which means that you do not get compile
time validation of the SQL itself that PL/SQL might give you, and
instead will only get run time errors when it tries to execute each SQL
query in turn. But PL/SQL suffers from the same problem for DDL SQL and
certain other variations of DML SQL (such as a variable list of column
names in the <code>SELECT</code>), so you are no worse off really by using Python.<br /><br />But
in all other areas I believe it is superior to SQL*Plus and PL/SQL, and
so worth the trade-off for what you gain (which is quite a lot) versus
what you lose (SQL validation at compile time). Furthermore, if you use
Python for all such SQL scripts you need, then you get a more
consistent library of your own scripts to use. Often I start writing
something in straight SQL + SQL*Plus and run into a dead end because of
something like conditional logic. So then I have to rewrite it all as an
anonymous PL/SQL block and add in a bunch of PL/SQL variable
declarations. Using Python all the time means you are only using one
language all the time, not two.<br /><br />Note that I am not saying to use Python for <em><strong>all</strong></em>
SQL executed against your Oracle database. Clearly things like
business rules and logic should be in stored procedures in the database
written in PL/SQL. And very simple SQL queries can be executed directly using SQL*Plus. What I'm talking about is one-off sets of non-simple SQL queries you want to run, or a report you want to be very interactive, with user prompts for data values and feedback as the script progresses. In these
cases I believe it is better to use Python (or maybe another
programming language with SQL database execution support) than a mix of
SQL*Plus and PL/SQL in different scripts.<br /><br />Having proposed Python
as the solution to the failings in both SQL*Plus and PL/SQL let me also
add one more reason to consider using it. It is relatively easy to come
up with a small set of templates for query execution that satisfy most
possible scenarios. Then it is a matter of copying the appropriate
template you have, editing that to add in the specific SQL queries you
want executed, and then running that Python program. And a lot of this
ability and power of Python is because it is a dynamic language - you do
not need to declare variables in advance of using them in your code.
This not only reduces the total number of lines of code needed for
anything, but it also simplifies many of the issues of having to
formally declare a named data type for a variable. Python will normally
work out the data type needed from the source data itself that is being
stored into the variable, so the <em><strong>identical Python code can often be reused to process completely different source data types</strong></em>,
which is a very powerful feature. Changing a data type in PL/SQL can
involve a lot of changes to your source code. And this feature of
Python becomes more powerful the more you become familiar with it and
learn how to leverage it into truly generic code.<br /><br /><h2>
Python Examples</h2>
Here
I will provide some Python code snippets that form a template and can
be put together to run a SQL query for you. Note that I'm using Python 3
syntax which has a few key differences to Python 2 syntax e.g. "<code>print</code>"
is a true function in Python 3. Also note that in Python the
indentation level of statements is key, and tells Python when nested
blocks of statements start and end. Indentation is not just about a
pretty layout - it also tells Python how to interpret the meaning of
your source code. And remember that variables do not have to be defined
before you use them - Python determines the actual data type of a
variable at run time when a value is stored in it.<br /><br />First you need
to connect to the database in question, and it may be easier to do this
by prompting the user to enter their credentials rather than putting
them on the command line when invoking Python. <br /><pre>#!/usr/bin/env python
#
import sys
import time
import cx_Oracle
print ("Enter database", end=": ", flush=True)
database = sys.stdin.readline ().rstrip ()
print ("Enter username", end=": ", flush=True)
username = sys.stdin.readline ().rstrip ()
print ("Enter password", end=": ", flush=True)
password = sys.stdin.readline ().rstrip ()
</pre>
Note how the "<code>sys.stdin.readline()</code>" method is used to read from standard input what the user has typed. And how the string object method "<code>rstrip()</code>" is immediately used to strip off the newline character that is included in what "<code>readline()</code>" returns but which we don't need.<br /><br />Personally
I like to run a trivial query against the database immediately to show
that the connection was successful and the database is up. In this
example it queries both <code>V$DATABASE</code> for the database name, and then <code>V$VERSION</code> for the database software version.<br /><pre>print ()
print ("Connecting ...")
conn = cx_Oracle.connect (username, password, database)
print ("Connected successfully")
curs = conn.cursor ()
# V$DATABASE has only 1 row in it
curs.execute ("select name from v$database")
rows = curs.fetchall ()
print ("Connected to database {0}".format (rows [0][0]))
# Just the one row from V$VERSION
curs.execute ("select banner from v$version where banner like 'Oracle Database%'")
rows = curs.fetchall ()
print (rows[0][0])
print ()
</pre>
Note how Python can retrieve all of the results at once into a
nested set of arrays (lists as Python calls them) - a list of rows, and
each row is a list of columns. <br /><br />Also note that I am using the
"cx_Oracle" module as the database driver for connecting to Oracle.
Python defines a common database API that can be implemented by
different driver modules for connecting to different databases. The
"cx_Oracle" module is a third party module commonly used for connecting
to Oracle databases. You can download a copy of "cx_Oracle" or another
driver module and add it to your Python installation.<br /><pre># Now do any real work we want to do, using the database connection just obtained
#
sql_query = """
select banner
from v$version
"""
print ("Executing SQL Query")
curs.execute (sql_query)
rows = curs.fetchall ()
for i in range (len (rows)):
    print (rows[i][0])
</pre>
The query here is just another trivial query against <code>V$VERSION</code>, but it could be anything you want it to be. It is very simple to iterate over all of the rows retrieved by a query in a "<code>for</code>" loop (<code>len</code>
tells you the number of members in a list) and then print out the data
retrieved. This is one way to iterate over the results by using "<code>fetchall ()</code>"
to fetch all of the rows back explicitly, but there are a few other
variations on this you can use depending on what you want to achieve and
your preferences e.g. you can also iterate directly over the cursor
itself which gives you each row of data values in turn.<br /><br />Note how
Python supports two different variations on defining string literals.
Short string literals can be put inline between pairs of matching quotes
- both single and double quotation characters are supported - and there
are multiple cases of this here. Long string literals can be put
between matching pairs of triple quotation characters and split over
multiple lines, which is how I have done the SQL query here. The new
line characters present in your source code are retained within the
string value i.e. they are not stripped out when Python parses such a
string literal value.<br /><br />To execute a query in a loop with different
filter values you should use bind variables rather than string
concatenation (which is prone to SQL injection issues). In "cx_Oracle"
we can use a leading colon in front of the bind variable name in the SQL
query.<br /><pre>sql_query = """
select count (*) cnt
from sales_history
where period_id = :period_id
GROUP BY product_id,
customer_id,
territory_id,
distributor_id,
period_id,
sale_type,
currency_code
having count (*) > 1
"""
#
for pid in range (1, 100) :
    # Get list of duplicate counts, then count those separately and output
    curs = conn.cursor ()
    bind_vars = {'period_id' : pid} # A dictionary of bind variable name to value mapping
    curs.execute (sql_query, bind_vars)
    rows = curs.fetchall ()
    dup_count = len (rows)
    print ("[{0}] Period {1:d} has {2:d} duplicates".format (time.strftime ("%H:%M:%S"), pid, dup_count))
</pre>
Note the use of the "<code>range</code>" function, which produces the sequence of values from the first value up to one less than the second value (in Python 3 it is a lazy range object rather than a list).<br /><br />Let's
say you don't know the list of "period id" values to iterate over (1 to
99 in previous example), and instead want to get those from another
query. You can do that too:<br /><pre># Calendar contains per day entries, but month ids are not contiguous
# So Select the distinct month calendar id ordered descending and iterate over those
monthcurs = conn.cursor ()
monthcurs.execute ("select distinct month_calendar_id from calendar order by 1 desc")
for monthrow in monthcurs :
    cmid = monthrow [0] # First and only column in each output row
    # Get list of duplicate counts, then count those separately and output
    curs = conn.cursor ()
    bind_vars = {'period_id' : cmid}
    curs.execute (sql_query, bind_vars)
    rows = curs.fetchall ()
    dup_count = len (rows)
    print ("[{0}] Month ID {1} has {2:d} duplicates".format (time.strftime ("%H:%M:%S"), cmid, dup_count))
</pre>
The above shows how Python lets you directly iterate over the rows in the first result set (<code>for monthrow in monthcurs:</code>), using a value from each row to execute the second query and count duplicates for that one period.<br /><br /><h2>Advanced Compression no good for Data Warehousing</h2>(21 May 2018)<br />This post is about experiencing "<code>table fetch continued row</code>"
in compressed data warehouse tables that experience some level of
updates to existing (old) data, and how using the Advanced Compression
option of Oracle Enterprise Edition not only does not help in avoiding
this but can actually introduce severe performance penalties into your
daily load jobs.<br /><br />
<h3>
Background</h3>
Basic Compression is a free
part of Oracle Enterprise Edition, allowing you to compress the data
within a table, and is useful for large databases such as a Data
Warehouse. Not only will compression reduce the total disk storage used
by that data in the table, it can also improve query performance in
various ways - full table scans will read fewer disk blocks, and
potentially more rows can be cached in the buffer cache in the SGA
within the same number of blocks. One downside to Basic Compression is
that it can only be used to compress existing data in a table or
partition (done by "moving" that table or partition within the same
tablespace). Newly inserted rows or updated rows are not compressed
when added to a data block, and are added in their normal uncompressed
form. This can mean that over time a compressed table or partition can
become less compressed due to updates and newly inserted rows.<br /><br />Oracle's
official solution to this is the extra Advanced Compression Option with
the OLTP Compression feature. OLTP Compression <em><strong>will</strong></em>
compress the data of newly inserted rows into a table where it has been
configured. It basically does this by inserting the rows in the normal
uncompressed format (no performance penalty for these inserts) until
the block becomes full, whereupon it then compresses that one block
(performance penalty for that particular insert), which will also
increase the free, unused space in that block for yet more rows to be
inserted. The assumption is that overall for an average / normal OLTP
application the frequency of block compressions will be low, and the
impact on most transactions will be negligible (assuming they SELECT and
INSERT on many other tables within the same business transaction).<br /><br />A
"perfect ideal" Data Warehouse would only ever load new data into the
current partition of a partitioned table. Existing data would never be
updated, and new data would never be inserted into a previous, old
partition. In such a scenario you have two potentially straightforward
strategies for compressing your data using only Basic Compression:<br /><ul>
<li>Wait until the end of a period when the database will switch to the next partition, and then compress the previous partition</li>
<li>As above plus re-compressing the current partition at a set frequency during the period in which it is the current partition</li>
</ul>
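The compression step in the first strategy amounts to a single DDL statement per closed partition. A minimal sketch that just constructs the statement text (the table and partition names are hypothetical), which could then be run through any tool or script:<br /><pre>
```python
def compress_partition_ddl (table_name, partition_name) :
    # Basic Compression is applied by "moving" the partition in place
    return "alter table {0} move partition {1} compress".format (table_name, partition_name)

print (compress_partition_ddl ("sales_history", "p_2018_04"))
# -> alter table sales_history move partition p_2018_04 compress
```
</pre>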
The
first strategy means that all old partitions have the data in them
compressed, and so take up minimal disk storage. However, the data in
the current partition is not compressed at all. The old partitions only
ever need to be compressed once, because this is an ideal scenario and
that old data is never updated or extra data inserted into it.<br /><br />The
second strategy is the same as the first for old partitions, but also
compresses the current partition at scheduled intervals to gain some of
the benefits from compression on the most recently loaded data. How
frequently you compress the current partition depends on various
factors, such as the time to do this compression and the time available
each night for the daily data load into the data warehouse. As the
partition grows in size with more data being added to it, so the
compression will take longer each time.<br /><br /><h3>
The Compressed Row Update Problem</h3>
Both
strategies are straightforward and only use the features of Basic
Compression. However, real world data warehouses are not "perfect" or
"ideal" - old data can arrive after a period has ended and must be
loaded into old, existing partitions, and corrections can come through
for old data needing updates to the existing data. If your data volumes
are very high, both within each partition and for the number of rows of
old or corrected data, then your compressed partitions can become less
compressed as a result and can grow in size. <br /><br />As well as an
increase in the disk storage used, there is also a potential performance
penalty to pay for such data changes in old partitions due to the way
the update physically happens within the data block. When the
compressed row is updated it will not fit back into its original slot in
the data block, because it will have grown in size one way or another.
This means that Oracle will try and move that row to the end of the
data block into the free, unused space area as reserved by the "<code>PCTFREE</code>" setting for that table. However, Basic Compression sets "<code>PCTFREE</code>"
to zero by default, so there is nowhere for such updated rows to go
within its current data block. What Oracle will do is "insert" that row
into a new data block (as used for normal inserts), and leave behind a
forward pointer in the block where the row used to be. This is termed
"row migration" - the row's physical storage has migrated to another
block within the table.<br /><br />This row migration has a negative impact
on index based reads, because the index structure has the ROWID of each
data row in its leaf blocks, and the ROWID is really a map to the
physical location of the data row - it specifies things such as the data
file, block and offset within the block within the encoding in the
ROWID. When a query using such an index goes to read a migrated data
row it first reads the original data block as indicated by the ROWID and
rather than the expected data row instead finds a forward pointer
(another ROWID) to where the row has now been stored. So the query must
issue a second read against the second data block in order to get the
data row. Thus migrated updated rows cause extra disk reads when
accessed through an index. Such "second, additional disk reads" are
visible in the Oracle statistic of "<code>table fetch continued row</code>" i.e. you can see how many table row fetches cause an additional fetch due to migrated data rows.<br /><br /><h3>
Advanced Compression</h3>
In such scenarios you <em>might</em>
assume that using the OLTP Compression feature of Advanced Compression
would solve this. You would hope that it would keep the data in the old
partitions compressed for you without any explicit action needed from
you, and avoid or minimize the number of migrated rows. Unfortunately
this is not true, and in fact using OLTP Compression will probably
result in a severe performance penalty during the daily data load
while having no positive impact on updated rows being migrated. Which
ultimately means that there is little to no point in using OLTP
Compression on most data warehouses. Of course, there may be some
scenarios where it might be of some benefit, but for the vast majority I
believe it will only have a negative impact overall.<br /><br />The first
performance penalty is that it directly slows down the INSERTs
being executed by the daily data load jobs, as the blocks are compressed
as they fill up. This is to be expected of course, and you would hope
that the performance penalty was relatively small and would ensure that
all blocks in a table were compressed.<br /><br />Unfortunately there is a second and more substantial performance penalty arising from this, because each block <em><strong>will be compressed multiple times</strong></em>
during the data load. And this repeated compression of each block
multiplies up the total CPU time cost used during the daily data load to
the level where it can more than double the elapsed time of the data
load jobs. This is a consequence of the way OLTP Compression works.
Compression is triggered each time a block "becomes full", which will
happen multiple times during the data load process. After a block is
compressed for the first time, it will now have free unused space in it,
into which new, subsequent rows can be inserted. So the same block
will repeatedly fill up, compress, fill up again, compress, fill up
again, compress, until eventually it is full of only compressed rows and
another row cannot fit in the unused, free space.<br /><br />And the third
performance penalty is that the frequency of such block compressions is
much higher with longer, larger data rows, such as typically found in a
data warehouse. Longer rows will fill up a block sooner, triggering a
compression sooner and more often as fewer rows are needed each time to
consume the remaining free space i.e. the average number of rows
inserted between a compression occurring gets lower as the row size
increases. And the more compression operations that happen during a
data load, the more CPU time is consumed for a given
number of rows. See the end of this post for the detail of the maths
behind this.<br /><br />This triple whammy combination - every <code>INSERT</code>
slows down, blocks are compressed multiple times during the load,
longer rows lead to more frequent block compressions - can cause data
load times to double or triple in elapsed time. The precise level of
the performance impact depends on various factors such as the rows
sizes, the level of compression achievable, and the CPU's in your
computer system.<br /><br />Conversely Basic Compression only ever
compresses each block once, so the total CPU used by it is far lower
than for Advanced Compression. And because the compression operation
takes place separately, there is no direct impact on the daily data
loads themselves i.e. they continue to take the same elapsed time as
normal.<br /><br />But that is not the end of the negative impacts of using OLTP Compression in a data warehouse. The other big negative is that <em><strong>it does not actually solve the original problem</strong></em> you were experiencing - updated rows being migrated. Under OLTP Compression, only an <code>INSERT</code> will trigger a compression of a block, never an <code>UPDATE</code>. This has been shown by various people:<br /><ul>
<li>Oracle Blog post on <a class="externalLink" href="https://blogs.oracle.com/dbstorage/updates-in-row-compressed-tables" target="_blank" title="External link to https://blogs.oracle.com/dbstorage/updates-in-row-compressed-tables">Updates in Row Compressed Tables</a></li>
<li>Oracle Blog post of a copy of the <a class="externalLink" href="https://blogs.oracle.com/db/master-note-for-oltp-compression" target="_blank" title="External link to https://blogs.oracle.com/db/master-note-for-oltp-compression">Master Note for OLTP Compression</a></li>
<li>An article series on Compression by Jonathan Lewis over at Red Gate, the final one covering the OLTP Compression scenarios<ul>
<li><a class="externalLink" href="https://www.red-gate.com/simple-talk/sql/oracle/compression-oracle-basic-table-compression/" target="_blank" title="External link to https://www.red-gate.com/simple-talk/sql/oracle/compression-oracle-basic-table-compression/">Part 1 - Basic</a></li>
<li><a class="externalLink" href="https://www.red-gate.com/simple-talk/sql/oracle/compression-in-oracle-part-2-read-only-data/" target="_blank" title="External link to https://www.red-gate.com/simple-talk/sql/oracle/compression-in-oracle-part-2-read-only-data/">Part 2 - Read Only</a></li>
<li><a class="externalLink" href="https://www.red-gate.com/simple-talk/sql/oracle/compression-in-oracle-part-3-oltp-compression/" target="_blank" title="External link to https://www.red-gate.com/simple-talk/sql/oracle/compression-in-oracle-part-3-oltp-compression/">Part 3 - OLTP</a></li>
</ul>
</li>
</ul>
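The repeated fill-compress cycle described earlier can be illustrated with a short simulation. This is a minimal Python sketch assuming the simple 2:1 compression ratio used in the appendix at the end of this post; the function and its model of the block are mine, not Oracle internals:

```python
def simulate_fill(uncompressed_rows_per_block):
    """Simulate OLTP Compression filling one block: insert until full,
    compress, repeat. Returns (compressions, final compressed row count).
    Models the block in units of one compressed row; an uncompressed row
    takes 2 units (i.e. an assumed 2:1 compression ratio)."""
    capacity = 2 * uncompressed_rows_per_block
    used = 0
    compressions = 0
    while capacity - used >= 2:          # room for another uncompressed row
        used += (capacity - used) // 2   # insert rows until the block fills
        compressions += 1                # ...then the block is compressed
    return compressions, used

print(simulate_fill(50))   # 50 uncompressed rows/block → (7, 99)
print(simulate_fill(10))   # longer rows, 10/block      → (5, 19)
```

So a block that holds 50 uncompressed rows is compressed 7 times on its way to holding roughly 100 compressed rows (about one compression per 14 rows), while a block holding only 10 longer rows is compressed 5 times for roughly 20 rows (one per 4 rows), matching the figures in the appendix at the end of this post.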
So OLTP Compression doesn't help at all with the original scenario - <code>UPDATE</code>'s causing row migration with additional "<code>table fetch continued row</code>"
block reads, and increased storage. Your data warehouse will still
have migrated rows from updates, and an increase in storage and disk
reads as a result. The only viable way to force migrated rows to move
back and be compressed is to use Basic Compression.<br /><br />And finally,
for completeness, OLTP Compression typically achieves a slightly lower
compression ratio than Basic Compression. This is because they both use
the same compression mechanism, and OLTP Compression leaves the value
of <code>PCTFREE</code> at the table / partition level setting unchanged (10% by default) whereas Basic Compression sets <code>PCTFREE</code>
to 0 by default during the compression, so achieving more compressed
rows per block. The difference in compression level will be small,
but it is there nevertheless.<br /><br /><h3>
Conclusion</h3>
Although
Basic Compression is not "perfect", for a data warehouse it is by far
and away a better choice than trying to use the OLTP Compression feature of the
Advanced Compression option. The performance penalties from the
significant increase in CPU usage during data loads when using OLTP
Compression are very severe. And ultimately it does not even address
the original problem we were concerned about - <code>UPDATE</code>'s causing row migration into different data blocks, resulting in additional "<code>table fetch continued row</code>"
block reads from queries needing those rows. And that's why sticking
with the free Basic Compression used on a scheduled basis after a data
load can be much more efficient for a data warehouse, combined with a
well designed partitioning strategy.<br /><br /><h3>
Appendix - Why Longer Rows Lead To More Frequent Compressions</h3>
I
stated earlier that, when using Advanced OLTP Compression, longer
rows will both trigger block compressions sooner, and more frequently
i.e. more compressions on average per row (or conversely, fewer rows
inserted to a block on average between compressions). I will here
assume a simple 2:1 compression ratio between an uncompressed row and
its compressed form as it keeps the maths a bit simpler, but still shows
what is going on.<br /><br />If you can fit 50 uncompressed rows into a
block then you can fit 100 compressed rows into the same block. As the
compression halves the size of the uncompressed rows, the block is
compressed multiple times when data is inserted into it. First 50 rows
are inserted, which triggers a compression leaving half the block free,
so 25 more rows can be inserted, triggering another compression, then 13
rows can be inserted, the block is compressed, etc. Until eventually
100 rows fit in, but another one will not.<br /><br />This series of
halvings of how many rows can be loaded at a time is really like a sum
over the series of powers of the number 2. The last load will be 1 row,
the one before that of 2 rows, 4 before that, and so on. This means
that the block is "full" when 2 to a certain power is larger than the
number of rows that can fit in that block i.e. after that many
compressions another row will not fit in the block.<br /><br />For 100 rows in a block, 2<sup>7</sup>
(2 to the power 7) = 128 is the first power of 2 larger than 100, so 7 compressions are needed. If each
block is compressed 7 times as it has rows inserted into it to achieve
100 rows in a block, then on average there is a compression every 100 / 7
= 14.29 rows.<br /><br />For a much longer row that can only fit 10 uncompressed rows per block, then 20 compressed rows will fit in that block. And 2<sup>5</sup>
is 32 (2 to the power 5) which is larger than 20, so 5 compressions
will be needed to fill each block. As 20 rows will have been loaded
into a block then a compression is occurring on average every 20 / 5 = 4
rows.<br /><br />This is a <em><strong>lot more frequent</strong></em> than
the previous case i.e. over 3 times worse on average! Thus OLTP
Compression on a data warehouse with larger rows leads to a <em><strong>much larger CPU penalty</strong></em>
from all of the repeated block compression that will be occurring
during data loads. This is why the alternative of just doing a normal load
followed by a one-off compression using Basic Compression can be much
more efficient.John Bradyhttp://www.blogger.com/profile/08719865814813032071noreply@blogger.com0tag:blogger.com,1999:blog-7302956329008495023.post-32626072678177454182018-04-21T16:06:00.000+01:002018-04-21T16:06:50.941+01:00Top Oracle Monitoring Views & Tables<div>
<div class="viewer">
<h2>
Introduction</h2>
I can be a bit "old school" at
times on some things, and I like to be familiar with the underlying
technology being used by advanced features. So while using Enterprise
Manager to see what is happening on your database can be easier and
quicker than running queries manually, there may be times when EM is not
available on a particular database or it doesn't provide a particular
piece of information you are after. Knowing where this information is
held in the database instance lets you go direct to the source to get
what you need, which can either be quicker in some circumstances or the
only option in other cases.<br /><br />The key information about activity on
a database instance is available in a handful of dynamic performance
views. While using Enterprise Manager can be quicker and easier, don't
be afraid of these dynamic performance views. Learn about them and what
is in them and they can be your friends, and not your enemies.<br /><br />Generally I break these activity related views down into <em>current, real time</em> activity happening now and <em>historical</em> activity that happened some time ago. And the historical activity can be further split into <em>very recent</em> as opposed to <em>long ago</em> activity, where "long ago" is probably more than an hour ago.<br /><br /><h2>
Current Session Activity</h2>
The main view has to be <code>V$SESSION</code>, which has a record per connected session. You can query <code>V$SESSION</code>
to see what each session is currently doing (the SQL it is executing
(if any), or if it is waiting or blocked, or idle), or query it
repeatedly to see how a session changes over time (blocked or moving
through different SQL).<br /><br />This view has been expanded over the
different versions of Oracle and now includes a richer set of data about
what each session is doing spread across a number of useful columns
e.g. <code>STATUS</code>, <code>TYPE</code>, <code>SQL_ID</code>, <code>SQL_CHILD_NUMBER</code>, <code>ROW_WAIT_*</code>, <code>BLOCKING_SESSION</code>, <code>EVENT</code>, <code>WAIT_TIME</code>, <code>SECONDS_IN_WAIT</code>, <code>STATE</code>.<br /><br />When a session waits for something to complete, such as a disk wait, you can see this wait in <code>V$SESSION</code>.
But this is transitory, being about the current wait that session is
experiencing. These session wait times are also cumulatively recorded
in <code>V$SESSION_EVENT</code>, which records both the wait event and
the total wait time since the session connected to Oracle. You cannot
just use the current, total values in this, as they are cumulative since
the session was created. However you can compare the changes to this
over a short period of time, which will tell you the events that the
session waited on during that period of time, and how long it waited for
them. Tanel Poder has given us the very useful <a class="externalLink" href="http://blog.tanelpoder.com/files/scripts/snapper.sql" target="_blank" title="External link to http://blog.tanelpoder.com/files/scripts/snapper.sql">snapper.sql</a>
script that snapshots this dynamic performance view in real time and
reports on what changed i.e. the waits a session is experiencing.<br /><br /><h2>
Recent Active Session Activity</h2>
In version 10 (10.1) Oracle added in the new dynamic performance view of <code>V$ACTIVE_SESSION_HISTORY</code> (ASH), which contains point in time snapshot copies of data from <code>V$SESSION</code>. The idea is that you can now look back over recent history to see what <em>was</em> happening recently in individual sessions, letting you investigate issues <em><strong>soon after</strong></em>
they have happened. Potentially it also lets you investigate temporary
problems that do not persist long enough to investigate in real time
using <code>V$SESSION</code> directly. <br /><br />The manual says "<em>It
contains snapshots of active database sessions taken once a second. A
database session is considered active if it was on the CPU or was
waiting for an event that didn't belong to the <code>Idle</code> wait class.</em>"
The one snapshot every second is to minimise any performance impact on
the system and the storage requirements, and it also only saves the
data for active sessions i.e. nothing is copied for inactive, idle
sessions. The data from each snapshot is then held in memory within the
SGA, and accessible via the dynamic performance view of <code>V$ACTIVE_SESSION_HISTORY</code>. There is a limit to how much memory it will use, but it is intended to cover the past 30 minutes.<br /><br />Note
that Active Session History and AWR (Automatic Workload Repository) are
part of the "Oracle Diagnostics Pack", and a license for this must be
purchased before you have the right to use either of these i.e. these
features are not part of the core Oracle Database license. And such
additional "Packs" are only available for the Enterprise Edition of the
Oracle Database.<br /><br />Most of the columns in <code>V$ACTIVE_SESSION_HISTORY</code> are the same as in <code>V$SESSION</code>, with the addition of a <code>SAMPLE_TIME</code> column. You can query it in similar ways to how you might query <code>V$SESSION</code>, bearing in mind that the same <code>SESSION_ID</code>
can be repeated across different samples. Using ASH you can do things
such as identifying the most frequent wait events, or sessions
experiencing a specific wait event, or any session experiencing long
waits.<br /><br /><h2>
Historical Session Activity</h2>
As stated, the data in <code>V$ACTIVE_SESSION_HISTORY</code>
is held in memory in the SGA and is limited in size. Oracle will also
automatically save 10% of this data to a table on disk in the AWR
(Automatic Workload Repository), so you have an even longer period of
history to query (up to the AWR retention limits) but at a lower sample
rate. Even though the sample rate is much lower, the idea is that any
peaks in activity or contention will still be captured and be available
for analysis later on in <code>DBA_HIST_ACTIVE_SESS_HISTORY</code>.<br /><br />The
actual way it decides what samples to save is somewhat complicated i.e.
not just a straight "1 in 10" of the samples from the in-memory ASH.
It saves 1 in 10 of the sample sets when there was activity, but saves <em>all</em> of the session data from such a saved sample. This way you get <em>all</em>
of the active session data from a saved sample, rather than just 1 in
10 of the active sessions, which would leave an incomplete picture of a
sample set.<br /><br />You can query <code>DBA_HIST_ACTIVE_SESS_HISTORY</code> in much the same way as you would query <code>V$ACTIVE_SESSION_HISTORY</code>, except that there are fewer samples covering longer periods of time.<br /><br /><h2>
Historical System Activity</h2>
The
previous views are all about individual sessions, but Oracle also has a
lot of information about other activity on the database as a whole in
the other dynamic performance views. Most of these hold cumulative
values - the data values only ever go up as new measurements are added
in. In many cases their data is essentially a sum of the activity
across all of the sessions in the system. These views can provide other
useful information about what has been happening on a system. Oracle
also takes snapshots of these data sets into the AWR, into a series of <code>DBA_HIST_*</code> tables, for which you again need the Diagnostics Pack license. <br /><br />The
default snapshot interval is every hour, and they are retained for 7
days (these can be changed if desired, which can be useful). An AWR
snapshot copies all data in all of the relevant dynamic performance
views into their corresponding <code>DBA_HIST_</code> table with a <code>SNAP_ID</code>
column added in to identify the relevant snapshot. Unlike ASH, which is
only a sampled subset, all of the data in the corresponding dynamic
performance views is saved on each snapshot, so there are no issues over
any missing data within a snapshot.<br /><br />You can make use of this AWR
data either by running a standard AWR report which summarises all
activity between two snapshots, or by writing your own queries directly
against these <code>DBA_HIST_*</code> tables for things of interest to
you. The latter can be useful if you don't want to have to produce a
set of AWR reports and then read through them manually to find something
of interest. The right kind of queries can summarise key activity data
across multiple snapshots, helping you identify any abnormalities or
peak activity periods or anything else of interest.<br /><br />In the same way that you might look at a dynamic performance view to see changes in real time, you can query the corresponding <code>DBA_HIST_*</code>
table to see what changed over longer periods of time and between
multiple snapshots. But your query will need to get two sets of
snapshots (using the <code>SNAP_ID</code> column probably) and subtract
the cumulative values of one from the other to get the change in value
over that time i.e. the actual measure of the activity over that period
of time. I've <a class="externalLink" href="http://databaseperformance.blogspot.co.uk/2016/10/awr-summary-data-extracts.html" target="_blank" title="External link to http://databaseperformance.blogspot.co.uk/2016/10/awr-summary-data-extracts.html">blogged about querying the AWR DBA_HIST_ tables directly</a> before.<br /><br />For reference here are a few dynamic performance views and the corresponding <code>DBA_HIST_</code> table they get saved to by an AWR snapshot:<br /><ul>
<li><code>V$SYSSTAT</code> (individual named system statistic counters) maps to <code>DBA_HIST_SYSSTAT</code></li>
<li><code>V$SYSTEM_EVENT</code> (wait event occurrence and time counters) maps to <code>DBA_HIST_SYSTEM_EVENT</code> and to <code>DBA_HIST_BG_EVENT_SUMMARY</code> for Background events</li>
<li><code>V$SQL</code> has no direct map, but <code>V$SQLSTATS</code> maps to <code>DBA_HIST_SQLSTAT</code></li>
</ul>
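The snapshot-delta arithmetic described above - subtracting one set of cumulative values from the other - can be sketched in a few lines of Python. The statistic names and values here are invented for illustration, not real <code>DBA_HIST_SYSSTAT</code> contents:

```python
def snapshot_delta(earlier, later):
    """Subtract the cumulative counter values of an earlier snapshot from
    a later one, giving the activity that occurred between the two."""
    return {stat: later[stat] - earlier.get(stat, 0) for stat in later}

# Hypothetical cumulative values from two snapshots (say SNAP_ID 100 and 101)
snap_100 = {"physical reads": 1_500_000, "user commits": 42_000}
snap_101 = {"physical reads": 1_620_000, "user commits": 43_500}

print(snapshot_delta(snap_100, snap_101))
# → {'physical reads': 120000, 'user commits': 1500}
```

The same subtraction applies whether the counters come from two AWR snapshots or from two manual reads of a cumulative dynamic performance view.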
<h2>
SQL Statements Executed</h2>
The
other approach to investigating performance problems is to look at the
SQL statements being executed, as opposed to what individual sessions
are doing or the system as a whole. The most relevant dynamic
performance views for SQL execution information are:<br /><ul>
<li><code>V$SESSION</code> - contains the <code>SQL_ID</code> of the SQL statement a session is currently executing (if any)</li>
<li><code>V$SQL</code> and <code>V$SQLSTATS</code> for all SQL statements recently executed, with execution statistics, etc.<ul>
<li>Data columns include:- <code>SQL_TEXT</code>, <code>CHILD_NUMBER</code>, <code>USERS_EXECUTING</code>, <code>EXECUTIONS, FETCHES, BUFFER_GETS, CPU_TIME, ROWS_PROCESSED</code></li>
<li>Remember that these activity counters are cumulative and simply increase in value on each execution of that SQL statement</li>
<li>SQL statements are retained in <code>V$SQL</code>
while they are being executed, but after that they can be removed for
that slot to be reused by another SQL statement. So there is no
guarantee that <code>V$SQL</code> contains <em>all</em> SQL statements recently executed</li>
<li>However, the most frequently executed SQL statements will tend to remain in <code>V$SQL</code></li>
</ul>
</li>
</ul>
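Since the counters above are cumulative across executions, a common way to use them is to derive a per-execution figure and rank statements by it. A minimal Python sketch, with invented rows standing in for what a fetch from <code>V$SQLSTATS</code> might return (the SQL_IDs and values are hypothetical):

```python
# Hypothetical rows as might be fetched from V$SQLSTATS (values invented)
sql_stats = [
    {"sql_id": "a1b2c3d4e5f6g", "executions": 1000, "buffer_gets": 5_000_000},
    {"sql_id": "h7i8j9k0l1m2n", "executions": 10,   "buffer_gets": 4_000_000},
    {"sql_id": "o3p4q5r6s7t8u", "executions": 500,  "buffer_gets": 1_000_000},
]

def gets_per_exec(row):
    # Guard against a statement still on its first execution
    return row["buffer_gets"] / max(row["executions"], 1)

# Most expensive statements per execution first
worst_first = sorted(sql_stats, key=gets_per_exec, reverse=True)
print([row["sql_id"] for row in worst_first])
# → ['h7i8j9k0l1m2n', 'a1b2c3d4e5f6g', 'o3p4q5r6s7t8u']
```

Ranking by total <code>BUFFER_GETS</code> alone would have put the first statement on top; the per-execution view surfaces the second one instead.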
<br /><h2>
Summary</h2>
The
following dynamic performance views and AWR tables can directly provide
you with information about what is happening and what has happened on an Oracle
database instance, and are extremely useful for investigating
performance problems. Remember that to use some of these you must have
purchased the Diagnostics Pack for your Enterprise Edition Oracle
database software.<br /><ul>
<li><code>V$SESSION</code> for what is happening now in each session</li>
<li><code>V$SESSION_EVENT</code> for total waits a session has experienced</li>
<li><code>V$ACTIVE_SESSION_HISTORY</code> for recent session history</li>
<li><code>DBA_HIST_ACTIVE_SESS_HISTORY</code> for a sampled subset of session history over a longer period of time</li>
<li><code>DBA_HIST_SYSSTAT</code>, <code>DBA_HIST_SYSTEM_EVENT</code>, <code>DBA_HIST_SQLSTAT</code> and <code>DBA_HIST_SYS_TIME_MODEL</code> in the AWR for full copies of the corresponding dynamic performance view taken at regular intervals</li>
<li><code>V$SQL</code> and <code>V$SQLSTATS</code> for the most recently executed SQL statements, still present in the shared pool in the SGA</li>
</ul>
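As a footnote to the ASH entries above: the way AWR keeps 1 in 10 of the in-memory sample sets, but every active session within a kept sample, can be sketched as a toy model. The data layout and session ids below are invented for illustration, and Oracle's real selection logic is more involved:

```python
def save_to_awr(ash_samples, keep_every=10):
    """Toy model: keep every Nth in-memory ASH sample set whole, so every
    active session in a kept sample survives; the other sample sets are
    discarded entirely."""
    return [s for i, s in enumerate(ash_samples) if i % keep_every == 0]

# Eleven one-second sample sets; each holds the active session ids seen
# in that second (all values invented for illustration)
ash = [[10, 23], [10], [], [44, 10, 23], [23],
       [10], [44], [], [23, 44], [10],
       [44, 10]]
saved = save_to_awr(ash)
print(saved)   # → [[10, 23], [44, 10]]
```

The point is that a kept sample is complete in itself, so activity within it can still be analysed as a consistent picture of that second.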
</div>
</div>
John Bradyhttp://www.blogger.com/profile/08719865814813032071noreply@blogger.com0tag:blogger.com,1999:blog-7302956329008495023.post-1896973458592464772018-03-28T19:03:00.000+01:002018-03-28T19:03:44.796+01:00Hash Join Overflow to Disk Summary[I'm a bit busy at the moment working with a new client on a large and complex database, so I'm afraid I've only got time to do a quick post. I have a new post on a different topic planned, but am struggling with the time needed to write it up properly, so a short post is all I can squeeze in]<br />
<br />
This is just a summary of the key results from my findings on how Oracle
is costing a Hash Join when it overflows to disk, pulling all the
conclusions together into one place.<br /><br />In the <a class="externalLink" href="http://databaseperformance.blogspot.co.uk/2018/01/hash-join-overflow-cost-formula-1.html" target="_blank" title="External link to http://databaseperformance.blogspot.co.uk/2018/01/hash-join-overflow-cost-formula-1.html">first post</a> I arrived at the following formula:<br />
<ul>
<li>( ((Build Columns Size + 12) * Build Row Count) + ((Probe Columns Size + 12) * Probe Row Count) ) * 0.0475</li>
</ul>
Where
the "Columns Size" is the sum of the hash table storage for each column
i.e. column data storage + 2 bytes per column, and the "Row Count"
values are per 1,000 rows.<br /><br />In the <a class="externalLink" href="http://databaseperformance.blogspot.co.uk/2018/01/hash-join-overflow-costing-2.html" target="_blank" title="External link to http://databaseperformance.blogspot.co.uk/2018/01/hash-join-overflow-costing-2.html">second post</a> I showed that the costing was unaffected by the value of the Multi-Block Read Count parameter i.e. the costing was done only using single block disk I/O's.<br /><br />In the <a class="externalLink" href="http://databaseperformance.blogspot.co.uk/2018/02/hash-join-overflow-costing-3-simpler.html" target="_blank" title="External link to http://databaseperformance.blogspot.co.uk/2018/02/hash-join-overflow-costing-3-simpler.html">third post</a> I proposed a simpler formula using only values shown in an execution plan of:<br /><ul>
<li>( (Build KBytes + (12 * Build Rows / 1000) ) + (Probe KBytes + (12 * Probe Rows / 1000) ) ) * 0.0485</li>
</ul>
This
is only slightly less accurate than the original formula, but is much
simpler to use as it only needs values reported in the execution plan,
and no other calculations or derived values by you.<br /><br />In the <a class="externalLink" href="http://databaseperformance.blogspot.co.uk/2018/02/hash-join-overflow-costing-4-temporary.html" target="_blank" title="External link to http://databaseperformance.blogspot.co.uk/2018/02/hash-join-overflow-costing-4-temporary.html">fourth post</a> I showed that the reported size of the Temporary Space (TempSpc)
in the execution plan is actually the size of the hash table itself,
being dependent only on the size of the Build data set. And that
knowing this hash table size will help you in deciding how much larger
it is than the available memory in the PGA used to store such hash
tables.John Bradyhttp://www.blogger.com/profile/08719865814813032071noreply@blogger.com0tag:blogger.com,1999:blog-7302956329008495023.post-18375360714471015172018-02-14T10:35:00.000+00:002018-02-14T10:35:14.430+00:00Hash Join Overflow Costing #4 - Temporary Space + PGA UseNow I want to look at the value reported for "Temporary Space" (TempSpc)
and how to check whether the hash table is close to fitting in memory
within the PGA or not i.e. just a bit too big or way too big.<br />
<br />
<h2>
Temporary Space</h2>
It
seems that this is simply the size of the hash table itself, which is
in turn the size of the Build data set plus an additional 12 bytes per
row. Which means that we can use part of the cost formula posted
earlier for this to estimate the size in KiloBytes:<br /><ul>
<li>Build KBytes + (12 * Build Rows / 1000)</li>
</ul>
The evidence I have for this is:<br /><ul>
<li>The "TempSpc" value only changes when the Build size changes - never when the Probe size changes</li>
<li>When additional columns are added to Build data set in the <code>SELECT</code> the "TempSpc" value increases by the same amount as the Build size increases (for the same number of rows)</li>
<li>When the number of rows in the Build data set increases, the "TempSpc" value increases by an additional 11.76 bytes per additional row<ul>
<li>Which is almost 12 bytes per row, as mentioned before as overhead in the hash table</li>
</ul>
</li>
</ul>
You
could of course "refine" the above formula and use "11.76" instead of
"12" and that would give you a value for the hash table that is much
closer to what is reported for the "TempSpc" in the execution plan. But I think that using 12 is good enough for an estimate.<br />
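Putting that into code, a minimal Python sketch of the estimate (the function name is mine; the example uses the Build values from the execution plan shown elsewhere in this series - 1,123 KB and 50,000 rows, with a reported TempSpc of 1712K):

```python
def tempspc_estimate_kb(build_kbytes, build_rows, per_row_overhead=12):
    """Estimate the TempSpc (hash table) size in KB from the Build data
    set values reported in an execution plan: Bytes in KB plus a fixed
    per-row overhead."""
    return build_kbytes + (per_row_overhead * build_rows / 1000)

# Build values from the example plan: 1,123 KB over 50,000 rows
print(tempspc_estimate_kb(1123, 50_000))         # → 1723.0, near the 1712K reported
print(tempspc_estimate_kb(1123, 50_000, 11.76))  # the "refined" 11.76 bytes/row figure
```

The refined 11.76 figure lands within a kilobyte of the reported value, but as noted above the round 12 is close enough for sizing decisions.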
<br />
<h2>
Hash Table Size vs PGA Size</h2>
I mentioned in the <a class="externalLink" href="http://databaseperformance.blogspot.co.uk/2018/01/hash-join-overflow-cost-formula-1.html" target="_blank" title="External link to http://databaseperformance.blogspot.co.uk/2018/01/hash-join-overflow-cost-formula-1.html">first post</a> that the Optimizer uses the value of the internal initialization parameter "<code>_smm_min_size</code>"
for the memory it can use as the work area for the hash table within
the PGA. Thus if the size of the hash table is larger than this the
Optimizer will cost the Hash Join as an overflow to disk operation.
Using the formula posted above for the Temporary Space we can now
estimate the size of the hash table needed and compare it to this
minimum work area size value.<br /><br />The notes I have on "<code>_smm_min_size</code>"
say that it will be 0.1% of the PGA when the size of the PGA is greater
than 128 MB, otherwise it will be 128 KB as a minimum. Assuming your
PGA is larger than 128 MB then you can calculate what this is yourself.
Note also that its value is reported in 10053 trace files along with
all other initialization parameters, so you can directly verify its
value if you want to.<br /><br />At run time, though, the PGA may not be
heavily used and may have a lot of unused memory available. An
executing query can allocate and use more than the minimum work area
value at run time for execution plan operations, up to a limit specified
by the "<code>_smm_max_size</code>" initialization parameter. For PGAs larger than 1 GB this is set to be 10% of the PGA size. For smaller PGAs it is 100 MB, or 20% when the PGA is less than 500 MB.<br /><br />Knowing
how large the hash table needed will be, lets you compare it to the
maximum possible work area size in your PGA and see what the likelihood
is of the hash table being able to fit within the PGA at run time. As
the ratio between the minimum and maximum possible work area sizes is
100 (0.1% to 10%) this is a large range and a hash table may have a
reasonable chance of fitting in the PGA at run time, depending on how
active your system is and the other queries running.<br /><br />Equally it
lets you consider what the benefit might be from increasing the size of
the PGA significantly if you have a critical query involving a large
hash join operation that is overflowing to disk. That said, there are
some ways of rewriting queries that can minimise the size of the data
being pulled through into the hash table used by the hash join, and so
potentially get a smaller hash table to fit into memory in the PGA. But
this depends on multiple factors, mainly being the number of columns of
data coming through in the Build data set. If this is only a few
columns then there won't be anything that can be saved by such a query
rewrite.John Bradyhttp://www.blogger.com/profile/08719865814813032071noreply@blogger.com0tag:blogger.com,1999:blog-7302956329008495023.post-64453694082122793272018-02-06T11:40:00.000+00:002018-02-06T11:40:58.455+00:00Hash Join Overflow Costing #3 - Simpler FormulaSo far I have <a class="externalLink" href="http://databaseperformance.blogspot.co.uk/2018/01/hash-join-overflow-cost-formula-1.html" target="_blank" title="External link to http://databaseperformance.blogspot.co.uk/2018/01/hash-join-overflow-cost-formula-1.html">offered a formula for how a Hash Join that overflows to disk is costed</a> and <a class="externalLink" href="http://databaseperformance.blogspot.co.uk/2018/01/hash-join-overflow-costing-2.html" target="_blank" title="External link to http://databaseperformance.blogspot.co.uk/2018/01/hash-join-overflow-costing-2.html">confirmed that this is only costed in terms of single block disk operations</a>.
While that formula produces very accurate results (less than 1%
difference to the reported cost for the Hash Join operation) it requires
you to obtain size information on each individual column being
retrieved from both the Build and Probe data sets. And this could be
quite tedious when there are many columns involved, or you don't have an
easy way to work out the source columns involved in each data set.
There is a simplification we can make to eliminate this detailed per
column information and just use the information reported in the
execution plan itself.<br /><br />All of the columns being retrieved for the
Build and Probe data sets are actually included within the "Bytes"
value reported in the execution plan for each data set. And this value
includes both the individual column data and any per-column overhead as
well. What it does not include is the extra 12 bytes per row overhead
needed in the hash table itself. We can approximate the size of this
second part of the data by using the "Rows" value reported in the
execution plan.<br /><br />Examining the "Bytes" and "Rows" values in the
execution plans for my test queries (see first post for details of these
queries), I can see that the increase in the Hash Join cost is about
0.0485 per KB of increase in either data set (Build or Probe) for the
same number of rows. Previously I determined that there was a 12 byte
overhead within the hash table per data row expected.<br /><br />This produces a revised Hash Join cost formula <em>using only values from the execution plan</em> of:<br />
<ul>
<li>( (Build KBytes + (12 * Build Rows / 1000) ) + (Probe KBytes + (12 * Probe Rows / 1000) ) ) * 0.0485</li>
</ul>
Note
that the "Bytes" values used must be in units of KB (as reported in the plan), whereas the
"Rows" value is a plain count and so is divided by 1000. Checking this formula against
the actual Hash Join costs of the test queries I ran, I can see that it
has an error of about 1% i.e. it is not as accurate as the previous
formula, but I believe it is still accurate enough to be useful. <br /><br />Let's check how close this is for one of the test queries. Here is the execution plan produced:<br /><pre>--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | 1266 (100)| |
| 1 | SORT AGGREGATE | | 1 | 30 | | | |
|* 2 | HASH JOIN | | 50000 | 1464K| 1712K| 1266 (1)| 00:00:01 |
| 3 | TABLE ACCESS FULL| HJ50K | 50000 | 1123K| | 365 (1)| 00:00:01 |
| 4 | TABLE ACCESS FULL| HJ100K | 100K| 683K| | 728 (1)| 00:00:01 |
--------------------------------------------------------------------------------------
</pre>
We can see:<br /><ul>
<li>The Build and Probe data access cost is 365 + 728 = 1093</li>
<li>The cost of the Hash Join operation itself is 1266 - 1093 = 173</li>
<li>The calculated approximate cost is (1123 + (12 * 50) + 683 + (12 * 100)) * 0.0485<ul>
<li>= (1123 + 600 + 683 + 1200) * 0.0485</li>
<li>= 3606 * 0.0485 </li>
<li>= 174.891</li>
</ul>
</li>
<li>The difference is +1.891, which is 1.1% of the actual cost</li>
</ul>
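The arithmetic above is easy to wrap in a small helper. Here is a minimal Python sketch; the function name is just illustrative, and the 0.0485 multiplier and 12 byte per-row overhead are the values derived in this post from my test system, so they may differ on other systems:

```python
def hash_join_overflow_cost(build_kbytes, build_rows, probe_kbytes, probe_rows,
                            factor=0.0485):
    """Approximate cost of a Hash Join that overflows to disk, using only
    the "Bytes" (in KB) and "Rows" values reported in the execution plan.
    The 12 bytes per row is the observed per-row hash table overhead."""
    build = build_kbytes + (12 * build_rows / 1000)
    probe = probe_kbytes + (12 * probe_rows / 1000)
    return (build + probe) * factor

# The worked example above: Build = 1123K / 50,000 rows, Probe = 683K / 100,000 rows
cost = hash_join_overflow_cost(1123, 50000, 683, 100000)
print(round(cost, 3))  # 174.891, about 1.1% above the reported cost of 173
```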
This
formula can therefore be a useful check against the cost reported when a
Hash Join operation overflows to disk, and for determining which is the
biggest cost factor i.e. the Build or the Probe data sets.John Bradyhttp://www.blogger.com/profile/08719865814813032071noreply@blogger.com0tag:blogger.com,1999:blog-7302956329008495023.post-3592802571319662512018-01-23T10:22:00.000+00:002018-01-23T10:22:19.472+00:00Hash Join Overflow Costing #2In my <a class="externalLink" href="http://databaseperformance.blogspot.co.uk/2018/01/hash-join-overflow-cost-formula-1.html" target="_blank" title="External link to http://databaseperformance.blogspot.co.uk/2018/01/hash-join-overflow-cost-formula-1.html">previous post on costing a Hash Join overflow to disk</a>
I came up with a formula for the cost of a Hash Join operation that overflows to disk based on tests I had done, but I also
mentioned that there might be other factors involved not yet exposed by
my testing. My main concern was whether the disk I/O's involved were
all of the same type, or a mix of single block and multi-block disk
I/O's. I've now done some further testing, and have something to share.<br /><br />The
tests show that all the disk I/O's are costed by the Optimizer as
single block disk I/O's. I say this because I changed the cost of
multi-block disk I/O's and the cost reported for the Hash Join operation
itself did not change. Hence it must be costed as single block disk
I/O's by the Optimizer.<br /><br />The tests I was doing were a join between two tables that used a Hash Join in the execution plan (see <a class="externalLink" href="http://databaseperformance.blogspot.co.uk/2018/01/hash-join-overflow-cost-formula-1.html" target="_blank" title="External link to http://databaseperformance.blogspot.co.uk/2018/01/hash-join-overflow-cost-formula-1.html">previous post</a> for the SQL used in this query). The key points are:<br />
<ul>
<li>The two source tables are accessed by Full Table Scans (TABLE ACCESS FULL) in the execution plan</li>
<li>These feed into a HASH JOIN operation</li>
<li>Which in turn feeds into a final operation - a SORT AGGREGATE for the SUM's in the SELECT output</li>
<li>By adding together the costs of the source data accesses and subtracting that sum from the total cost reported for the Hash Join, we get the net cost of the Hash Join operation itself</li>
</ul>
Oracle costs disk I/O operations using a set of statistics it has about the computer system it is running on - termed <a class="externalLink" href="https://docs.oracle.com/en/database/oracle/oracle-database/12.2/tgsql/gathering-optimizer-statistics.html#GUID-AC23D691-5C54-4EF9-BF9F-65121F2AFC31" target="_blank" title="External link to https://docs.oracle.com/en/database/oracle/oracle-database/12.2/tgsql/gathering-optimizer-statistics.html#GUID-AC23D691-5C54-4EF9-BF9F-65121F2AFC31">System Statistics</a>
and stored in the database itself. These can be a minimal set of just a
few statistics or a more complete set of statistics providing a more
detailed breakdown of different types of operations on the computer.<br /><br />In
my Oracle 12 instance (12.1.0.2.0) I only have the "No Workload"
default system statistics set in the database, which is the minimal set
of statistics. This provides values for the following:<br /><ul>
<li><code>IOSEEKTIM</code> - Time in milliseconds for a disk to move the read / write head to the track you want to access - default value of 10</li>
<li><code>IOTFRSPEED</code> - Bytes transferred per millisecond once a transfer starts - default value of 4096</li>
<li><code>MBRC</code> - Multi-block read count in blocks achieved on average - default value of 8</li>
<li><code>CPUSPEEDNW</code> - Speed of CPU in units of millions of cycles per second (not relevant here)</li>
</ul>
<strong>Note</strong> that the <code>MBRC</code> here is not the same as the initialization parameter <code>db_file_multiblock_read_count</code>, even though their definitions seem to be the same. This <code>MBRC</code>
is not set from the initialization parameter, and remains the same
within the system statistics until explicitly changed (one way or
another). <code>MBRC</code> is intended to represent the actual size of
multi-block reads that were achieved on your computer system, and is
set when Workload statistics are collected or when set manually.<br /><br />The
Optimizer uses these minimal system statistics to derive the values for
the elapsed times for single and multi-block reads (<code>SREADTIM</code> and <code>MREADTIM</code> respectively). The formulae it
uses are:<br /><pre>SREADTIM = IOSEEKTIM + (DB_BLOCK_SIZE / IOTFRSPEED)
MREADTIM = IOSEEKTIM + (MBRC * DB_BLOCK_SIZE / IOTFRSPEED)
</pre>
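These two derivations are simple enough to sanity check in a few lines of Python. This is just a sketch using the No Workload defaults listed above and an 8 KB block size:

```python
IOSEEKTIM = 10        # ms for the disk head to seek (No Workload default)
IOTFRSPEED = 4096     # bytes transferred per ms (No Workload default)
MBRC = 8              # average multi-block read count (No Workload default)
DB_BLOCK_SIZE = 8192  # 8 KB database block size

sreadtim = IOSEEKTIM + (DB_BLOCK_SIZE / IOTFRSPEED)
mreadtim = IOSEEKTIM + (MBRC * DB_BLOCK_SIZE / IOTFRSPEED)

print(sreadtim, mreadtim)             # 12.0 26.0 (ms)
print(round(mreadtim / sreadtim, 3))  # 2.167 - relative cost of a multi-block read
```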
With the default values for System Statistics as stated, and a database block size of 8192 (8 KB) this gives values of <code>SREADTIM</code> = 12 ms, <code>MREADTIM</code> = 26 ms.<br /><br />What
this really means for the Optimizer is that a multi-block read takes 26
/ 12 = 2.167 times longer than a single block read i.e. its "cost" is
2.167 times that of a single block read. This is used by the Optimizer
when costing multi-block read based operations - it "converts" the
multi-block read time into an equivalent number of single block read
operations, which all have a cost unit of 1 and are the basis for how
the Optimizer reports and compares execution plan costs.<br /><br />If I were to increase the <code>MBRC</code> from 8 to 16, then the <code>MREADTIM</code>
would increase to 42 ms, and a cost factor of 42 / 12 = 3.5 times that
of a single block read. The relative net cost though would actually
decrease, because each multi-block read would now be reading 16 blocks
in one disk I/O rather than 8 blocks i.e. half the number of disk reads
are needed to read the same number of blocks in from disk, at a relative
cost ratio of 1.615 (3.5 / 2.167). So the decrease in the number of
disk reads is greater than the relative increase in cost per disk read.<br /><br />If the Hash Join operation involved multi-block disk reads then changing <code>MBRC</code>
would change the relative cost of those disk reads and we would see a
change in the net cost of the Hash Join as reported in the execution
plan.<br /><br />I changed the value of <code>MBRC</code> manually and then shutdown and restarted the database:<br /><pre>exec dbms_stats.set_system_stats ('MBRC', 16)
shutdown immediate
startup
</pre>
This was to ensure that nothing was still present in memory in the
SGA - neither any existing execution plans, or any system statistics
used to cost those execution plans.<br /><br />When I ran the same test
queries again, the total costs decreased because the source data
accesses are full table scans which use multi-block reads i.e. this was
expected, and so the total cost reported at the Hash Join step
decreased. However, when the costs of the source data accesses were
subtracted from the total cost after the Hash Join step, the net cost of
the Hash Join operation itself was exactly the same as it was before
for the same query.<br /><br />Likewise when I increased the <code>MBRC</code> value to 32 there was no change in the net cost of the Hash Join operation when it overflowed to disk.<br /><br />The
conclusion then is that no multi-block disk reads are used within the
costing of the Hash Join operation by the Optimizer when the hash table
it uses overflows to disk.<br /><br />Which means that the formula I posted before for an overflowing Hash Join cost is not dependent on multi-block reads at all:<br /><ul>
<li>( ((Build Columns Size + 12) * Build Row Count) + ((Probe Columns Size + 12) * Probe Row Count) ) * 0.0475</li>
</ul>
Where the "Columns Size" is the sum of the hash table storage for each column i.e. data storage + 2 bytes per column.John Bradyhttp://www.blogger.com/profile/08719865814813032071noreply@blogger.com0tag:blogger.com,1999:blog-7302956329008495023.post-87047648648351341922018-01-11T15:02:00.000+00:002018-01-11T15:02:44.387+00:00Hash Join Overflow Cost Formula #1<h2>
Introduction</h2>
<br />
The Hash Join method was introduced in Oracle
version 7 (7.3 specifically I believe), and one of its main goals was to
be a method that lent itself well to being parallelisable. However, it
is such an efficient join method for larger data volumes even in serial
execution that it is often picked by the Optimizer over Nested Loops or
Sort Merge because of its lower execution cost. This makes the Hash
Join method probably the most frequently used method by the Oracle
Optimizer, often appearing in execution plans for SQL queries.<br /><br />A
Hash Join works by building a table in memory containing the data from
the first data set (termed the Build data set), and then reading the
second data set (the Probe data set) to lookup each data row into the
in-memory table for any match (join). The in-memory table is structured
and accessed by applying a "hash function" to the relevant join data
columns. This hashing has various benefits around performance, handling
any kind of input data value range, and distributing the input values
within the table (minimising bunching values together).<br /><br />When this
hash table fits in memory the additional cost of the Hash Join
operation is negligible because it only involves CPU and memory
operations and these are multiple orders of magnitude faster than disk
accesses i.e. there is often little or no difference to the total
reported cost of the Hash Join over the sum of the costs of access of
each source data set. However, when the hash table needed is larger
than can fit in available memory in the PGA then it must overflow to
disk, which in turn significantly increases the cost of the Hash Join
operation itself.<br /><br />A question I have had for a long time is "<em>How does Oracle cost this overflowing Hash Join operation</em>"?
Can I replicate this cost formula and understand what the main factors
are within this reported cost? Which is a bigger factor - the size of
the Build data set or the Probe data set? Knowing such things it might
offer the possibility of gaining some insights into ways of tuning such
large hash joins. At the least I would know more about how the
overflowing hash join actually works in practice.<br /><br /><a href="https://jonathanlewis.wordpress.com/">Jonathan Lewis</a> gives a description in his <strong>Cost Based Oracle Fundamentals</strong>
book of how the overflowing Hash Join operation works, with a formula
for the major cost components involved. However, I have always found
this formula to be more descriptive than quantitative, and to not be
easy to use to arrive at a comparable value to what the Optimizer has
reported.<br /><br />I would like a more straightforward quantitative
formula that I could use myself to estimate whether a Hash Join will
overflow to disk or not, and how much its cost will be. After some
research I believe I have arrived at such a formula which I will share
here. Note that I'm not saying this is a "perfect" formula, just that
this is the conclusion I have arrived at so far as a result of the tests
I have done, and it seems to fit the results I have very well. I'll
continue to post more details when I refine or revise this in the future.<br /><br />
<h2>
Why Hash Join Overflows</h2>
<br />
The
limit to the size of a single Hash Table or other "work area" in the
PGA is determined by a hidden, internal initialization parameter of "<code>_smm_min_size</code>".
If the size of the Hash Table needed would be larger than this, then
the Optimizer assumes that it will overflow to disk and costs it
accordingly. My notes say that the value of "<code>_smm_min_size</code>" is the larger of 128 KB or 0.1% of <code>PGA_AGGREGATE_TARGET</code>. I cannot find exactly where I got this information from, but my memory is telling me that it was from <a class="externalLink" href="http://oracle-randolf.blogspot.com/" target="_blank" title="External link to http://oracle-randolf.blogspot.com/">Randolf Geist</a>, possibly in a response to a question on one of the Oracle forums.<br /><br />The
main reason Oracle limits the potential size of a Hash Table is to
ensure that a high number of other queries can run concurrently and not
be starved of memory within the PGA. The Optimizer is assuming a worst
case scenario to make sure other sessions do not suffer when a very
large query is executed by one session. However, it is possible that
when executed the hash table <strong><em>will not overflow to disk</em></strong>
i.e. if at the moment that query is executed there is enough free
memory in the PGA, then Oracle will let that session have a larger "work
area" than the value of "<code>_smm_min_size</code>". So even though
the Optimizer has costed the Hash Join operation as an overflow to disk
and costed it accordingly, it does not mean that it will always overflow
to disk when executed.<br /><br />
<h2>
How Overflow Hash Join Works</h2>
<br />Jonathan
Lewis gives a description in his book of how the Hash Join works when
it overflows to disk. I won't repeat the details here as they are not
particularly relevant at the end of the day. But a crude summary would
be that:<br /><ul>
<li>The first data set is read once and broken into
"chunks" that are written out to disk (temporary storage), where each
chunk is a contiguous sub-set of the overall hash table</li>
<li>One or more of these chunks are then kept in memory ready for the pass over the second data set</li>
<li>The second data set is read in and hashed in the normal way:- <ul>
<li>If it hashes to an in-memory chunk then it is matched as normal</li>
<li>Otherwise it is written out to disk (temporary storage) split out into chunks on the same basis as the first data set</li>
</ul>
</li>
<li>Then
remaining chunks of the first data set are read into memory, and the
corresponding chunks of the second data set read again and matched as
normal. This is repeated until all of both data sets have been
processed.</li>
</ul>
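The chunked scheme just described is essentially a classic partitioned ("Grace") hash join. As an illustration only, and not Oracle's actual implementation, here is a simplified Python sketch in which every chunk is spilled (whereas Oracle keeps one or more chunks in memory during the first pass), with lists standing in for the temporary segments written to disk:

```python
def grace_hash_join(build, probe, key_build, key_probe, n_chunks=4):
    """Toy partitioned hash join: both inputs are split into chunks by a
    hash of the join key, then each chunk pair is joined independently."""
    # Pass 1: partition both data sets into chunks by hashed join key
    build_chunks = [[] for _ in range(n_chunks)]
    probe_chunks = [[] for _ in range(n_chunks)]
    for row in build:
        build_chunks[hash(key_build(row)) % n_chunks].append(row)
    for row in probe:
        probe_chunks[hash(key_probe(row)) % n_chunks].append(row)

    # Pass 2: for each chunk, build an in-memory hash table from the
    # build side and probe it with the corresponding probe chunk
    result = []
    for b_chunk, p_chunk in zip(build_chunks, probe_chunks):
        table = {}
        for row in b_chunk:
            table.setdefault(key_build(row), []).append(row)
        for row in p_chunk:
            for match in table.get(key_probe(row), []):
                result.append((match, row))
    return result

# Hypothetical data sets: join on the first element of each tuple
emp = [(1, 'A'), (2, 'B'), (3, 'C')]
dept = [(1, 'X'), (1, 'Y'), (3, 'Z')]
joined = grace_hash_join(emp, dept, lambda r: r[0], lambda r: r[0])
print(sorted(joined))
```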
Essentially this is saying that there will be an
additional pass over both sets of data - after the first read of each
data set (already costed within the execution plan), there is an
additional write out to disk of the majority of each data set, followed
by a read back of each data set. Also extra reads and writes may be
needed in the first pass of each data set, to keep the data in the
"chunks" properly grouped together on disk.<br /><br />It is not clear
whether these write and read operations will be single block or
multi-block disk read operations, or how many of them there will be.
Potentially multi-block reads could be used when reading back in the
pre-hashed chunks from disk. Luckily though this turns out to be
irrelevant to the cost formula I have arrived at.<br /><br /><h2>
Deriving Hash Join Overflow Cost</h2>
<br />Here is how I went about it. I created a set of standard test tables (see later for SQL DDL), each with a mix of <code>NUMBER</code> and <code>VARCHAR2</code>
columns to pad them out a bit, and populated them using a repeatable
"connect by" data generator with a different number of rows in each test
table. I then ran a query joining 2 of these tables together (see
later for SQL), immediately examined the execution plan (from <code>dbms_xplan.display_cursor</code>) and noted the costs of each operation.<br /><br />Without
any indexes on any of these tables the execution plan was always 2 Full
Table Scans feeding into a Hash Join operation. When the smaller,
build table became large enough the Hash Join operation would overflow
to disk, causing its cost to rise significantly, and a "TempSpc" column to appear in the execution plan with a reported value.<br /><br />By
varying only one thing at a time between queries I could see how the
Hash Join cost changed when it was overflowing to disk. I was not
interested in those executions where the Hash Join did not overflow to
disk i.e. where the hash table did fit in memory. Only those executions
that involved the Optimizer assuming it would overflow to disk. By
examining the change in cost for the Hash Join operation for a
corresponding change in only one of the joined tables I could deduce a
multiplying factor being used within the underlying Hash Join cost
calculation.<br /><br />My Oracle version is 12.1 on Oracle Linux, so my
results are only guaranteed to be accurate for that version. I would
assume the results should be the same for 11g, as I don't think anything
significant has changed in how the Hash Join operation is costed, but
that would need to be verified. <br /><pre>BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
PL/SQL Release 12.1.0.2.0 - Production
CORE 12.1.0.2.0 Production
TNS for Linux: Version 12.1.0.2.0 - Production
NLSRTL Version 12.1.0.2.0 - Production
</pre>
<br /><h2>
Hash Join Overflow Formula and its Accuracy</h2>
<br />
I started
by comparing the Hash Join cost when the number of columns in a data set
changed i.e. the size of the data set increased for the same number of
rows. Jonathan Lewis states that the memory needed per column is the
storage size of the column itself plus 2 additional bytes. I observed
that the Hash Join cost changed by a factor of 0.0475 per byte per 1000
rows. And that this multiplying factor was the same under different row
counts in either table in the query. <br /><br />This only involves the
columns needed by the query itself, which the Optimizer must extract and
process, and not all of the columns in the table. In this case it is
the columns referenced in the "<code>SELECT</code>" list and those referenced in the "<code>WHERE</code>"
clause. And the "storage size" is the number of bytes that Oracle uses
to store that data on disk, which is not necessarily 1 byte per value
or 1 byte per character or digit. For instance, a <code>NUMBER</code> is stored as 2 digits per byte. <br /><br />My
other observation was that when only the row count in one table changed
the Hash Join cost changed by a factor of 0.5675 per 1000 rows. As
this was a constant per row I wondered if this was something to do with
some extra data per row causing extra disk I/Os. And 0.5675 divided by
0.0475 gives 11.9474 which is almost 12, implying a 12 byte overhead per
data row within the Hash Table.<br /><br />Based on this, I arrived at the following formula for an overflowing Hash Join cost:<br />
<ul>
<li>( ((Build Columns Size + 12) * Build Row Count) + ((Probe Columns Size + 12) * Probe Row Count) ) * 0.0475</li>
</ul>
Where the "Columns Size" is the sum of the hash table storage for each column i.e. data storage + 2 bytes per column.<br /><br />I
then checked the calculated costs from this formula against the series
of test queries I had been using, and the resultant cost for the
overflowing Hash Join came out almost the same in all cases. The
percentage difference was under 1% in almost all cases, which I take to
be a high degree of accuracy. The only anomalies are for when the
"build data set" is only just bigger than can fit in the PGA, but even
then it is only a 3% difference. As the size of the build data set
increased so the percentage difference decreased.<br /><br />On the one hand
it might not be surprising to some people that my derived formula produces the
same results using the same inputs that were used to create the formula
in the first place. However, given that the queries I tested varied
both the number of columns being selected, the number of columns being
joined on, and the number of rows in each table, these would appear to
cover the only variables relevant to this formula. And in each of these
cases the change in the reported Hash Join cost from Oracle was always a
multiplier of this fixed constant of 0.0475.<br /><br /><h2>
Other Factors</h2>
<br />
While
I do believe that this formula is true and valid for the system I was
testing on, it may not be true for all other systems. It is likely that
the multiplying factor of 0.0475 will be different on other systems.
Given that this additional cost for the overflowing Hash Join is due to
the additional disk I/Os involved, then it would seem likely that
changes to the system statistics inside Oracle for disk read times would
result in a change in the value of this multiplying factor. I will
investigate this in my next series of tests.<br /><br />There may or may not
be some small "constant cost value" involved as well within the
formula, for some kind of constant overhead within the overflowing Hash
Join operation. This "constant cost value" would become negligible at
higher data volumes compared to the costs for the build and probe data
sets, but it might explain the slightly larger difference in calculated
cost at the smallest overflowing data set size.<br /><br />There is also the
concept of "one pass" and "multi-pass" hash joins within Oracle, as
well as "optimal" hash joins. I don't understand the real difference
between these, other than "optimal" is when it fits in memory and the
other two are when it overflows to disk. It is possible that what I've
seen has been the cost for "one pass" overflowing hash joins, and for
even larger data sets a "multi-pass" hash join would be used that would
involve a different cost formula.<br /><br />
<h2>
The SQL for the table and query</h2>
<br />
Here is the SQL to create one example table - they are all the same but for name and row counts - and the query used.<br /><br />Create Table - run from SQL*Plus with 2 command line arguments of the row count and a table name suffix e.g. "<code>@crhjtab 1000000 100k</code>".<br />
<pre>create table hj&2
tablespace testdata
as
select r pid
, 1 one
, 2 two
, 3 three
, 4 four
, 5 five
, 10 ten
, trunc (r / 10) per10
, trunc (r / 100) per100
, mod (r, 10) mod10
, mod (r, 100) mod100
, mod (r, 1000) mod1000
, mod (r, 10000) mod10000
, mod (r, 100000) mod100000
, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' filler1
, 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz' filler2
, 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz' filler3
from (select rownum r
from (select rownum r from dual connect by level <= 1000) a,
(select rownum r from dual connect by level <= 1000) b,
(select rownum r from dual connect by level <= 1000) c
where rownum <= &1)
;
exec dbms_stats.gather_table_stats (user, upper ('hj&2') )
</pre>
Query - run from SQL*Plus with 2 command line arguments of table name suffixes e.g. "<code>@hj0 100k 200k</code>"<br /><pre>select /* HashTest0 */ sum (hj1.one) sum_b1
from hj&1 hj1, hj&2 hj2
where hj1.pid = hj2.per10
and hj1.mod10 = hj2.mod100 ;
--
select * from table (dbms_xplan.display_cursor) ;
</pre>
I've only shown one query here, as the others I used are almost the same but for the columns in the "<code>SELECT</code>" list. The variations of this query had different numbers of columns in the "<code>SELECT</code>" list, to increase the number of columns from the build and/or the probe tables. John Bradyhttp://www.blogger.com/profile/08719865814813032071noreply@blogger.com0tag:blogger.com,1999:blog-7302956329008495023.post-55881687784062461412018-01-03T15:10:00.000+00:002018-01-03T15:10:43.000+00:00Reading AWR Reports #2 - Report OverviewIn the <a class="externalLink" href="http://databaseperformance.blogspot.co.uk/2017/12/reading-awr-reports-1-start-with-problem.html" target="_blank" title="External link to http://databaseperformance.blogspot.co.uk/2017/12/reading-awr-reports-1-start-with-problem.html">first post on Reading AWR Reports</a>
I made the point that you should first be clear on the details of the
"performance problem" you are investigating. If there is no specific
problem reported then there is no point looking for something that is
not there in an AWR report. I also stated that an AWR Report is one
amongst several tools available to you that you can use to investigate
Oracle database performance problems, and you should make sure that a
database wide AWR Report is the most suitable tool to be using for the
specific performance problem you are currently tasked with
investigating.<br /><br />Assuming that has all been addressed, then the
first thing I do with an AWR Report is a few high level checks - a kind
of sanity check to get a feel for whether there might be some kind of
performance issue there or not. <br /><br />A performance problem is
normally when a task takes too long to complete, and that is affected by
the various resources it uses while doing its work. The key
measurements are therefore both sides of this - the input work requests,
and the system resource usage while doing that work. I basically want
to check:<br />
<ul>
<li>The resource usage as a percentage of system capacity i.e. utilisation</li>
<li>The amount of work requested / done (assuming they are the same), which is really SQL statements executed for a database</li>
<li>Amount of "wait time" within "SQL execution time" i.e. wait as percentage of work time</li>
<li>Top wait events to see how these correlate with the previous observations</li>
</ul>
That's
it. Nothing more detailed for the first pass. If the input workload
is high enough and the resource utilisation high enough and something
looks wrong then I follow up with a second pass of the AWR Report diving
down into more detail based on what the first pass turned up.<br /><br />Let's use the following AWR Report to show what I mean:<br /><pre>WORKLOAD REPOSITORY report for
DB Name DB Id Instance Inst Num Startup Time Release RAC
------------ ----------- ------------ -------- --------------- ----------- ---
O12DB 3429470280 o12db 1 05-Dec-17 06:59 12.1.0.2.0 NO
Host Name Platform CPUs Cores Sockets Memory(GB)
---------------- -------------------------------- ---- ----- ------- ----------
xxxxxxxxxx.local Linux x86 64-bit 2 2 1 7.80
Snap Id Snap Time Sessions Curs/Sess
--------- ------------------- -------- ---------
Begin Snap: 79 05-Dec-17 09:53:43 142 3.0
End Snap: 80 05-Dec-17 10:03:44 142 4.1
Elapsed: 10.03 (mins)
DB Time: 0.39 (mins)
Top 10 Foreground Events by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Total Wait Wait % DB Wait
Event Waits Time (sec) Avg(ms) time Class
------------------------------ ----------- ---------- ---------- ------ --------
DB CPU 12.4 53.8
log file sync 2,727 9.3 3.42 40.4 Commit
db file sequential read 177 1.2 6.82 5.2 User I/O
control file sequential read 827 .1 0.12 .4 System I
</pre>
The snapshot duration was 10 minutes, and the system had 2 CPU
Cores, so there were 20 minutes of CPU capacity available. The database
processing time is reported as "DB Time" and is 0.39 minutes, which is
about 1.95% resource utilisation. From this I can see that the database
was doing very little work at all, so there is not a database wide
performance problem worth investigating. Even though the Top Wait
Events seem to show that 40% of the time was spent waiting on the "<code>log file sync</code>"
event, the actual amount of Time is very trivial - 9.3 seconds of
waiting out of a 600 second period (10 minutes). Such low workloads and
resource utilisation can lead to various measurement anomalies, so it
is not worth trying to drill down further into the wait event details.
Any actual problem is probably specific to a single session, and should
be investigated using session specific tools.<br /><br />Another AWR Report:<br /><pre>WORKLOAD REPOSITORY report for
DB Name DB Id Instance Inst Num Startup Time Release RAC
------------ ----------- ------------ -------- --------------- ----------- ---
O12DB 3429470280 o12db 1 05-Dec-17 06:59 12.1.0.2.0 NO
Host Name Platform CPUs Cores Sockets Memory(GB)
---------------- -------------------------------- ---- ----- ------- ----------
xxxxxxxxx.local Linux x86 64-bit 2 2 1 7.80
Snap Id Snap Time Sessions Curs/Sess
--------- ------------------- -------- ---------
Begin Snap: 71 05-Dec-17 07:55:35 39 1.8
End Snap: 72 05-Dec-17 08:05:39 40 2.0
Elapsed: 10.06 (mins)
DB Time: 9.35 (mins)
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
Redo size (bytes): 3,470,831.7 5,403.2
Logical read (blocks): 61,507.4 95.8
Block changes: 20,563.9 32.0
User calls: 495.9 0.8
Parses (SQL): 302.6 0.5
Executes (SQL): 13,287.8 20.7
Transactions: 642.4
Top 10 Foreground Events by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Total Wait Wait % DB Wait
Event Waits Time (sec) Avg(ms) time Class
------------------------------ ----------- ---------- ---------- ------ --------
DB CPU 372.9 66.5
log file sync 272,633 196.8 0.72 35.1 Commit
db file sequential read 472 2.2 4.75 .4 User I/O
</pre>
Again, same database instance and a 10 minute period of time but a
different workload. Now we see that the resource utilisation is up at
46.47%, being 9.35 minutes of DB Time out of the 20.12 minutes of potential CPU
capacity (2 CPUs for the 10.06 minute snapshot). From the Load Profile we can see that the database was
executing over 13,000 SQL statements per second. From the wait events
we see that "DB CPU" is 66.5% of the "DB Time" i.e. doing real work. So
waiting as a percentage of SQL execution time would be about 33.5%.
And we can see that almost all of this wait time is due to a single wait
event - "<code>log file sync</code>".<br /><br />Again, this does not
directly tell us whether there is a performance problem or not. In all
databases doing work there will be some wait events occurring, and one
of these will always be top when ordered by wait time. That does not
mean that it is necessarily a problem, or that it can be "fixed" in any
way. In this case the workload is doing a lot of inserts and updates in
small transactions, so there is a high frequency of commits which all
involve writes to the redo logs. And we would expect the "<code>log file sync</code>"
wait event to be there in the set of top 10 wait events in this
scenario i.e. it is not a "problem" but a "result" of the workload
running on the database.<br />
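The back-of-envelope checks used in both examples are easy to script. Here is a minimal Python sketch (the function name is just illustrative; the inputs plugged in below are taken from the two report headers and wait event sections above):

```python
def awr_first_pass(elapsed_mins, cpus, db_time_mins, db_cpu_secs, top_wait_secs):
    """First-pass sanity numbers from an AWR report header:
    CPU capacity utilisation, and DB CPU / top wait as shares of DB Time."""
    capacity_mins = elapsed_mins * cpus
    utilisation_pct = 100 * db_time_mins / capacity_mins
    db_time_secs = db_time_mins * 60
    cpu_pct_of_db_time = 100 * db_cpu_secs / db_time_secs
    wait_pct_of_db_time = 100 * top_wait_secs / db_time_secs
    return utilisation_pct, cpu_pct_of_db_time, wait_pct_of_db_time

# First report: 10.03 min elapsed, 2 CPUs, 0.39 min DB Time
print(round(awr_first_pass(10.03, 2, 0.39, 12.4, 9.3)[0], 2))  # about 1.9% - effectively idle

# Second report: 10.06 min elapsed, 2 CPUs, 9.35 min DB Time,
# 372.9 s DB CPU, 196.8 s log file sync waits
u, cpu, wait = awr_first_pass(10.06, 2, 9.35, 372.9, 196.8)
print(round(u, 2), round(cpu, 1), round(wait, 1))  # 46.47 66.5 35.1
```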
John Bradyhttp://www.blogger.com/profile/08719865814813032071noreply@blogger.com0tag:blogger.com,1999:blog-7302956329008495023.post-26157504264202384282017-12-12T14:57:00.000+00:002018-10-30T20:39:29.094+00:00Reading AWR Reports #1 - Start With The ProblemI hope that this will be the start of a short series of posts on how to
read an AWR Report. I doubt that it will contain anything
revolutionary, but it is clear that some people do not know where to
start when presented with an AWR Report, so I aim to cover some
essentials about this. This first post will cover what to consider <b><i>before</i></b>
attempting to read and understand an AWR report, because ultimately AWR
is just another tool and not the answer itself. I will get into the
first things I look at in an AWR report in the next post.<br />
<br />
As I've
already indicated, AWR (Automatic Workload Repository) is really
another tool in your toolkit for examining what happened on an Oracle
database. And it is an extra cost option for the Oracle database
Enterprise Edition, requiring that you purchase the Diagnostics Pack
licence. Before using AWR you need to make sure that AWR is the right
tool to use for the problem you have, rather than some other tool
available to you. Which means that <i><b>you should always start with the problem you are trying to fix</b></i>,
and not dive straight into using some tool or other and hope that it
shows you the fault in the first screen's worth of output. You must be
clear on what the problem is that you are trying to fix, the impact of
that problem on users or the application or something else, and how you
would measure the improvement to show that the problem has been
resolved. If you don't know that information about the "problem", then
you can never be sure that you have actually fixed the real cause of it -
it may simply have disappeared for other reasons.<br />
<br />
I'd strongly
recommend following a top down approach to a performance problem
investigation - start with the real world problem that users are
experiencing, and work down and inwards from that towards the database.
Do not start at the bottom directly on the database, and hope to find
the problem quickly there. This is a "finding a needle in a haystack"
approach. You might get lucky and find something related to the problem
straight away, or you might not and end up wasting your time chasing
dead ends. The "performance problem" may actually be somewhere else
outside of the database, such as the client application itself, the
network, or any application servers. Check and eliminate those first.
Only after checking those should you move onto the database itself.<br />
<br />
An
AWR Report tells you what happened on your database between the two
snapshots used. It does not tell you whether any problems did or did
not occur, or what those problems might be. And it does not tell you
whether your database was well behaved or not. It just provides a
complete overview of everything that happened on that database between
those two snapshots, and it is up to you to decide whether this meets
expectations or not.<br />
<br />
AWR reports are database wide i.e. across
all the activity that happened on that database between the two
snapshots. This includes activity from all sessions on all schemas in
the database. So it is only really of use if the problem you are
investigating is a system wide one, affecting most of the sessions and
users of the database. And even then it is still not guaranteed to show
you what the cause of your problem is, as it depends on what your
problem is.<br />
<br />
Furthermore, an AWR Report only tells you the grand
totals of what happened over that period of time, and the averages of
those (some total value divided by the elapsed time or some other
measurement). It does not tell you the peak values, as it only knows
the absolute change in measurements between the two snapshots being
reported on. And average values over a period of time are always lower
than the peak values during the most active period, because that is what
an "average" means. How much lower the average values are than the
peak values depends on a number of factors, such as the different time
periods involved and the relative difference in activity between the
peak period and the other periods.<br />
<br />
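As a tiny illustration of this averaging effect, with purely hypothetical numbers:

```python
# Hypothetical per-minute Average Active Sessions within one 10 minute
# AWR interval: nine quiet minutes and one busy one.
per_minute_aas = [0.5] * 9 + [8.0]

interval_average = sum(per_minute_aas) / len(per_minute_aas)
print(interval_average)   # 1.25 -- the 8.0 peak is invisible in the average
```

Anything sized or judged against the 1.25 average would be badly wrong for that one busy minute.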
A pitfall to avoid is just looking at the "<i>top 10</i>" type lists of SQL statements executed and wait events. <b><i>In every database there is always a slowest SQL statement or top wait event</i></b>,
regardless of how active the database is. Just because that was the
"slowest SQL statement" does not mean that it is a bad statement, or
that it is related in any way to the problem you are investigating. You
need to prove a link between the two, and not assume that they are
related in any way. Trying to "improve" a "slow SQL statement" that is
unrelated to the real problem users are experiencing is just a waste of
time. Check that any potential cause is really related to your
problem, and not independent of it. <br />
<br />
Rather than wade through a series of full single AWR Reports for each pair of snapshots you can instead <a class="externalLink" href="http://databaseperformance.blogspot.co.uk/2016/10/awr-summary-data-extracts.html" target="_blank" title="External link to http://databaseperformance.blogspot.co.uk/2016/10/awr-summary-data-extracts.html">Query the AWR data directly</a>
(again, providing you have paid for the Diagnostics Pack licence), and
pull out just the key performance measurements you are interested in
across a number of snapshots all at once. These might include Average
Active Sessions, number of SQL statements executed, and percentage of
time waiting. This can help performance investigations progress faster
if you can more quickly identify the peak activity period across all the
AWR snapshots available, and then drill down into only the most active
period of time.<br />
<br />
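As an example of one such measurement, Average Active Sessions for an interval is just the change in the DB time statistic divided by the elapsed wall-clock time. A hedged sketch of the calculation (the actual query against the AWR history views is in the linked post):

```python
# Sketch: Average Active Sessions (AAS) for one snapshot interval, computed
# from the delta of the "DB time" statistic between the two snapshots.
# DBA_HIST_SYS_TIME_MODEL reports this value in microseconds.

def average_active_sessions(db_time_delta_us, elapsed_sec):
    return (db_time_delta_us / 1_000_000) / elapsed_sec

# e.g. 561 s of DB Time accumulated over a 600 s snapshot interval:
print(round(average_active_sessions(561_000_000, 600), 3))   # 0.935
```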
Remember that AWR provides database wide
information over relatively long periods of time (between AWR
snapshots), and that there are other tools out there for doing different
types of performance analysis of Oracle databases:<br />
<ul>
<li>Real time query and analysis of Oracle Dynamic Performance Views (<code>V$</code> views), such as <code>V$SESSION</code></li>
<li>Active Session History (ASH), which also requires the Diagnostics Pack licence</li>
<li>SQL Trace to record all SQL statements executed by a session, and their elapsed times</li>
</ul>
Each
of these has its advantages and disadvantages, and if you want to be
an Oracle expert then you should be aware of these and their tradeoffs.
You need to make sure that you are using the right tool for the kind
of performance problem you are investigating.<br />
<br />
In conclusion then, I'd say that my two main points in this post are:<br />
<ul>
<li>Make sure you are clear on what the performance problem is that you are investigating</li>
<li>Check that AWR Reports are the right tool to be using to investigate this performance problem, rather than something else</li>
</ul>
<br />John Bradyhttp://www.blogger.com/profile/08719865814813032071noreply@blogger.com0tag:blogger.com,1999:blog-7302956329008495023.post-73768720640997893342017-11-22T09:50:00.002+00:002017-11-22T09:50:37.163+00:00Chatty Applications and Simple SQLOne type of "poor performance" scenario I have come across a few times is due to what I call "<em>chatty applications</em>".
These are applications that execute a disproportionately high number of
what look like very simple SQL queries for every business transaction
they do. And often this is a deliberate design choice by the
application architects and developers, claiming that simpler SQL
statements on single tables using indexed columns always leads to
efficient execution plans, so it must be better than anything else.
Unfortunately it is the high volume of execution of such "simple SQL
queries" that is the cause of the poor performance these applications
often experience. Both in terms of poor response times for individual
users, and poor total throughput across all the users. Why then are
such "<em>chatty applications</em>" the cause of poor performance and not the solution?<br /><br />First,
there are network round trips needed for each SQL statement executed,
and each of these typically costs on the order of milliseconds of
elapsed time. The SQL statement text is sent from the client to the
database server, and then the resultant data sent back from the server
to the client. And when there are hundreds of SQL statements being
executed per business transaction, then these all add up. Executing a
"business transaction" in such a "chatty application" can end up
spending more time transferring data back and forth over the network
than actually doing work with that data on either the client or database
server system.<br /><br />Second, the high SQL execution volume results in
high library cache usage and contention for access to it inside Oracle.
For each SQL statement executed, Oracle will check if that SQL is
already in the library cache (in the shared pool in the SGA) and if so
will go ahead and use the associated execution plan. Such library cache
look ups involve various temporary internal locks on the library cache,
to make sure it doesn't change while your session is examining it. And
the more SQL statements being executed leads to more library cache
lookups and more contention for the controlling locks on it.<br /><br />Third,
the volume of data transferred to the client and then processed on it
can be much higher than if a single, more complex SQL statement was used
instead. This is also about network delays, but this time it is due to
the amount of intermediate result data being transferred to the client,
as it works its way through data from different tables as part of its
"business transaction".<br /><br />Ultimately this all a variation on the
"row by row" processing mind set where all the raw data is pulled back
into the client, which further processes it such as joining together
data from multiple tables. And a "row set" based approach can be more
efficient - executing fewer but more complex SQL statements on the
database server, to send only the final resultant data back to the
client.<br /><br />I have seen this kind of scenario in action at a number
of customers, where it was their application design that was slowing
down and limiting the performance of their application. In some cases
we were able to identify the highest use cases in the application and
successfully modify it to use fewer, more complex SQL queries instead,
so that more work is done per execution by far fewer SQL statements. In
other cases it was not as clear due to the application developers using
a "glue layer" of third party software to handle the application to
database mappings and constructing the SQL queries to be executed in
each case. Fixing this requires reconfiguration of the "glue layer" to
issue more complex SQL queries involving multi-table joins. However, in
at least one case the customer did not at that time have enough
expertise in how to use this "glue layer" to achieve that. They had
just used it in an "out of the box" type fashion, and were now suffering
as a consequence of that.<br /><br />Probably the worst case I saw was an
application using another "glue layer" that only ever issued SQL queries
for a single column of data at a time of the form "<code>select one-column from table-name where pkid_col = value</code>", and consequently did multiple such SQL queries to get each individual column it needed.
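A rough sketch of a primary-key record buffer that avoids this pattern is shown below; the names here are hypothetical, not the actual client code:

```python
# Hedged sketch (hypothetical names): a primary-key record buffer that turns
# N single-column queries per record into one whole-record query plus
# in-memory lookups.

class RecordCache:
    def __init__(self, fetch_record):
        # fetch_record(pkid) runs one "select *" style whole-record query
        self._fetch = fetch_record
        self._pkid = None
        self._row = None

    def get_column(self, pkid, column):
        if pkid != self._pkid:        # new record: one SQL round trip
            self._row = self._fetch(pkid)
            self._pkid = pkid
        return self._row[column]      # same record: no SQL at all
```

With this in place, fetching ten columns of one record costs one query instead of ten, with the remaining nine accesses served from local memory.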
And obviously this led to horrible performance and scalability.
Luckily the application had been written to use a well defined internal
API for data access, which was in turn wrapped around the "glue layer".
We were able to insert an extra layer between the application's API and
the "glue software", intercepting the application API data access calls
<em>before</em> the "glue software" was called. We
implemented a caching layer here, based on the primary key of a record
from each table, which was always used on each query. For a new record
query, with a different primary key value, the API now did a "<code>select *</code>"
into a local record buffer of the whole data record, and then returned
the one individual column requested. If the primary key requested
matched that already in the current record buffer, then the code just
returned the individual column value directly from that, avoiding any
further SQL execution and avoiding executing any code in the "glue layer".<br /><br />As
you might expect performance increased significantly as a result of
this change - replacing the multiple single column SQL queries with just
one whole record SQL query followed by in-memory, local accesses,
avoiding all those network round trips. Also the net load on the Oracle
database server decreased significantly, with less contention on the
library cache, and the CPU and disk utilisation decreased significantly
as well. Consequently the performance of all the other SQL statements
being executed on that database server improved as well due to those
resources being freed up. A definite "win-win" all around, by accessing
the data in the database in a much more efficient way.John Bradyhttp://www.blogger.com/profile/08719865814813032071noreply@blogger.com0tag:blogger.com,1999:blog-7302956329008495023.post-9849577378655760432017-11-16T10:02:00.000+00:002017-11-16T10:02:50.256+00:00Redundant Grandparent Foreign Keys and Cardinality Estimate ErrorsThis post is about how a slightly de-normalized database design
involving redundant foreign keys to other tables can end up producing
sub-optimal execution plans for queries that use those extra joins as
additional filter conditions.<br /><br />By default the Oracle Optimizer
assumes that different columns of data in a table are independent of
each other, and that their data values are not correlated with each
other in any way. When a query has filter conditions on multiple
columns in a table, the Optimizer will combine their filter factors
together assuming they are independent of each other i.e. unless it has
explicit information to the contrary. So if each filter factor was
estimated to match 1% of the rows in the table, then their combined
filter factor would be to match 0.01% of the rows in the table (1% of
1%).<br /><br />However, if the data in these columns is in fact correlated
then this filter factor estimate will be wrong. A classic example is
"month of birth" and "zodiac sign". Each has only 12 possible values,
and the Optimizer will assume that there are 144 possible combinations.
But in fact there are only 24 possible value combinations because each
zodiac sign straddles only two months. Assuming a uniform distribution
amongst these 24 possible values within the data rows in the table, then
each pair of values would match 1 / 24 or 4.17% of the rows in the
table. However, the Optimizer will assume that a pair of values would
match 1 / 144 or only 0.69% of the rows in the table. Such misestimates
can make the Optimizer choose a relatively inefficient access such as
using an index instead of a full table scan, due to estimating too few
rows to match the filters and a too low cost for the data access.<br /><br />Often
this kind of correlation between data values in columns in a table
occurs naturally, and there is little you can directly do about it. The
"month of birth" and "zodiac sign" is one such example. However, this
kind of correlation between columns can also occur as a result of how
the tables have been designed, and can result in sub-optimal execution
plans being produced.<br /><br />One scenario is adding a redundant column
to a table as a foreign key to an indirectly related table, such as a
"grandparent" table that is the parent of the table's direct parent. A
specific example would be something like putting the "customer / account
identifier" into the "order line item" table, as a redundant copy of
that in the "order" table, along with the "order identifier". Or
putting both the "product main category id" and "product sub-category
id" into the "product" table, when there is a hierarchy of categories. <br /><br />Again,
in these cases, the Optimizer will assume that such columns are
independent of each other, unless it has explicit information otherwise.
The danger now is that the Optimizer will produce <em><strong>different execution plans</strong></em> for what are essentially the <em><strong>same queries</strong></em> depending on whether the <em><strong>redundant joins</strong></em> are included or not. <br /><br />Let's
look at a concrete example. I created three tables each with a parent /
child relationship between them, and more rows in the child tables than
the parent tables. I won't post the SQL DDL for these as it will make
the post too long, but their structure and contents should be
straightforward enough, and all values are uniformly distributed with an
equal number of child records per parent record. There are indexes on
the primary key columns, and the individual foreign key columns, but
nothing else. Primary key and foreign key constraints are explicitly
specified.<br /><br />The tables are Grandparent (1,000 rows), Parent
(100,000 rows), and Child (10,000,000 rows) i.e. 100:1 ratios, and each
has a primary key column named "pkid". The Parent table has a foreign
key column (gp_id) to Grandparent, while the Child table has a foreign
key to Parent (of p_id) and also a redundant foreign key to Grandparent
(of gp_id). <br /><br />I will now execute three different but equivalent
queries against these tables using different combinations of the
possible foreign key joins:<br />
<ol>
<li>Only direct joins from child to parent, and then parent to grandparent</li>
<li>As previous query plus additional (redundant) join from child to grandparent</li>
<li>Direct from child to grandparent only, and parent table omitted from query</li>
</ol>
Each
query has a filter condition on another column in the grandparent table
restricting the matching rows to just 5 rows in that table.<br /><br />I'm using Oracle 11gR2 on Oracle Linux:<br /><pre>SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
PL/SQL Release 11.2.0.4.0 - Production
CORE 11.2.0.4.0 Production
TNS for Linux: Version 11.2.0.4.0 - Production
NLSRTL Version 11.2.0.4.0 - Production
</pre>
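As a reminder, the independence assumption from the month-of-birth / zodiac-sign example above works out as follows (a quick sketch, assuming uniform distributions):

```python
# The optimizer multiplies individual selectivities as if the columns were
# independent; with correlated columns the real combined selectivity is higher.

assumed_pct = 100 * (1 / 12) * (1 / 12)   # 1/144: what the optimizer assumes
actual_pct = 100 * (1 / 24)               # 1/24: only 24 real combinations

print(round(assumed_pct, 2))   # 0.69
print(round(actual_pct, 2))    # 4.17
```

A factor of six misestimate like this is easily enough to flip the choice between an index access and a full table scan.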
<br />First query - direct joins between parent / child tables:<br /><pre>SQL> @test1
ROW_COUNT
----------
50000
SQL_ID fddwv6gp5m26h, child number 0
-------------------------------------
select sum (c.one) row_count
from child c, parent p, grandparent gp
where c.p_id = p.pkid
and p.gp_id = gp.pkid
and gp.pct05 = 1
Plan hash value: 4174412392
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 25668 (100)| |
| 1 | SORT AGGREGATE | | 1 | 25 | | |
|* 2 | HASH JOIN | | 49591 | 1210K| 25668 (1)| 00:05:09 |
|* 3 | HASH JOIN | | 500 | 8500 | 244 (1)| 00:00:03 |
|* 4 | TABLE ACCESS FULL| GRANDPARENT | 5 | 40 | 5 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL| PARENT | 100K| 878K| 238 (1)| 00:00:03 |
| 6 | TABLE ACCESS FULL | CHILD | 10M| 76M| 25398 (1)| 00:05:05 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("C"."P_ID"="P"."PKID")
3 - access("P"."GP_ID"="GP"."PKID")
4 - filter("GP"."PCT05"=1)
</pre>
We can see that it is estimating 5 rows to be retrieved from the
Grandparent table, 500 from the join to Parent, and about 50,000 from
the join to Child - operations 4, 3, and 2. This is correct given the
100:1 ratio between the rows in each table. The cost is estimated at
just over 25,000, which is really just the sum of the costs of the full
table scans involved.<br /><br />Second query - redundant join added between Child and Grandparent:<br /><pre>SQL> @test2
ROW_COUNT
----------
50000
SQL_ID 0a703yvda9z4h, child number 0
-------------------------------------
select sum (c.one) row_count
from child c, parent p, grandparent gp
where c.p_id = p.pkid
and p.gp_id = gp.pkid
and gp.pct05 = 1
and c.gp_id = gp.pkid
Plan hash value: 4140991566
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 12271 (100)| |
| 1 | SORT AGGREGATE | | 1 | 29 | | |
| 2 | NESTED LOOPS | | 50 | 1450 | 12271 (1)| 00:02:28 |
| 3 | NESTED LOOPS | | 49500 | 1450 | 12271 (1)| 00:02:28 |
|* 4 | HASH JOIN | | 500 | 8500 | 244 (1)| 00:00:03 |
|* 5 | TABLE ACCESS FULL | GRANDPARENT | 5 | 40 | 5 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL | PARENT | 100K| 878K| 238 (1)| 00:00:03 |
| 7 | BITMAP CONVERSION TO ROWIDS | | | | | |
| 8 | BITMAP AND | | | | | |
| 9 | BITMAP CONVERSION FROM ROWIDS| | | | | |
|* 10 | INDEX RANGE SCAN | IX_CHILD_PID | 99 | | 2 (0)| 00:00:01 |
| 11 | BITMAP CONVERSION FROM ROWIDS| | | | | |
|* 12 | INDEX RANGE SCAN | IX_CHILD_GPID | 99 | | 21 (0)| 00:00:01 |
| 13 | TABLE ACCESS BY INDEX ROWID | CHILD | 1 | 12 | 12271 (1)| 00:02:28 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("P"."GP_ID"="GP"."PKID")
5 - filter("GP"."PCT05"=1)
10 - access("C"."P_ID"="P"."PKID")
12 - access("C"."GP_ID"="GP"."PKID")
</pre>
The only difference to the previous query is the addition of the extra filter condition of "<code>c.gp_id = gp.pkid</code>",
using the redundant join between Child and Grandparent. The row count
is the same - 50,000 - because the extra filter condition is redundant
and doesn't change the number of matching rows in any way. But the
execution plan is completely different, because the Optimizer has
assumed that the two filters on Child are independent of each other, but
this is not true.<br /><br />The execution plan starts the same, filtering
on Grandparent to 5 estimated rows, then joining to Parent to produce
500 estimated rows (operation 4 Hash Join). Now it does 2 Nested Loops -
the inner to get the ROWIDs of matching rows from Child, and the outer to
get the data row itself for the "<code>one</code>" column used in the
output. And this outermost Nested Loop (operation 2) is only estimated
to produce 50 rows, which is clearly incorrect.<br /><br />What has happened is that because the "<code>gp_id</code>"
column in Child has 1,000 distinct values in it, the Optimizer has
reduced the final row estimate by this ratio i.e. it is estimating 50
matching rows and not 50,000 matching rows. Based on this it has costed
the index based access to Child to come out at 12,271 which is lower
than the 25,000+ cost of the full table scan in the first execution
plan. <br /><br />This cost seems to be arrived at as the cost of index
access for one pair of values - 21 + 2 = 23 - plus one more for the
access to the data row in the table by ROWID i.e. a cost of 24 per probe.
For the 500 rows estimated from the Hash Join this gives a total cost of 12,000,
which added to the cost of 244 for the steps up to the Hash Join comes very close to
the reported total cost of 12,271.<br /><br />In fact this execution plan
will take longer to execute because it will actually be processing
50,000 Child records and not just 50, and the execution plan of the
first query would actually be the better one. But it is the assumption
that these two foreign key columns are independent of each other that
has led the Optimizer to produce this sub-optimal plan.<br /><br />Third query - remove Parent table completely from query, and join directly from Child to Grandparent:<br /><pre>SQL> @test4
ROW_COUNT
----------
50000
SQL_ID 76tptvjkbx08x, child number 0
-------------------------------------
select sum (c.one) row_count
from child c, grandparent gp
where c.gp_id = gp.pkid
and gp.pct05 = 1
Plan hash value: 2796906588
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 25435 (100)| |
| 1 | SORT AGGREGATE | | 1 | 15 | | |
|* 2 | HASH JOIN | | 50000 | 732K| 25435 (1)| 00:05:06 |
|* 3 | TABLE ACCESS FULL| GRANDPARENT | 5 | 40 | 5 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| CHILD | 10M| 66M| 25403 (1)| 00:05:05 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("C"."GP_ID"="GP"."PKID")
3 - filter("GP"."PCT05"=1)
</pre>
The result is the same as before - 50,000 rows - and the execution
plan is basically the first one with Parent removed - two full table
scans and a Hash Join. With only one filter condition on the Child
table the Optimizer has correctly estimated the number of matching rows -
50,000 - and produced the appropriate execution plan.<br />
<br />
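For completeness, one way to tell the Optimizer that the two foreign key columns on Child are correlated is a column group created via extended statistics. A hedged sketch using the table and column names above (not verified against every Oracle version):

```sql
-- Create extended statistics on the correlated column group, then regather
-- statistics so the Optimizer has a combined selectivity for (p_id, gp_id).
select dbms_stats.create_extended_stats(user, 'CHILD', '(P_ID,GP_ID)') from dual;

begin
  dbms_stats.gather_table_stats(
    ownname    => user,
    tabname    => 'CHILD',
    method_opt => 'for columns (P_ID,GP_ID) size auto');
end;
/
```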
<h2>
Conclusion</h2>
Beware
of database designs that include redundant foreign key columns between
child and grandparent tables, or across even more levels of a
relationship hierarchy, and how you write queries involving such tables.
While it might seem somehow better to include all known joins between
all the tables in a query, this can cause the Optimizer to misestimate
the number of matching rows from a table, and to choose non-optimal
access methods as a result. Rather than helping the Optimizer, adding
such redundant join conditions to queries actually makes things worse.
As a general rule you are better off only having joins in a query
between directly related tables, and not to indirectly related ones.
Otherwise, eliminate the middle, intermediate tables if possible and
join directly between the relevant tables, which should produce the same
results.<br /><br />I saw this happen recently at a client, and removing
such a redundant join condition led to the Optimizer producing more
realistic row estimates and a much better execution plan, with a
corresponding faster execution given the data volumes involved.<br /><br />The
other approach to dealing with this kind of scenario is to provide
extra information to the Optimizer so that it knows when two or more
columns are correlated. This can be done simply by creating an index on
just those columns in the table, or by creating extended statistics on
the column group. As long as this extra information is present, then
the Optimizer will produce a better row estimate and a better execution
plan. When I tested this with such an index on the Child table on both "<code>p_id</code>" and "<code>gp_id</code>"
together, the execution plan for the second query with the redundant join
went back to the original one - 3 full table scans and 2 hash joins.<br /><br />Also
be aware that Oracle is always trying to make the Optimizer more
"intelligent" so the behaviour and execution plans produced can change
between versions of Oracle for the same query on the same database
tables. I briefly tested these same queries on Oracle 12c and got
completely different execution plans. In 12c I actually got an Adaptive
Execution Plan which combined both plan variations under a parent
Statistics Collector operation. Depending on how many matching rows
were initially retrieved it could either do a full table scan on Parent
and Child with Hash Joins, or use the indexes on the foreign key columns
to do direct row lookups in Nested Loops. Combined with other feedback
mechanisms in the 12c Optimizer, it is possible for the Optimizer to
now detect such non-optimal execution plans and produce better ones on
the next execution of the same query.John Bradyhttp://www.blogger.com/profile/08719865814813032071noreply@blogger.com0tag:blogger.com,1999:blog-7302956329008495023.post-2140441607405951492017-10-31T10:59:00.003+00:002017-10-31T10:59:47.881+00:00I'm back, again, late 2017Nothing much to say other than I'm back again, and hope to do some more
blog posting soon. My excuse for the lack of posts is that I've been
busy helping a client with a large data migration project. That is now
over, with all the data successfully extracted for loading into their
new system, so I've got more time available to properly write up some
technical Oracle or performance posts.John Bradyhttp://www.blogger.com/profile/08719865814813032071noreply@blogger.com0tag:blogger.com,1999:blog-7302956329008495023.post-39111572964114585532016-12-29T15:57:00.003+00:002016-12-29T16:01:02.647+00:00Fixing Popular Posts Margin on BloggerI use the <a class="externalLink" href="https://www.blogger.com/" target="_blank" title="External link to https://www.blogger.com">Blogger</a> platform for this blog, and I recently added the "<i>Popular Posts</i>"
widget to the sidebar. Unfortunately it did not display correctly,
with the first character or two of each blog post title being lost and
chopped off, as if the whole thing had been shifted to the left for some
reason. Here is how I fixed it to display properly.<br />
<br />
After a lot
of reading up on HTML and CSS I realised that these posts appeared
within a section with an associated style name, in this case a class.
In turn I could add an entry to the HTML template for my blog to define a
shift to the right for this particular class style so that it would
display properly. And this shift would only be applied to this section
of the blog page and no other.<br />
<br />
What this boils down to is the following:<br />
<ul>
<li>The "Popular Posts" widget reference in the HTML has an attribute of "<code>id='PopularPosts1'</code>"</li>
<li>The list of posts within this widget has two class attributes, one of which is "<code>popular-posts</code>"</li>
<li>I could shift the list of posts far enough across to the right using the style property of "<code>padding-left</code>"</li>
</ul>
To apply this to your Blogger blog template do the following:<br />
<ul>
<li>In Blogger click on "Template" on the left hand side of your blog dashboard (under Layout and above Settings)</li>
<li>Click the on "Edit HTML" button under the small image of your blog </li>
<li>Expand the "<pre><code><b:skin></b:skin></code></pre>" section near the top by clicking on it or on the right pointing triangle on the left hand side<ul>
<li>This was at line 7 on my template</li>
</ul>
</li>
<li>Scroll down to the end of this section, which finishes with </li>
</ul>
<pre class="source-code">]]></b:skin></pre>
<ul>
<li>Above this on their own lines add the following:</li>
</ul>
<pre class="source-code">#PopularPosts1 .popular-posts {
padding-left: 15px
}
</pre>
<ul>
<li>Then click on the "Save Template" button at the top of the screen</li>
<li>If
you now redisplay your blog you should see the "Popular Posts" list has
now been shifted over to the right and aligned nicely under the
"Popular Posts" heading</li>
</ul>
Note the following:<br />
<ul>
<li>The first line states that this style is only to be used within a "popular-posts" class element that itself occurs within a "PopularPosts1" ID element.<ul>
<li>This should match exactly the content section we want to be shifted across in the blog page</li>
</ul>
</li>
<li>The second line sets the left hand side padding before the displayed content to be "<code>15px</code>" (pixels)<ul>
<li>The value of "<code>15px</code>" was obtained by simple trial and error starting with smaller values until the list was shifted over enough.</li>
</ul>
</li>
</ul>
I'm
not saying my solution is perfect or absolutely correct; I'm just
saying that it works for me, and it seems to conform to the way CSS
works and how Blogger defines its page sections. It would seem that all
of the different sidebar elements have a right shift built into them,
except for a Popular Posts list of just titles (no snippets or
thumbnails). This solution adds in such a right shift so the blog
post titles are now all visible in the blog page.John Bradyhttp://www.blogger.com/profile/08719865814813032071noreply@blogger.com0tag:blogger.com,1999:blog-7302956329008495023.post-55507372298567889012016-12-12T13:02:00.000+00:002016-12-12T13:02:19.877+00:00Announcing Bottleneck Data SolutionsSPOILER ALERT: This particular blog post is a blatant self promotion for me and the various Oracle database services I am offering.<br /><br />I
may not have mentioned it explicitly before but I have left the
corporate world of permanent employment and have gone independent to
offer my services direct to clients (at reasonable rates, of course).
I'm doing this through my newly created company <a class="externalLink" href="http://bottleneckdatasolutions.com/index.html" target="_blank" title="External link to http://bottleneckdatasolutions.com/index.html">Bottleneck Data Solutions</a>, which has its own associated, obligatory web site.<br /><br />Basically
I am offering my many years of knowledge, experience and expertise on
Oracle database design and performance tuning directly to clients for
short term consulting engagements or longer term contracts. I'm not big
on doing the self promotion thing unfortunately, but I won't succeed
unless I get the message out and publicise myself one way or another.
So this is a one-off post to raise awareness of what I'm offering
through <a class="externalLink" href="http://bottleneckdatasolutions.com/index.html" target="_blank" title="External link to http://bottleneckdatasolutions.com/index.html">Bottleneck Data Solutions</a> and make you aware that I'm available for Oracle database performance tuning, database design and development engagements.<br /><br />I have also uploaded some of the <a class="externalLink" href="http://bottleneckdatasolutions.com/resources.html" target="_blank" title="External link to http://bottleneckdatasolutions.com/resources.html">SQL Scripts</a>
that I commonly use, to share with everyone. Nothing revolutionary I
am afraid, just the standard "free space" and "system activity summary"
type reports.<br /><br />At the moment I am offering to do a free, initial <a class="externalLink" href="http://bottleneckdatasolutions.com/awr-review.html" target="_blank" title="External link to http://bottleneckdatasolutions.com/awr-review.html">Performance Review of an AWR Report</a>. If I can spot anything obvious affecting performance in the AWR Report then I'll provide feedback to you. <br /><br />Why "<em><strong>Bottleneck Data Solutions</strong></em>"?
Well, data is everywhere and is the lifeblood of the modern company.
But I have seen a significant number of cases where slow application
performance was caused by a badly designed and implemented database. In
other words, the way the data was organized and stored in the database
was becoming the bottleneck, limiting how quickly data could be supplied
to the application software. Solving this kind of bottleneck requires a
mix of short term performance tuning to deal with the most obvious hot
spots, and a longer term review of the database and redesigning the most
critical parts.<br /><br />Thank you for your patience. The normal service of technical blog posts will be resumed again in the future.John Bradyhttp://www.blogger.com/profile/08719865814813032071noreply@blogger.com0tag:blogger.com,1999:blog-7302956329008495023.post-90944199189909492932016-12-01T16:37:00.000+00:002016-12-02T14:59:07.169+00:00When 2 Queries Are Better Than 1A general rule of thumb with queries in Oracle is to use just one query
when you can and leave it to the Optimizer to work out the best possible
execution plan i.e. decomposing one query into multiple separate
queries can end up making the database do more work than it needs to. I
came across a case in the past week where the opposite was true - making
Oracle do separate "smaller" queries was far quicker than using one
overall query.<br />
<br />
<h2>
Background</h2>
Imagine that we have two tables -
new data arrives in one table (INCOMING) and we process it in some way
and insert the resultant data into another table (OUTGOING). Data
records in both tables have a timestamp on them for when they were
created, and they are inserted in timestamp order. We want to process
new messages at regular intervals adding them to the OUTGOING table, but
they are not deleted from the INCOMING table when processed. Instead a
separate purge job runs each night to delete incoming data older than
48 hours. In this scenario we can use the timestamp on each record to
work out the maximum timestamp of the OUTGOING data records we have
processed so far, and only retrieve INCOMING data records that have been
added since then.<br />
<pre class="source-code"><code>select max (ts) from outgoing ;
</code></pre>
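For completeness, the nightly purge and the incremental retrieval driven by that maximum timestamp might be sketched as follows (the INCOMING table, TS column and category values follow the description above; the bind variable name is my own illustration):<br />
<pre class="source-code"><code>-- Sketch only: nightly purge of incoming data older than 48 hours
delete from incoming
where ts < systimestamp - interval '48' hour ;
commit ;
--
-- Sketch only: retrieve just the records added since the last processed timestamp
select * from incoming
where category = 'CATEGORY1'
and ts > :max_outgoing_ts ;
</code></pre>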
An additional complexity in this is that the INCOMING data records
fall into different categories or types, and these are processed
separately to each other due to different processing rules and
requirements. This means that when getting the maximum timestamp from
the OUTGOING data set we also need to restrict on the category.<br />
<pre class="source-code"><code>select max (ts) from outgoing
where category = 'CATEGORY1' ;
</code></pre>
But it also turns out that some categories are similar enough to
each other that they can be processed together at the same time from
INCOMING, which should be more efficient than executing multiple
separate queries against what can be a large input table with historical
data in it. This means we now need the maximum timestamp from OUTGOING
across a number of categories. This is most obviously done with this
query.<br />
<pre class="source-code"><code>select max (ts) from outgoing
where category in ( 'CATEGORY1' , 'CATEGORY2', 'CATEGORY3' ) ;
</code></pre>
This is valid because these 3 categories are all extracted from
INCOMING at the same time and processed together, so we do want the
maximum timestamp across all of them together.<br />
<br />
<h2>
The Problem</h2>
When executed this "<code>SELECT MAX (TS) WHERE CATEGORY IN (...)</code>"
query took a long time to execute (tens of seconds). Previous tests
had shown that Oracle can efficiently use an index to get a minimum or
maximum value from a table (sub-second), so why was this particular
query taking so much longer?<br /><br />The conclusion was that a single
category maximum or minimum is executed differently to one on a group of
categories, and this was the cause of the longer execution time.
Furthermore, rewriting the query as a union of separate single category
queries was much faster (over 100 times faster in this case!)
and almost as fast as the original single
category query.<br />
<br />
<h2>
Test Environment</h2>
This happened for real on
Oracle 12c, on a non-CDB database. I'm going to replicate it on a
separate test environment, also on Oracle 12c on a non-CDB database,
running on Oracle Linux 7.<br />
<pre class="source-code"><code>SQL> select * from v$version ;
BANNER CON_ID
---------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production 0
PL/SQL Release 12.1.0.2.0 - Production 0
CORE 12.1.0.2.0 Production 0
TNS for Linux: Version 12.1.0.2.0 - Production 0
NLSRTL Version 12.1.0.2.0 - Production 0
</code></pre>
I have taken a copy of the data in <code>DBA_OBJECTS</code> which
in my test instance had over 90,000 rows in it occupying 12 MB of disk
storage, and then doubled this 9 times to be 512 times larger than the
original data set, giving over 45 million rows using over 5 GB of disk
storage.<br /><br />Initially I thought about using the <code>CREATED</code> column for the timestamp, and the <code>OBJECT_TYPE</code>
for the category, but this ran into problems. The first is that
doubling the rows each time does not change any data values, which means
that we have very few different, distinct values for <code>CREATED</code>
and each value repeats many times. This has an impact on calculations
the Optimizer makes and the costs it estimates. Also the real data type
in the real system was <code>TIMESTAMP</code> and not <code>DATE</code>. So I decided to add an extra column <code>TS</code>
to the test table, and to update this later on to be a near-unique
value. By giving the column an initial value it means that space is
allocated within each record in each block in Oracle, and the later
update should not cause any more space to be allocated or rows to be
moved between blocks.<br /><br />My test system has only 2 GB for UNDO,
which means that I cannot do an UPDATE on a table larger than this. So
what I have done is double the table 7 times to just over 1 GB in size,
do the UPDATE on the TS to a near-unique random value, then double the
table another 2 times. This means that each <code>TS</code> value occurs 4 times, which I hope will still be good enough for the tests I am doing.<br /><br />In
terms of the spread of values, in the real system the incoming data is
purged every day, deleting data records older than 48 hours. The UPDATE
is using the current system timestamp value, offset by up to 48 hours
as a value in seconds.<br />
<pre class="source-code"><code>create table outgoing
tablespace testdata
as select * from dba_objects ;
--
alter table outgoing modify (owner not null, object_name not null,
object_id not null, object_type not null, created not null) ;
--
-- Try and make the data loading go faster by minimising redo logging
alter table outgoing nologging ;
--
alter table outgoing add (ts timestamp default systimestamp not null) ;
--
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
--
update outgoing
set ts = systimestamp - numtodsinterval (dbms_random.value (0, 60*60*24*2), 'second') ;
commit ;
--
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
--
select count (*) cnt , count (distinct ts) ts_cnt from outgoing ;
--
select bytes / (1024 * 1024) mb
from user_segments
where segment_name = 'OUTGOING' ;
</code></pre>
This produced 46,384,640 rows occupying 6.3 GB of disk storage. I
also checked the minimum and maximum values of the timestamp column,
and they did cover the past 48 hours (29-NOV-16 14.56.36.872969 to 01-DEC-16 14.56.36.863852).<br />
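That range check is just a simple aggregate over the timestamp column:<br />
<pre class="source-code"><code>select min (ts) min_ts, max (ts) max_ts from outgoing ;
</code></pre>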
<br />
<h2>
Test Queries & Execution Plans</h2>
In
the real system there are less than 10 categories, and the processing
involves picking some of the ones that occur the most frequently. I'll
use the <code>OBJECT_TYPE</code> for the category, and pick the top 3 most frequently occurring values for the test queries:<br />
<pre class="source-code"><code>select object_type, count (*)
from outgoing
group by object_type order by 2, 1 ;
OBJECT_TYPE COUNT(*)
----------------------- ----------
...
TABLE 1217536
TYPE 1296384
INDEX 2208256
VIEW 3252224
JAVA CLASS 15777792
SYNONYM 18917376
</code></pre>
So our test categories will be the bottom 3.<br /><br />Let's create an index on the category and timestamp columns, again just like the real system:<br />
<pre class="source-code"><code>create index ix_outgoing on outgoing (object_type, ts)
tablespace testdata ;
</code></pre>
And let's make sure our statistics on this table are up to date:<br />
<pre class="source-code"><code>exec dbms_stats.gather_table_stats ('JOHN', 'OUTGOING')
</code></pre>
When I ran these test queries and then displayed the execution
plans, the costs shown seemed wrong for the second query of multiple
categories. The reported cost was too low given the much higher elapsed
time and what I knew it was doing during the execution. To show the
real elapsed time I have done "<code>set timing on</code>" in SQL*Plus, and I have run the queries with the "<code>gather_plan_statistics</code>"
hint, and used extra options to get the run time execution statistics
for the query execution. This will show more about what really happened
during the execution.<br /><br />And to try and make things more even and
comparable between the queries I also flushed the buffer cache and
shared pool between the execution of each query (done from another
SQL*Plus session connected as SYSDBA).<br /><br />A single maximum for one query has the following execution plan:<br />
<pre class="source-code"><code>SQL> select /*+ gather_plan_statistics */ max (ts) from outgoing where object_type = 'SYNONYM' ;
MAX(TS)
---------------------------------------------------------------------------
01-DEC-16 14.56.36.841635
Elapsed: 00:00:00.14
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'TYPICAL -BYTES IOSTATS LAST'));
SQL_ID 5mxhp3mh3gzmj, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ max (ts) from outgoing
where object_type = 'SYNONYM'
Plan hash value: 856418741
---------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | Reads |
---------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 (100)| | 1 |00:00:00.01 | 4 | 4 |
| 1 | SORT AGGREGATE | | 1 | 1 | | | 1 |00:00:00.01 | 4 | 4 |
| 2 | FIRST ROW | | 1 | 1 | 4 (0)| 00:00:01 | 1 |00:00:00.01 | 4 | 4 |
|* 3 | INDEX RANGE SCAN (MIN/MAX)| IX_OUTGOING | 1 | 1 | 4 (0)| 00:00:01 | 1 |00:00:00.01 | 4 | 4 |
---------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("OBJECT_TYPE"='SYNONYM')
</code></pre>
<br />The multi-category query has the following execution plan:<br />
<pre class="source-code"><code>SQL> select /*+ gather_plan_statistics */ max (ts) from outgoing
where object_type in ( 'SYNONYM' , 'JAVA CLASS' , 'VIEW' ) ;
MAX(TS)
---------------------------------------------------------------------------
01-DEC-16 14.56.36.841635
Elapsed: 00:00:36.98
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'TYPICAL -BYTES IOSTATS LAST'));
SQL_ID anb3s44frw6xa, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ max (ts) from outgoing
where object_type in ( 'SYNONYM' , 'JAVA CLASS' , 'VIEW' )
Plan hash value: 2754621723
-----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | Reads |
-----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 (100)| | 1 |00:00:36.87 | 169K| 169K|
| 1 | SORT AGGREGATE | | 1 | 1 | | | 1 |00:00:36.87 | 169K| 169K|
| 2 | INLIST ITERATOR | | 1 | | | | 37M|00:02:23.89 | 169K| 169K|
|* 3 | INDEX RANGE SCAN| IX_OUTGOING | 3 | 1 | 4 (0)| 00:00:01 | 37M|00:01:02.14 | 169K| 169K|
-----------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access(("OBJECT_TYPE"='JAVA CLASS' OR "OBJECT_TYPE"='SYNONYM' OR "OBJECT_TYPE"='VIEW'))
</code></pre>
<br />The rephrased query using 3 separate per-category queries is:<br />
<pre class="source-code"><code>SQL> select /*+ gather_plan_statistics */ max (maxts) from (
select max (ts) maxts from outgoing where object_type = 'SYNONYM'
union all
select max (ts) maxts from outgoing where object_type = 'JAVA CLASS'
union all
select max (ts) maxts from outgoing where object_type = 'VIEW'
) ;
MAX(MAXTS)
---------------------------------------------------------------------------
01-DEC-16 14.56.36.841635
Elapsed: 00:00:00.15
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'TYPICAL -BYTES IOSTATS LAST'));
SQL_ID 1yv5abfvs46y7, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ max (maxts) from (
select max (ts) maxts from outgoing where object_type = 'SYNONYM'
union all select max (ts) maxts from outgoing where object_type = 'JAVA CLASS'
union all select max (ts) maxts from outgoing where object_type = 'VIEW' )
Plan hash value: 2647832233
------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost(%CPU)| E-Time | A-Rows | A-Time | Buffers | Reads |
------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 12(100)| | 1 |00:00:00.04 | 12 | 9 |
| 1 | SORT AGGREGATE | | 1 | 1 | | | 1 |00:00:00.04 | 12 | 9 |
| 2 | VIEW | | 1 | 3 | 12 (0)| 00:00:01 | 3 |00:00:00.04 | 12 | 9 |
| 3 | UNION-ALL | | 1 | | | | 3 |00:00:00.04 | 12 | 9 |
| 4 | SORT AGGREGATE | | 1 | 1 | | | 1 |00:00:00.01 | 4 | 4 |
| 5 | FIRST ROW | | 1 | 1 | 4 (0)| 00:00:01 | 1 |00:00:00.01 | 4 | 4 |
|* 6 | INDEX RANGE SCAN (MIN/MAX)| IX_OUTGOING | 1 | 1 | 4 (0)| 00:00:01 | 1 |00:00:00.01 | 4 | 4 |
| 7 | SORT AGGREGATE | | 1 | 1 | | | 1 |00:00:00.01 | 4 | 3 |
| 8 | FIRST ROW | | 1 | 1 | 4 (0)| 00:00:01 | 1 |00:00:00.01 | 4 | 3 |
|* 9 | INDEX RANGE SCAN (MIN/MAX)| IX_OUTGOING | 1 | 1 | 4 (0)| 00:00:01 | 1 |00:00:00.01 | 4 | 3 |
| 10 | SORT AGGREGATE | | 1 | 1 | | | 1 |00:00:00.02 | 4 | 2 |
| 11 | FIRST ROW | | 1 | 1 | 4 (0)| 00:00:01 | 1 |00:00:00.02 | 4 | 2 |
|* 12 | INDEX RANGE SCAN (MIN/MAX)| IX_OUTGOING | 1 | 1 | 4 (0)| 00:00:01 | 1 |00:00:00.02 | 4 | 2 |
------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - access("OBJECT_TYPE"='SYNONYM')
9 - access("OBJECT_TYPE"='JAVA CLASS')
12 - access("OBJECT_TYPE"='VIEW')
</code></pre>
<br />
<h2>
Conclusions</h2>
The single category query executes in under 1 second (0.14 seconds on my test system), while the multiple category query using "<code>IN</code>" takes over 30 seconds (36.98 seconds), which is much slower (over 100 times slower for elapsed time).<br /><br />We can see that the single category query uses an optimized index access to get just the maximum timestamp value - an "<code>INDEX RANGE SCAN (MIN/MAX)</code>" combined with a "<code>FIRST ROW</code>" operation in the execution plan. However, the multiple category query using "<code>IN</code>" does not do this and instead does an "<code>INDEX RANGE SCAN</code>" combined with an "<code>INLIST ITERATOR</code>", which takes far longer to execute (as seen from the A-Time value in the reported execution plan and the 37M for A-Rows).<br /><br />The solution is to rewrite the query as a "<code>UNION ALL</code>" of 3 separate single category queries. We can safely use "<code>UNION ALL</code>"
because the categories are distinct from each other. This query
executes in under a second which is almost as fast as for just one
category, and uses the optimized "<code>INDEX RANGE SCAN (MIN/MAX)</code>" access method combined with the "<code>FIRST ROW</code>" operation.<br />
<br />
Sometimes it can be quicker to run separate <code>SELECT</code> queries and merge the results together than to run one single query that does it all at once. Sometimes it can even be 100 times faster to run separate queries than one single merged query.John Bradyhttp://www.blogger.com/profile/08719865814813032071noreply@blogger.com0tag:blogger.com,1999:blog-7302956329008495023.post-86907223391730469602016-10-20T11:57:00.000+01:002016-11-25T14:51:10.980+00:00AWR Summary Data ExtractsA long time ago (in a galaxy far away) I wrote a series of blog posts
about directly extracting useful sets of data from the AWR snapshots in
the <code>DBA_HIST</code> tables for subsequent analysis and graphing
using things like Excel. This post is to summarise where I got to on
this, and provide links back to the main posts for future reference.<br /><br />A
guiding principle of my approach was that I wanted to avoid the "mother
of all AWR queries" that would extract every possible data value from
all <code>DBA_HIST</code> tables for all snapshots, as it would be a
really big and complex query (meaning difficult to debug if a mistake
was introduced anywhere) and it could use up a lot of system resources
and take some time to finish each time it was executed. Instead I
prefer the smaller, targeted type of query that extracts just what you
need from the minimum number of AWR tables, because it is simpler to
understand and will be quicker to execute. That said, you could extend
my approach of one sub-query per <code>DBA_HIST</code> table and include
all of the main AWR snapshot tables into one single query, and then
create a view using it. This would allow you to write simpler analysis
queries later on against the view, without needing to know the
details of all the various sub-queries and <code>DBA_HIST</code> tables.<br /><br />Along the way I covered the following main topics in the blog posts:-<br />
<ul>
<li><a class="externalLink" href="http://databaseperformance.blogspot.co.uk/2013/08/summary-reporting-on-awr-data-1.html" target="_blank" title="External link to http://databaseperformance.blogspot.co.uk/2013/08/summary-reporting-on-awr-data-1.html">AWR Snapshots themselves and DB Time Model data per snapshot</a><ul>
<li>Introduction and how to get the change in values between snapshots</li>
</ul>
</li>
<li><a class="externalLink" href="http://databaseperformance.blogspot.co.uk/2013/09/awr-summary-reporting-3-waits.html" target="_blank" title="External link to http://databaseperformance.blogspot.co.uk/2013/09/awr-summary-reporting-3-waits.html">Wait Times broken down by class</a> - I/O, Network, Cluster, etc.</li>
<li><a class="externalLink" href="http://databaseperformance.blogspot.co.uk/2013/10/awr-summary-reporting-4-more-waits.html" target="_blank" title="External link to http://databaseperformance.blogspot.co.uk/2013/10/awr-summary-reporting-4-more-waits.html">Individual wait times by wait event name</a>, allowing specific waits to be reported<ul>
<li>e.g. "db file sequential read", "log file sync"</li>
</ul>
</li>
<li><a class="externalLink" href="http://databaseperformance.blogspot.co.uk/2013/10/awr-summary-reporting-5-system.html" target="_blank" title="External link to http://databaseperformance.blogspot.co.uk/2013/10/awr-summary-reporting-5-system.html">System statistics (across all sessions)</a><ul>
<li>e.g. "execute count", "user calls", "physical reads", "physical writes"</li>
</ul>
</li>
</ul>
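As an illustration of the view idea mentioned above, the per-table sub-queries can be combined and wrapped in a view; here is a minimal sketch with just the snapshot times and DB time (the view name is my own, and further sub-queries would be joined in the same way):<br />
<pre class="source-code"><code>-- Sketch only: a view combining per-snapshot sub-queries,
-- so later analysis queries need not repeat the DBA_HIST joins
create or replace view awr_snap_summary as
with snaps as
(select snap_id, dbid, min (end_interval_time) end_snap_time
from dba_hist_snapshot
group by snap_id, dbid
)
, dbtimes as
(select systime.snap_id, systime.dbid
, sum (systime.value - psystime.value) db_time
from dba_hist_sys_time_model systime, dba_hist_sys_time_model psystime
where systime.snap_id = psystime.snap_id + 1
and systime.dbid = psystime.dbid
and systime.instance_number = psystime.instance_number
and systime.stat_id = psystime.stat_id
and systime.stat_name = 'DB time'
group by systime.snap_id, systime.dbid
)
select snaps.snap_id, snaps.end_snap_time, dbtimes.db_time
from snaps
join dbtimes on snaps.snap_id = dbtimes.snap_id and snaps.dbid = dbtimes.dbid ;
</code></pre>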
Although I did not post a query that merged all the different component sub-queries together, I did post a minimal <a class="externalLink" href="http://databaseperformance.blogspot.co.uk/2013/11/basic-performance-analysis-using-awr.html" target="_blank" title="External link to http://databaseperformance.blogspot.co.uk/2013/11/basic-performance-analysis-using-awr.html">Summary AWR Query</a>
at the end to show how you could use the basic AWR data in practice to
look at the overall performance and behaviour of a system. By extending
the main <code>SELECT</code> of the final query itself, you can add
extra columns of data you want into the output, without any changes to
the various component sub-queries (the "<code>with</code>" or Common
Table Expressions). To put it another way, if there is some extra AWR
data you want extracted then you need to work out which of the main data
sets it falls under, and then which columns or rows in that AWR data
set the data you want is stored in.<br /><br />For instance, here is another
variation of the AWR Summary Query that also extracts the number of SQL
statements executed, the separate wait times for User I/O and Cluster
waits (normalised to a percentage of the total wait time in the
snapshot), physical reads (normalised to a per second value), average
disk read and redo write times.<br /><pre class="source-code"><code>/**
* Name : awr_sum_lite
* Purpose : Minimal one line summary of each AWR snapshot from a day
* Descrip'n : Based on awr_sum_range with unneeded stuff deleted
* Only key essential stuff is output:
* SQL executions / sec, #User calls / sec
* Average Active Sessions, Wait % of Busy time
* Cluster Wait % of wait time, User I/O % of wait time
* Avg disk read time, Avg redo write time, #Disk reads / sec
* Values directly calculated, rather than raw underlying values
* Usage : @awr_sum_lite
* Assumption: Want the last 4 whole days
* Which is from midnight 4 days ago to last midnight
* To Do :
*/
--
set feedback off
set heading off
set newpage none
set verify off
--
set linesize 1000
set pages 28
set trimout on
set trimspool on
--
select ' ' from dual ;
select ' AWR Lite Summary Report' from dual ;
select ' =======================' from dual ;
-- select ' ' from dual ;
-- select '&lt;snap&gt;'
-- || '<= Database Time =>'
-- || '<= System Statistics ==>'
-- || '<== Waits ================>'
-- from dual ;
set newpage 1
set heading on
--
col snap_time format a15 heading 'SNAP TIME'
col aas format 90.0 heading 'AAS'
col wait_pct format 990.0 heading 'WAIT%'
col cluster_pct format 90.0 heading 'CLUS%'
col user_io_pct format 90.0 heading 'UIO%'
col executions_sec format 9,990 heading 'EXEC/S'
col user_calls_sec format 9,990 heading 'UCALL/S'
col physical_reads_sec format 9,990 heading 'READS/S'
col avg_disk_read format 990.0 heading '(MS)|AV RD'
col avg_redo_write format 990.0 heading '(MS)|REDO W'
--
--
with
snaps as
(select snap_id
, dbid
, end_snap_time
, snap_interval
, extract (second from snap_interval)
+ (extract (minute from snap_interval)
+ (extract (hour from snap_interval)
+ (extract (day from snap_interval) * 24)
) * 60
) * 60 snap_duration
from (select csnaps.snap_id
, csnaps.dbid
, min (csnaps.end_interval_time) end_snap_time
, min (csnaps.end_interval_time) - min (csnaps.begin_interval_time) snap_interval
from dba_hist_snapshot csnaps
group by csnaps.snap_id, csnaps.dbid
)
) -- snaps
, systimes as
-- One row per Database Time Model with change in value between snapshots
(select systime.snap_id
, systime.dbid
, systime.stat_name
, sum (systime.value - psystime.value) value
from dba_hist_sys_time_model systime, dba_hist_sys_time_model psystime
where systime.snap_id = psystime.snap_id + 1
and systime.dbid = psystime.dbid
and systime.instance_number = psystime.instance_number
and systime.stat_id = psystime.stat_id
-- Assume if stat_id the same so is the stat_name
group by systime.snap_id, systime.dbid, systime.stat_name
) -- systimes
, sysstats as
-- One row per System Statistic with change in value between snapshots
(select sysstat.snap_id
, sysstat.dbid
, sysstat.stat_name
, sum (sysstat.value - psysstat.value) value
from dba_hist_sysstat sysstat, dba_hist_sysstat psysstat
where sysstat.snap_id = psysstat.snap_id + 1
and sysstat.dbid = psysstat.dbid
and sysstat.instance_number = psysstat.instance_number
and sysstat.stat_id = psysstat.stat_id
-- Assume if stat_id the same so is the stat_name
group by sysstat.snap_id, sysstat.dbid, sysstat.stat_name
) -- sysstats
, syswaits as
-- One row for total wait time, plus break down into major wait classes, and events
(select syswaitevents.snap_id
, syswaitevents.dbid
, sum (syswaitevents.time_waited_micro) all_wait_time
, sum (decode (syswaitevents.wait_class, 'Commit', syswaitevents.time_waited_micro, 0)) commit_time
, sum (decode (syswaitevents.wait_class, 'Cluster', syswaitevents.time_waited_micro, 0)) cluster_time
, sum (decode (syswaitevents.wait_class, 'Concurrency', syswaitevents.time_waited_micro, 0)) concurrency_time
, sum (decode (syswaitevents.wait_class, 'Network', syswaitevents.time_waited_micro, 0)) network_time
, sum (decode (syswaitevents.wait_class, 'System I/O', syswaitevents.time_waited_micro, 0)) system_io_time
, sum (decode (syswaitevents.wait_class, 'User I/O', syswaitevents.time_waited_micro, 0)) user_io_time
from
(select sysevent.snap_id
, sysevent.dbid
, sysevent.wait_class
, sysevent.event_name
, sum (sysevent.time_waited_micro - psysevent.time_waited_micro) time_waited_micro
, sum (sysevent.total_waits - psysevent.total_waits) wait_count
from dba_hist_system_event sysevent, dba_hist_system_event psysevent
where sysevent.snap_id = psysevent.snap_id + 1
and sysevent.dbid = psysevent.dbid
and sysevent.instance_number = psysevent.instance_number
and sysevent.event_id = psysevent.event_id
and sysevent.wait_class != 'Idle' -- Ignore Idle wait events
group by sysevent.snap_id
, sysevent.dbid
, sysevent.wait_class
, sysevent.event_name
) syswaitevents
group by syswaitevents.snap_id
, syswaitevents.dbid
) -- syswaits
-- Average Active Sessions, Wait % of Busy time
-- Cluster Wait % of wait time, User I/O % of wait time
-- SQL executions / sec, #User calls / sec
-- Avg disk read time, Avg redo write time, #Disk reads / sec
select to_char (snaps.end_snap_time, 'DD/MM/YY HH24:MI') snap_time
, (user_calls_st.value / snaps.snap_duration) user_calls_sec
, (execs.value / snaps.snap_duration) executions_sec
, (dbtime.value / 1000000) / snaps.snap_duration aas
-- If database active time is 1% of duration time or less ignore wait (0)
, case when (dbtime.value / (1000 * snaps.snap_duration) ) > 1
then (100 * syswaits.all_wait_time / dbtime.value)
else 0.0
end wait_pct
, (100 * syswaits.cluster_time / syswaits.all_wait_time) cluster_pct
, (100 * syswaits.user_io_time / syswaits.all_wait_time) user_io_pct
, (phys_reads.value / snaps.snap_duration) physical_reads_sec
, (syswaits.user_io_time / phys_reads.value) / 1000 avg_disk_read
, (redo_time_st.value * 10 / redo_write_st.value) avg_redo_write
from snaps
join (select * from systimes where stat_name = 'DB time') dbtime
on snaps.snap_id = dbtime.snap_id and snaps.dbid = dbtime.dbid
join syswaits
on snaps.snap_id = syswaits.snap_id and snaps.dbid = syswaits.dbid
join (select * from sysstats where stat_name = 'execute count') execs
on snaps.snap_id = execs.snap_id and snaps.dbid = execs.dbid
join (select * from sysstats where stat_name = 'user calls') user_calls_st
on snaps.snap_id = user_calls_st.snap_id and snaps.dbid = user_calls_st.dbid
join (select * from sysstats where stat_name = 'redo writes') redo_write_st
on snaps.snap_id = redo_write_st.snap_id and snaps.dbid = redo_write_st.dbid
join (select * from sysstats where stat_name = 'redo write time') redo_time_st
on snaps.snap_id = redo_time_st.snap_id and snaps.dbid = redo_time_st.dbid
join (select * from sysstats where stat_name = 'physical reads') phys_reads
on snaps.snap_id = phys_reads.snap_id and snaps.dbid = phys_reads.dbid
where snaps.end_snap_time between
(trunc (sysdate) - 4) and (trunc (sysdate))
order by snaps.end_snap_time
/
--
set feedback on
set lines 80
</code></pre>
This query provides what I believe are a good, minimal set of
useful measurements of activity and relative performance of an Oracle
database. It allows easy viewing of relative activity (Average Active
Sessions, User Calls / sec), waiting time as a percentage of database
time, the main types of wait occurring (User I/O and Cluster), and some
key disk I/O measurements (physical reads / sec, average disk read time,
average redo write time). From these you can see how active the system
is, whether any significant waits are occurring, and if they are I/O
related, and see how these change over time between AWR snapshots.<br /><br />Of
course everyone has a different opinion of what the most important
measurements are on an Oracle database. My key message here is that the
approach I have followed lets you customise these queries to report out
the performance and activity measurements you want from the data
captured in the AWR snapshots. So if you want a different set of data
values reported out, then it should be relatively easy to modify these
queries to get what you want.John Bradyhttp://www.blogger.com/profile/08719865814813032071noreply@blogger.com0tag:blogger.com,1999:blog-7302956329008495023.post-67994205252372417862016-03-30T11:55:00.000+01:002016-03-30T11:55:26.707+01:00Full Table Scan not always as low as 0.5% of data!Based on a <a class="externalLink" href="https://community.oracle.com/message/13756179#13756179" target="_blank" title="External link to https://community.oracle.com/message/13756179#13756179">reply from Jonathan Lewis</a> to an <a class="externalLink" href="https://community.oracle.com/message/13755118" target="_blank" title="External link to https://community.oracle.com/message/13755118">OTN post on Explain Plans</a> I need to correct some of the claims made in my <a class="externalLink" href="http://databaseperformance.blogspot.co.uk/2016/03/full-table-scan-friend-or-foe.html" target="_blank" title="External link to http://databaseperformance.blogspot.co.uk/2016/03/full-table-scan-friend-or-foe.html">previous post on Full Table Scans</a>.<br /><br />I'm
going to repeat the inaccurate paragraphs and the Conclusions I posted
then [edited here for brevity], point out the mistakes in them, and then
attempt a corrected set of paragraphs. I will also edit the original
post and replace these incorrect paragraphs with the corrected
ones.<br /><br />I'll highlight the parts that are not strictly correct.<br />
<blockquote>
<em>The
main determinant for whether a FTS is the best access method is the
fraction or percentage of the rows in the table being retrieved by the
query</em>, which are of course determined by the filter conditions in the query. <em>If
this fraction / percentage of data from the table is high enough then a
Full Table Scan will always be a lower cost than using an Index Scan</em> (except maybe for an Index Fast Full Scan, which is really another variation on a Full Scan).<br /><br />So when you see a Full Table Scan in an execution plan you should check the query itself for <em>how many rows it needs from the total number of rows in the table i.e. what percentage or fraction. If this is high enough</em>
then a FTS is indeed the "best" access method to get that data because
it has the lowest cost. In which case you should really be asking why
the query needs such a high percentage of the data from one table. Don't
assume that the Optimizer is wrong - normally it isn't. It is just as
likely to be your query that is affecting the execution plan chosen.<br /><br /><h2>
Conclusions</h2>
Although
a Full Table Scan can seem a "brute force" approach to finding some
matching records in a table, it can sometimes be the better way of doing
it though. <em>It all depends on how many rows you want back from the table as a percentage of the rows in the table.</em><br /><br /><em>If
your query is retrieving more than about 0.20% to 0.35% of the data in a
table then a Full Table Scan may well be the cheapest and best access
method.</em> [...]. Trying to force the Optimizer into using another
access method in this circumstance is a waste of time, because all other
access methods will be more expensive. The only exception might be
another Full Scan type access, such as an Index Full Scan. But even then
the gains (reduction in cost) will only be marginal i.e. not a full
order of magnitude less.<br /><br />When you see a FTS in a query execution plan <em>you should check the estimated row count and what this is as a percentage of the row count in the table</em>,
and confirm if this estimate is correct or not. If the estimate is
correct then a FTS is the lowest cost access method and the Optimizer is
right to choose it. You should also check if your query is correct, or
if there is something wrong with the filter conditions in it.</blockquote>
The
main error is that it is wrong for me to claim any kind of actual
figures for the percentage when the cutover will occur. As Jonathan
points out in his reply on OTN it is possible for an index access to
have a lower cost than a FTS for 24.5% of the data in a table, which is
far above the 0.35% figure I mention. The actual cutover percentage is
dependent on so many variables that it is incorrect for me to try and
state a specific range for the cutover.<br />
<br />While I was not
necessarily wrong in my specific examples because I made clear what my assumptions were, I was
also not right in all possible scenarios. So I'm withdrawing my
original Conclusions and updating them to be more strictly correct.<br /><blockquote>
[paragraph deleted]<br /><br />So when you see a Full Table Scan in an execution plan you should check the query <em>and all other relevant factors to see if</em>
a FTS is indeed the "best" access method to get that data because it
has the lowest cost. [deleted sentence] Don't assume that the
Optimizer is wrong - normally it isn't. It is just as likely to be your
query that is affecting the execution plan chosen.<br /><br /><h2>
Conclusions</h2>
Although
a Full Table Scan can seem a "brute force" approach to finding some
matching records in a table, it can sometimes be the better way of doing
it though. <em>It depends on several factors including how many rows
you want back from the table, the number of blocks for the table, and
the Clustering Factor of any indexes.</em><br /><br /><em>It is possible
that for even low percentages of data being retrieved from a table that a
Full Table Scan can be the cheapest and best access method.</em> [...].
Trying to force the Optimizer into using another access method in this
circumstance is a waste of time, because all other access methods will
be more expensive. The only exception might be another Full Scan type
access, such as an Index Full Scan. But even then the gains (reduction
in cost) will only be marginal i.e. not a full order of magnitude less.<br /><br />When you see a FTS in a query execution plan <em>you
should check many things including, but not limited to, the number of
rows in the table, the number of blocks used for the table, the
Clustering Factor for each possible index, and the estimated row count
for the filters being used</em>.<br /><br />If the estimate is correct then a
FTS is the lowest cost access method and the Optimizer is right to
choose it. You should also check if your query is correct, or if there
is something wrong with the filter conditions in it.</blockquote>
<br />
<h2>
Counter Example</h2>
It
is relatively simple to show a similar query to the ones I was using
that uses an index instead of a full table scan for a far higher
percentage of data in the table than 0.5%. Jonathan posted one such
counter example in one of his replies on OTN, and we can easily
replicate this using the same test data set from my original post.<br />
<pre class="source-code"><code>
select sum (one) from scantest where pkid between 1000000 and 2000000 ;
SUM(ONE)
----------
1000001
Statistics
----------------------------------------------------------
10468 consistent gets
10311 physical reads
2 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
1 rows processed
select sum (one) from scantest where pkid between 1000000 and 2000000
Plan hash value: 40453105
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 10282 (100)| |
| 1 | SORT AGGREGATE | | 1 | 9 | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| SCANTEST | 1000K| 8789K| 10282 (1)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | IX_SCAN_PKID | 1000K| | 2223 (1)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("PKID">=1000000 AND "PKID"<=2000000)
</code></pre>
<strong>Note:</strong><br /><ul>
<li>This query is retrieving 10% of the data in the table</li>
<li>An index range scan is being used to retrieve the data</li>
<li>A Full Table Scan had a cost of just over 22,000, so this index scan is much cheaper at 10,282</li>
<li>The key difference is the much lower Clustering Factor of the index on the <code>pkid</code> column</li>
</ul>
This index range scan is still cheaper when retrieving 20% of the data in the table:<br />
<pre class="source-code"><code>
select sum (one) from scantest where pkid between 1000000 and 3000000 ;
SUM(ONE)
----------
2000001
Statistics
----------------------------------------------------------
20762 consistent gets
20697 physical reads
2 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
1 rows processed
select sum (one) from scantest where pkid between 1000000 and 3000000
Plan hash value: 40453105
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 20562 (100)| |
| 1 | SORT AGGREGATE | | 1 | 9 | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| SCANTEST | 2000K| 17M| 20562 (1)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | IX_SCAN_PKID | 2000K| | 4442 (1)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("PKID">=1000000 AND "PKID"<=3000000)
</code></pre>
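Both of these plan costs line up well with the standard index cost formula once the low Clustering Factor of the pkid index is taken into account. Here is a rough Python check, using levels = 3, leaf blocks = 22,132 and Clustering Factor = 80,528 from the statistics of this test data set; the small shortfall against the reported plan costs of 10,282 and 20,562 is the CPU cost component, which this sketch ignores:

```python
# Index range scan cost = levels + (leaf blocks * filter factor)
#                                + (clustering factor * filter factor)
# Statistics for the unique index IX_SCAN_PKID on the scantest table.
levels, leaf_blocks, clustering_factor = 3, 22_132, 80_528

def index_cost(filter_factor):
    return levels + (leaf_blocks + clustering_factor) * filter_factor

print(round(index_cost(0.1)))   # 10% of rows: about 10,269 vs plan cost 10,282
print(round(index_cost(0.2)))   # 20% of rows: about 20,535 vs plan cost 20,562
```

Because the table rows are stored in the same order as the index (a Clustering Factor close to the number of blocks in the table rather than the number of rows), the data access cost grows far more slowly than for the pct columns, which is why this index stays cheaper than the roughly 22,000 cost of a Full Table Scan even at 20% of the rows.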
<br />
<h2>
Conclusion?</h2>
Be careful of generalising and summarising
certain observed behaviours too much with Oracle. When considering
index usage there is much more to it than just the number of rows being
retrieved as a percentage of those in the table.John Bradyhttp://www.blogger.com/profile/08719865814813032071noreply@blogger.com0tag:blogger.com,1999:blog-7302956329008495023.post-17975147019584826012016-03-21T10:21:00.000+00:002016-11-25T15:01:04.504+00:00Creeping Slow PerformanceA <a class="externalLink" href="https://community.oracle.com/message/13730154" target="_blank" title="External link to https://community.oracle.com/message/13730154">recent OTN post</a>
requesting help for a slow performing update where Oracle was doing a
full table scan also revealed that the update was getting slower and
slower each month, and that new data was loaded into the table each
month to be "processed" i.e. updated. Unfortunately this kind of design
suffers from a natural creeping slowdown in the elapsed time of the
update process. Each month it will take longer and longer to complete
the processing of the newly loaded data, until any elapsed time targets
are exceeded and it continues to take longer and longer to complete.
There is a bad design pattern here, but obviously the original designers
and implementers did not spot it.<br /><br />Consider the following - in
the first month after go-live the table only contains newly loaded data,
so a full table scan is used. Let's say it takes 2 minutes to
complete, and our target is 30 minutes for the whole process to
complete, which includes other steps as well. Everything looks good
performance wise, so we leave everything as it is and carry on.<br /><br />The
next month - month 2 - the size of the table has doubled, and Oracle
still uses a full table scan because you want to update half the data in
the table. This now takes 4 minutes because the table has doubled in
size. But the overall process still completes within 30 minutes, so
everything still looks good.<br /><br />Over the next few months the table
grows each month, and the elapsed time of the update continues to
increase by 2 minutes each month, assuming a similar volume of data is
loaded in each month. After 5 months the update now takes 10 minutes,
but this is still within our 30 minute target.<br /><br />But after 10
months the elapsed time is now 20 minutes, which is a significant
proportion of the 30 minutes, and people are starting to ask questions.
And after 15 months, if not sooner, the 30 minute target has been
missed and the overall data load and process is now taking too long.
Not only that, it is continuing to take longer and longer to complete
each month, and will only get worse.<br /><br />What can be done? Could an index help? No, is the simple answer. As I recently blogged in <a class="externalLink" href="http://databaseperformance.blogspot.co.uk/2016/03/full-table-scan-friend-or-foe.html" target="_blank" title="External link to http://databaseperformance.blogspot.co.uk/2016/03/full-table-scan-friend-or-foe.html">Full Table Scan - Friend or Foe?</a>,
the cutover point for the Optimizer to choose to use an index is when
you want to retrieve less than 0.5% or so of the data in a table. After
10 months you are still processing 10% of the data in the table each
month. And worse, even after 100 months (8+ years) you are still
processing 1% of the data in the table each month, and a full table scan
is <strong><em>still a lower execution cost</em></strong> than using any index on the table.<br /><br />What
could you do about this? One obvious solution is to use a "staging
table" of the same structure into which the data is initially loaded
each month. Then do the update and other processing on this table, and
finally insert all the data into the main table. This solution does not
suffer from the constant degradation in performance that the current
one does. Oracle will still use a full table scan, but it is only
reading from disk the data for this month, and so should complete in a
relatively constant time each month - assuming data volumes are similar
each month. The insert to the main table could be done quite quickly
using the "<code>append</code>" hint for a direct load, and then the
staging table could be truncated which should be very quick. Although
the data is copied around in Oracle twice, it should still be much
quicker than scanning a table with hundreds of months of data in it.<br /><br />Other
possibilities include partitioning by a key unique to each month's
data, which is also used in the update statement so that partition
pruning occurs. This may be problematic if there is not a clean
partition key to use. It should not be a "processed flag", as this
would be updated by the processing itself which would result in row
movement between partitions and extra work by Oracle. <br /><br />Ultimately
the issue is about spotting these kind of bad design patterns early on
and avoiding these pending future performance problems. To get to one
month's data being less than 0.5% of the total data volume in the table
would take you over 200 months, or almost 17 years. And during those 17
years this monthly data load processing would always take longer and
longer to complete. A different design can avoid this future performance
problem completely.John Bradyhttp://www.blogger.com/profile/08719865814813032071noreply@blogger.com0tag:blogger.com,1999:blog-7302956329008495023.post-59513675755710148852016-03-01T11:42:00.003+00:002016-03-30T12:07:42.341+01:00Full Table Scan - Friend or Foe?[Or <b>Don't be afraid of Full Table Scans]</b><br />
<br />
<b>UPDATE 30 March 2016</b><br />
I've come to realise that there are some mistakes and inaccuracies in this post [<a href="http://databaseperformance.blogspot.co.uk/2016/03/full-table-scan-not-always-as-low-as-05.html">see this post for more details</a>], so I've edited this post and corrected what I can. Rather than leave behind incorrect claims, I've replaced them by the corrected text, so hopefully now this post no longer has its previous faults.<br />
<br />
<b>ORIGINAL POST:</b><br />
Many people consider a Full Table Scan (FTS) in a query execution plan to be a bad thing - reading <b><i>every</i></b> record from a table to find only those records the query needs. This is indicated by a "<code>TABLE ACCESS FULL</code>"
in an execution plan. But is it really all that bad? Is it actually
sometimes the right tool for the job? Can a Full Table Scan sometimes
have a lower cost than using an Index?<br />
<br />
My view is that the Oracle Optimizer will only choose a FTS under the following two conditions:<br />
<ul>
<li>There is no other possible access method to get the needed data from the table<ul>
<li>No other access method is currently available, but another access method might have a lower cost if it were available</li>
</ul>
</li>
<li>A FTS is a lower cost (and so should be faster) than all other possible access methods<ul>
<li>The FTS is the "best" access method of all available and the Optimizer is correct to use it</li>
</ul>
</li>
</ul>
In other words, when you see a "<code>TABLE ACCESS FULL</code>"
in an execution plan you should not jump to the conclusion that this is
wrong and "needs to be fixed", because maybe it is correct and it is
indeed the best access method for that particular step of the query
operation. If you have concerns about the FTS then you need to double
check things to find out whether it is the right thing for the Optimizer
to be choosing, or whether a better access method would reduce the
execution cost. Jumping to the wrong conclusion can lead you down a
dead end when trying to improve a query's performance.<br />
<br />
So when you see a Full Table Scan in
an execution plan you should check the query and all other relevant factors to see if a FTS is indeed the "best"
access method to get that data because it has the lowest cost. Don't assume that the Optimizer
is wrong - normally it isn't. It is just as likely to be your query
that is affecting the execution plan chosen.<br />
<br />
<h2>
Calculating the Cutover Point</h2>
The
actual percentage or fraction of rows in the table that is the cutover
point between the Optimizer using a Full Table Scan or an available
Index is itself mainly dependent on the average number of rows stored
per data block. The other main factor is the value being used by the
Optimizer for Multi-Block Read Count (MBRC), which is part of the System
Statistics stored in the database. It is possible to calculate this
cutover point yourself on a given table to see when a FTS really is
cheaper and when an Index might help.<br />
<br />
For a specific query
involving an equality filter on a column, the other factor is what
fraction of the data rows in the table have the same value stored in
them. Oracle maintains this as a statistic on each column named
"Density". By comparing the Density value for a column against the
fraction of rows in the table needed by the query, you can see whether a
FTS would be cheaper or not for a filter on that column. Remember that
a percentage is just a fractional decimal value multiplied by 100 - so a
Density of 0.0025 means 0.25%.<br />
<br />
First, the value used for MBRC (Multi-Block Read Count). This is stored in the <code>AUX_STAT$</code> table owned by <code>SYS</code>.
Its value only gets set when you gather system statistics. If set,
then that particular value is used. If not set then a default value of 8
is used (note that it seems to ignore the value of the initialization
parameter <code>db_file_multiblock_read_count</code>). The Optimizer also uses the values for <code>MREADTIM</code> and <code>SREADTIM</code> (Multi-block read time and Single block read time) also in the <code>AUX_STAT$</code> table. Again, if these have not been set then it will use a default formula to derive them from the values for <code>IOSEEKTIM</code> and <code>IOTFRSPEED</code>.<br />
<br />
To try and keep this explanation short we can jump to the following formulae used when you have not gathered system statistics.<br />
<pre class="source-code"><code>SREADTIM = IOSEEKTIM + (DB_BLOCK_SIZE / IOTFRSPEED)
MREADTIM = IOSEEKTIM + (MBRC * DB_BLOCK_SIZE / IOTFRSPEED)
</code></pre>
For the default values of <code>MBRC</code> (8), <code>IOSEEKTIM</code> (10) and <code>IOTFRSPEED</code> (4096) and a <code>DB_BLOCK_SIZE</code> of 8192 (8 KB) you get an <code>SREADTIM</code> value of 12 milliseconds, and a <code>MREADTIM</code> value of 26 milliseconds.<br />
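These defaults are easy to verify. A small Python sketch of the two formulae with the default values just described:

```python
# Default "no-workload" system statistics values
IOSEEKTIM = 10        # I/O seek time in milliseconds
IOTFRSPEED = 4096     # I/O transfer speed in bytes per millisecond
MBRC = 8              # default Multi-Block Read Count
DB_BLOCK_SIZE = 8192  # 8 KB database block size

sreadtim = IOSEEKTIM + (DB_BLOCK_SIZE / IOTFRSPEED)         # single block read time
mreadtim = IOSEEKTIM + (MBRC * DB_BLOCK_SIZE / IOTFRSPEED)  # multi-block read time
print(sreadtim, mreadtim)  # 12.0 26.0 milliseconds
```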
<br />
When
a Full Table Scan occurs the Optimizer knows that it will be doing
multi-block reads. However, it needs to cast or convert the cost of
these reads into units of single block reads. This is because all other
I/O costs are in terms of single block reads internal to the Optimizer.
And the "cost" of a multi-block read is not the same as the "cost" of a
single block read - a multi-block read should take longer given the
greater number of blocks being transferred.<br />
<br />
Instead of dividing the number of blocks in the table by the full <code>MBRC</code> value, it "adjusts" the MBRC by the ratio of <code>SREADTIM</code> to <code>MREADTIM</code>, and then uses the adjusted value in the cost calculation. <br />
<br />
The cost of a FTS would then be calculated as (note the brackets):<br />
<pre class="source-code"><code>FTS Cost = #Blocks in Table / (MBRC * SREADTIM / MREADTIM)
FTS Cost = (#Blocks in Table * MREADTIM) / (MBRC * SREADTIM)
</code></pre>
For an Index lookup to be cheaper than this FTS cost, we can
calculate the fraction of rows in the table where the cost of using an
Index would be slightly less than this FTS cost. For this we need to
know the following values from the statistics Oracle has on the table
and the particular column used in the equality filter:<br />
<ul>
<li>Number of Rows in the table - <code>NUM_ROWS</code> in <code>USER_TABLES</code> or <code>USER_TAB_STATISTICS</code></li>
<li>Number of Blocks used for the table - <code>BLOCKS</code> in <code>USER_TABLES</code> or <code>USER_TAB_STATISTICS</code></li>
<li>Density of the column - <code>DENSITY</code> in <code>USER_TAB_COLS</code> or <code>USER_TAB_COL_STATISTICS</code></li>
</ul>
A Full Table Scan is cheaper than using an Index lookup when:<br />
<pre class="source-code"><code>Density > (BLOCKS * MREADTIM) / (MBRC * SREADTIM * NUM_ROWS)
</code></pre>
When the Density of a column is less than this value then an Index lookup would be cheaper.<br />
<br />
<h2>
Conclusions</h2>
Although
a Full Table Scan can seem a "brute force" approach to finding some
matching records in a table, it can sometimes be the better way of doing
it. It depends on several factors including how many rows you want back
from the table, the number of blocks for the table, and the Clustering
Factor of any indexes.<br />
<br />
It is possible, even for low percentages of data being retrieved
from a table, that a Full Table Scan can be the cheapest and best access
method. Trying
to force the Optimizer into using another access method in this
circumstance is a waste of time, because all other access methods will
be more expensive. The only exception might be another Full Scan type
access, such as an Index Full Scan. But even then the gains (reduction
in cost) will only be marginal i.e. not a full order of magnitude less.<br />
<br />
When
you see a FTS in a query execution plan you should check many things including, but not limited to, the
number of rows in the table, the number of blocks used for the table,
the Clustering Factor for each possible index, and the estimated row
count for the filters being used. If the estimate
is correct then a FTS is the lowest cost access method and the Optimizer
is right to choose it. You should also check if your query is correct,
or if there is something wrong with the filter conditions in it.<br />
<br />
<h2>
Tests</h2>
Let's
show whether this holds true with some tests. There is quite a lot of
output here, but I want to provide everything so that anyone else can
reproduce these tests on their own systems.<br />
<br />
Database version - 12.1.0.2.0<br />
Operating System - Oracle Linux 7.2<br />
<br />
Scan Test table - 10 million rows, with columns of different repeated values:<br />
<pre class="source-code"><code>drop table scantest ;
--
prompt Loading data ....
create table scantest
tablespace testdata
as
select r pkid
, 1 one -- a constant, which forces actual data row access
, mod (r, 10) pct10 -- 10 values = 10% of data in table
, mod (r, 20) pct5 -- 20 values = 5% of data in table
, mod (r, 50) pct2 -- 50 values = 2% of data in table
, mod (r, 100) pct1 -- 100 values = 1% of data in table
, mod (r, 200) pct05 -- 200 values = 0.5% of data in table
, mod (r, 500) pct02 -- 500 values = 0.2% of data in table
, mod (r, 1000) pct01 -- 1,000 values = 0.1% of data in table
, mod (r, 2000) pct005 -- 2,000 values = 0.05% of data in table
, mod (r, 5000) pct002 -- 5,000 values = 0.02% of data in table
, mod (r, 10000) pct001 -- 10,000 values = 0.01% of data in table
, mod (r, 20000) pct0005 -- 20,000 values = 0.005% of data in table
, mod (r, 50000) pct0002 -- 50,000 values = 0.002% of data in table
from (select rownum r
from (select rownum r from dual connect by level <= 1000) a,
(select rownum r from dual connect by level <= 1000) b,
(select rownum r from dual connect by level <= 1000) c
where rownum <= 10000000) ;
--
prompt Gathering Statistics ....
exec dbms_stats.gather_table_stats ('JOHN', 'SCANTEST')
--
prompt Creating Indexes ....
create unique index ix_scan_pkid on scantest (pkid) ;
create index ix_scan_pct2 on scantest (pct2) ;
create index ix_scan_pct1 on scantest (pct1) ;
create index ix_scan_pct05 on scantest (pct05) ;
create index ix_scan_pct02 on scantest (pct02) ;
create index ix_scan_pct01 on scantest (pct01) ;
</code></pre>
This produces a table with the following statistics:<br />
<pre class="source-code"><code> TABLE STATISTICS
Table %F IT In Ext Next Ext %I
------------------------------ --- --- ---------------- ---------------- ---
SCANTEST 10 1 65,536 1,048,576
Avg Spc
Num Rows Blocks E Blocks Free/Blk Chains Avg Row Len
------------ ------------ -------- --------- ------ -----------
10,000,000 80,951 0 .00 0 53.000
Table Num Extents Blocks Avg Blocks
------------------------------ ------------ ------------ ------------
SCANTEST 151 81,920 543
Leaf Distinct Clustering
Index Height Blocks Keys Factor
-------------------- -------- -------- ---------------- ----------------
IX_SCAN_PKID 3 22,132 10,000,000 80,528
IX_SCAN_PCT2 3 19,503 50 4,026,368
IX_SCAN_PCT1 3 19,518 100 8,052,718
IX_SCAN_PCT05 3 20,212 200 10,000,000
IX_SCAN_PCT02 3 20,629 500 10,000,000
IX_SCAN_PCT01 3 20,768 1,000 10,000,000
</code></pre>
<br />
Full Table Scan cost is calculated as follows:<br />
<pre class="source-code"><code>FTS Cost = (BLOCKS * MREADTIM) / (MBRC * SREADTIM)
</code></pre>
On my system <code>MREADTIM</code> and <code>SREADTIM</code> are not set, and the others have default values in the system statistics in <code>AUX_STAT$</code> (<code>IOSEEKTIM</code> = 10, <code>IOTFRSPEED = 4096</code>), so using the formula from before this gives <code>SREADTIM</code> of 12 ms and <code>MREADTIM</code> of 26 ms.<br />
<br />
Plugging these values into the previous formula gives about 22,000 as the cost for the FTS:<br />
<pre class="source-code"><code>(80951 * 26) / (8 * 12) = 2104726 / 96 = 21,924.23
</code></pre>
We can calculate the cutover point up to which a FTS would be cheaper than an Index Scan using the formula given before:<br />
<pre class="source-code"><code>Density > (BLOCKS * MREADTIM) / (MBRC * SREADTIM * NUM_ROWS)
Density > (80,951 * 26) / (8 * 12 * 10,000,000) = 0.002192 or 0.22% approximately
</code></pre>
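The same arithmetic in a short Python sketch, using the statistics of the scantest table, reproduces both the Full Table Scan cost and the cutover Density:

```python
# Table statistics for scantest, plus the derived system statistics values
blocks, num_rows = 80_951, 10_000_000
mbrc, sreadtim, mreadtim = 8, 12, 26

fts_cost = (blocks * mreadtim) / (mbrc * sreadtim)
cutover_density = (blocks * mreadtim) / (mbrc * sreadtim * num_rows)

print(round(fts_cost, 2))         # 21924.23 - about 22,000
print(round(cutover_density, 6))  # 0.002192 - about 0.22%
```

Note that the cutover Density is just the FTS cost divided by the number of rows, which shows why the rows stored per block matter so much: fewer rows per block means more blocks for the same row count, a higher FTS cost, and so a higher cutover percentage.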
Remember that this is based on several assumptions (Index
Clustering Factor) and simplifications (ignoring CPU costs). Thus the
cutover point will not be a precise value of 0.0022 (0.22%) but
something around this value.<br />
<br />
Having calculated the FTS cost at
about 22,000 we can also calculate the expected Index Scan costs, and
see whether they do drop below the FTS cost under 0.22% of the data in
the table.<br />
<br />
An Index Scan cost has 2 components:<br />
<pre class="source-code"><code>Index Access Cost = Levels + (Leaf Blocks * Filter Factor)
Data Access Cost = Clustering Factor * Filter Factor
</code></pre>
The Filter Factor is the selectivity of a single value for an
equality predicate filter, being the Density of the column, or one over
the Number of Distinct Values. In real terms it is the percentage of
rows being retrieved, which in our test table is indicated by the column
name.<br />
<br />
For all indexes the number of Levels is 3 and the number
of Leaf Blocks is about 20,500. Yes, it does vary but it is not
significantly different for the tests we are doing here. Also, from
index IX_SCAN_PCT05 onwards the Clustering Factor is always 10 million, i.e. the data is not clustered with respect to the index, so the Clustering Factor equals the row count of the table. <br />
<br />
<table class="twtable"><tbody>
<tr class="evenRow"><th>Column Name</th><th>% of Rows returned</th><th>Fraction of Rows</th><th>Index Cost</th><th>Data Cost</th><th>Total Cost</th></tr>
<tr class="oddRow"><td>pct1</td><td>1%</td><td>0.01</td><td>3 + (20,500 * 0.01) = 3 + 205 = 208</td><td>8,000,000 * 0.01 = 80,000</td><td>80,208</td></tr>
<tr class="evenRow"><td>pct05</td><td>0.5%</td><td>0.005</td><td>3 + (20,500 * 0.005) = 3 + 102.5 = 105.5</td><td>10M * 0.005 = 50,000</td><td>50,106</td></tr>
<tr class="oddRow"><td>pct02</td><td>0.2%</td><td>0.002</td><td>3 + (20,500 * 0.002) = 3 + 41 = 44</td><td>10M * 0.002 = 20,000</td><td>20,044</td></tr>
<tr class="evenRow"><td>pct01</td><td>0.1%</td><td>0.001</td><td>3 + 20.5 = 23.5</td><td>10M * 0.001 = 10,000</td><td>10,024</td></tr>
</tbody></table>
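The table above can be regenerated directly from the two cost components. A Python sketch using the same rounded figures (3 levels, roughly 20,500 leaf blocks, and a Clustering Factor of 8 million for pct1 and 10 million for the others):

```python
levels, leaf_blocks = 3, 20_500

def total_cost(filter_factor, clustering_factor):
    index_access = levels + (leaf_blocks * filter_factor)  # walk the index
    data_access = clustering_factor * filter_factor        # fetch the table rows
    return index_access + data_access

for name, ff, cf in [("pct1",  0.010, 8_000_000),
                     ("pct05", 0.005, 10_000_000),
                     ("pct02", 0.002, 10_000_000),
                     ("pct01", 0.001, 10_000_000)]:
    print(f"{name}: {total_cost(ff, cf):,.1f}")
```

This prints 80,208.0, 50,105.5, 20,044.0 and 10,023.5, matching the Total Cost column. The data access cost dominates in every row, which is why the Clustering Factor is such an important factor in index costing.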
<br />
Our
calculations confirm that the expected cost of an Index Scan should
drop below that of a Full Table Scan when less than 0.22% of the data in
the table is being selected.<br />
<br />
Does this bear out in practice?
Will Oracle switch from a Full Table Scan to an Index Scan when the
fraction of rows requested drops below 0.0022? Let's see. <br />
<br />
<b>Note:</b>
The buffer cache and shared pool were flushed from a SYSDBA session
before each query execution, so the buffer cache was empty in each case.
And we are only interested in the I/O statistics from the query
execution - other statistics such as Redo and SQL*Net bytes sent are not
relevant and have been removed.<br />
<br />
<b>Query 1:</b><br />
<pre class="source-code"><code>set autotrace on statistics
--
select sum (one) from scantest where pct2 = 1 ;
SUM(ONE)
----------
200000
Statistics
----------------------------------------------------------
80704 consistent gets
80541 physical reads
2 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
1 rows processed
</code></pre>
Execution Plan (from <code>dbms_xplan.display_cursor</code> for the <code>SQL_ID</code>):<br />
<pre class="source-code"><code>select sum (one) from scantest where pct2 = 1
Plan hash value: 1745049784
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 22028 (100)| |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | TABLE ACCESS FULL| SCANTEST | 200K| 1171K| 22028 (1)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("PCT2"=1)
</code></pre>
<b>Note:</b><br />
<ul>
<li>2% of 10 million rows is 200 thousand, which the Optimizer has correctly estimated.</li>
<li>Full
table scan cost is 22,028. This is very close to my own estimate of
about 22,000. The missing cost component would be for CPU work to
filter each row.</li>
<li>80,541 physical reads occurred, which is very close to the 80,951 blocks reported for the table.</li>
</ul>
<br />
<b>Query 2:</b><br />
Repeat this with a lower percentage column - pct05<br />
<pre class="source-code"><code>select sum (one) from scantest where pct05 = 1 ;
SUM(ONE)
----------
50000
1 row selected.
Statistics
----------------------------------------------------------
80704 consistent gets
80541 physical reads
2 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
1 rows processed
select sum (one) from scantest where pct05 = 1
Plan hash value: 1745049784
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 22040 (100)| |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
|* 2 | TABLE ACCESS FULL| SCANTEST | 50000 | 341K| 22040 (1)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("PCT05"=1)
</code></pre>
<b>Note:</b><br />
<ul>
<li>Rows estimate is now 50,000, as expected</li>
<li>Full table scan cost is slightly different (22,040 versus 22,028), probably due to a small difference in the CPU cost component, but still about 22,000.</li>
<li>Same number of physical reads occurred - 80,541.</li>
</ul>
<br />
<b>Query 3:</b><br />
Continue down to the next lower percentage column - pct02<br />
<pre class="source-code"><code>select sum (one) from scantest where pct02 = 1 ;
SUM(ONE)
----------
20000
1 row selected.
Statistics
----------------------------------------------------------
20212 consistent gets
20060 physical reads
2 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
1 rows processed
select sum (one) from scantest where pct02 = 1
Plan hash value: 3458310886
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 20049 (100)| |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| SCANTEST | 20000 | 136K| 20049 (1)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | IX_SCAN_PCT02 | 20000 | | 44 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("PCT02"=1)
</code></pre>
<b>Note:</b><br />
<ul>
<li>Now 0.2% of data in this table, which is low enough that an Index Scan should be cheaper</li>
<li>Index cost is indeed lower than a Full Table Scan, as expected - 20,049 versus 22,040</li>
<li>Index cost is very close to our own calculation of 20,044 - again CPU cost is the difference</li>
<li>Only 20,060 physical reads now - Index branch blocks + Leaf blocks + Data rows</li>
</ul>
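As a quick sanity check, the same cost formula reproduces the figure the Optimizer reported here (the constants are assumed from the table earlier in the article):<br />

```python
# pct02 selects 0.2% of a 10 million row table; the index has a blevel
# of 3 and 20,500 leaf blocks (values from the article's cost table).
blevel, leaf_blocks, table_rows = 3, 20_500, 10_000_000
fraction = 0.002

io_cost = blevel + leaf_blocks * fraction + table_rows * fraction
print(io_cost)  # 20044.0 - the Optimizer reported 20,049, the small
                # difference being its additional CPU cost component
```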
<br />
<b>Query 4:</b><br />
Repeat this with a lower percentage column - pct01<br />
<pre class="source-code"><code>select sum (one) from scantest where pct01 = 1 ;
SUM(ONE)
----------
10000
1 row selected.
Statistics
----------------------------------------------------------
10192 consistent gets
10038 physical reads
2 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
1 rows processed
select sum (one) from scantest where pct01 = 1
Plan hash value: 1707962624
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 10025 (100)| |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| SCANTEST | 10000 | 70000 | 10025 (1)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | IX_SCAN_PCT01 | 10000 | | 23 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("PCT01"=1)
</code></pre>
<b>Note:</b><br />
<ul>
<li>Index cost is halved because row count is halved - 10,025 versus 10,024 calculated</li>
<li>Only 10,038 physical reads now - Index branch blocks + Leaf blocks + Data rows</li>
</ul>
<br />
<b>Query 5:</b><br />
If we force the use of an index for the query on the PCT05 column using a hint we get the following:<br />
<pre class="source-code"><code>select /*+ index (scantest (pct05)) */ sum (one) from scantest where pct05 = 1 ;
SUM(ONE)
----------
50000
Statistics
----------------------------------------------------------
50271 consistent gets
50133 physical reads
2 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
1 rows processed
select /*+ index (scantest (pct05)) */ sum (one) from scantest where pct05 = 1
Plan hash value: 3145193111
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 50116 (100)| |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| SCANTEST | 50000 | 341K| 50116 (1)| 00:00:02 |
|* 3 | INDEX RANGE SCAN | IX_SCAN_PCT05 | 50000 | | 104 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("PCT05"=1)
</code></pre>
Note how the estimated cost is 50,116, which is far greater than
the 22,040 full table scan cost for this query, and agrees with my earlier calculation of 50,106 for this index.
Instead of the 80,000
physical disk reads done before, only 50,000 have been done. However,
the 80,000 disk reads were really multi-block disk reads i.e. nearer
10,000 real disk reads would have been done, each for 8 disk blocks at
once. The 50,000 disk reads for the index execution plan would be
single block disk reads. John Bradyhttp://www.blogger.com/profile/08719865814813032071noreply@blogger.com0tag:blogger.com,1999:blog-7302956329008495023.post-8208484885086189712016-01-25T15:14:00.000+00:002016-11-25T15:00:45.888+00:00What's Going On? Oracle Activity Monitoring ViewsHow do you find out what is happening <em><strong>now</strong></em>
inside an Oracle database instance when you only have SQL level access
to the database instance, and no nice GUI management tool? I'll cover
how to start doing this with some SQL query examples.<br /><br />A user is
complaining that the application is running slowly, or a report someone
scheduled is taking too long to finish - where do you look to find out
what is going on inside the Oracle database right now? Well if you have
a nice GUI based tool such as Enterprise Manager then go there first,
because that will have summary screens of what is going on, and also
drill downs into the details behind what is going on. Such tools are
easier and quicker to use, and mean you don't have to remember the names
of internal Oracle dynamic performance views.<br /><br />But what if you
don't have a nice GUI based tool? Or you want to work out for yourself
what is really going on in Oracle? Or something has happened to the
nice GUI tool and you cannot use it for one reason or another? Then you
need to know about Dynamic Performance Views in Oracle, also known as
the <code>V$</code> tables, and which ones are the most relevant.
The Dynamic Performance Views are views defined internally by Oracle that
in fact map onto memory structures inside the SGA (System Global Area).
They enable you to see in real time various data within the SGA using
normal SQL queries.<br /><br />When investigating performance problems you
should generally narrow it down to a specific problem and the users
being affected by it. In Oracle each connected user has a corresponding
"session" inside the Oracle database instance, and information on each
session is made available in the <code>V$SESSION</code> dynamic performance view. <br /><br />Bear
in mind one characteristic that is both an advantage and a disadvantage of the Dynamic Performance Views -
they are a snapshot of current values in the SGA, and these values can
be changing all of the time. Look again and you will get different
results. Some views show cumulative values rather than current values,
but they are always increasing over time and also constantly changing.<br /><br />But with that in mind, you can use the <code>V$SESSION</code>
view to look at what specific sessions are doing at the moment, and
look for any slow, long running SQL statements. We can group the data
columns into the following related groups:<br />
<ul>
<li>Session Connection Information - User name, Client Application software, Logon time</li>
<li>Current Activity Information - Status, SQL statement executing</li>
<li>Delays - both normal Waits and abnormal Blocks i.e. small delays or completely blocked<ul>
<li>Waits are delays that will definitely finish e.g. read a block from disk</li>
<li>Blocks are delays that are outside the control of the session and can potentially go on forever <ul>
<li>e.g. blocked waiting for a row lock held by another session</li>
</ul>
</li>
</ul>
</li>
</ul>
For a full list of the columns in <code>V$SESSION</code>
look in the "Oracle Database Reference" Manual under "Dynamic
Performance Views", where you will get a brief description of each
column.<br />
<br />
<h2>
All Active Sessions</h2>
Want a list of all active sessions and whether they are waiting or not? Try this query from SQL*Plus:<br />
<pre class="source-code"><code>-- List the active sessions and break down key data
--
column sid format 999999
column db_user format a8 heading 'USER'
column db_login_time format a6 heading 'LOGON'
column program format a10 heading 'APP_N'
column active_for format 999 heading 'ACTV|FOR'
column state format a5 heading 'STATE'
column waiting_for format 999 heading 'WAIT|FOR'
column event format a12 heading 'WAIT ON'
column sql_id format a13 heading 'SQL ID'
--
select s.sid,
nvl (s.username, 'SYS') db_user,
to_char (s.logon_time, 'hh24:mi') db_login_time,
substr (s.program, 1, 10) program,
s.last_call_et active_for,
decode (s.state,
'WAITING', 'WAIT',
'WAITED KNOWN TIME', 'NW L',
'NW S') state,
decode (s.wait_time, 0, seconds_in_wait, 0) waiting_for,
substr (s.event, 1, 12) event,
nvl (s.sql_id, 'Not Executing') sql_id
FROM v$session s
WHERE s.type = 'USER'
AND s.status = 'ACTIVE'
AND s.wait_class != 'Idle'
/
--
</code></pre>
Note the following about this query:<br /><ul>
<li>It restricts the
sessions to only those from real users (not internal background
sessions) that are active and experiencing non-idle waits.<ul>
<li>Idle waits are generally waits external to Oracle, such as waiting for the next SQL statement to execute</li>
</ul>
</li>
<li>The number of columns of data output is kept low to fit within an 80 column wide terminal screen<ul>
<li>If you have a wider output format then you could add extra data columns to the "select"</li>
</ul>
</li>
<li>The "<code>decode</code>" of "<code>state</code>" is meant to encode the following combinations of possibilities:<ul>
<li>WAIT = WAITING, NW = NOT WAITING; PREV WAIT:- S = SHORT, L = LONG</li>
<li>i.e. it combines both whether the session is currently waiting or not, with whether the previous wait was a short or long wait</li>
</ul>
</li>
<li>It is assumed that users have all connected today, so only the time of connection is shown<ul>
<li>If connections are long lived, then change the date format used in the "<code>to_char</code>" to include the day</li>
</ul>
</li>
</ul>
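The nested <code>decode</code> logic for the state and wait columns can be expressed in ordinary code. This Python sketch is illustrative only (the real work is done inside the SQL query above):<br />

```python
def state_label(state, wait_time, seconds_in_wait):
    """Mirror the DECODEs in the query above.

    Returns (state label, seconds waiting so far), where the label is
    WAIT (currently waiting), NW L (not waiting, previous wait of known
    i.e. long duration) or NW S (not waiting, previous wait was short).
    """
    if state == 'WAITING':
        label = 'WAIT'
    elif state == 'WAITED KNOWN TIME':
        label = 'NW L'
    else:
        label = 'NW S'
    # WAIT_TIME = 0 means the session is still waiting, so the elapsed
    # wait time is only meaningful in that case
    waiting_for = seconds_in_wait if wait_time == 0 else 0
    return label, waiting_for

print(state_label('WAITING', 0, 12))           # ('WAIT', 12)
print(state_label('WAITED SHORT TIME', 5, 3))  # ('NW S', 0)
```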
From this you can see the following:<br /><ul>
<li>How many sessions are active, and how many are currently waiting on something?<ul>
<li>Sessions that are active and not waiting are therefore running on a CPU</li>
</ul>
</li>
<li>How many sessions are running the same SQL statement? Look at the values of <code>SQL_ID</code></li>
<li>How many sessions are experiencing the same kind of wait event?</li>
<li>Are sessions experiencing long or short waits most of the time?</li>
</ul>
<h2>
One Session's Details</h2>
If you know the <code>SID</code>
(Session ID) of one session you are interested in and want to drill
down further, then you could run the following set of queries from
within SQL*Plus:<br />
<pre class="source-code"><code>--
set newpage none
--
set define on
undefine SESSION_ID
--
accept SESSION_ID prompt 'Enter Session ID > '
--
column sid FORMAT 99999
column username FORMAT a12
column logt FORMAT a12 HEADING 'Logon Time'
column sql_start format a9 heading 'SQL Start'
column last_call_et heading 'Time at Status'
column module FORMAT a16 heading 'Module'
column client_info format a16 heading 'Client Info'
column command format 999 heading 'cmd'
column taddr format a8 heading 'TX Addr'
column server heading 'Server|Type'
column schemaname format a12 heading 'Schema|Name'
column type heading 'Session|Type'
column sql_id heading 'Curr SQL ID'
column prev_sql_id heading 'Prev SQL ID'
column lockwait format a8 heading 'Wait Lock|Addr'
column event format a25 heading 'Waiting For'
column wait_class format a12 heading 'Wait Class'
column row_wait_obj# heading 'Object|Waiting On'
column wait_time heading 'Last Wait Time|(0=Waiting)'
column seconds_in_wait heading 'Elapsed From|Last Wait'
column blocking_session heading 'Blocking|Session ID'
column blocking_session_status heading 'Blocking|Sess Status'
--
select sid,
username,
to_char (logon_time, 'dd/mm hh24:mi') logt,
status,
last_call_et,
to_char (sql_exec_start, 'HH24:MI:SS') sql_start
from v$session
where sid = &&SESSION_ID
/
--
select sid,
substr (module, 1, 24) module,
substr (client_info, 1, 30) client_info,
server, schemaname, type
from v$session
where sid = &&SESSION_ID
/
--
select sid,
sql_id, prev_sql_id,
event,
substr (wait_class, 1, 20) wait_class
from v$session
where sid = &&SESSION_ID
/
--
select sid,
lockwait,
row_wait_obj#,
wait_time, seconds_in_wait,
blocking_session, blocking_session_status
from v$session
where sid = &&SESSION_ID
/
--
column sql_text format a62
--
select sql_id, sql_text from v$sql
where sql_id = (select sql_id from v$session where sid = &&SESSION_ID)
/
--
undefine SESSION_ID
--
set newpage 1
</code></pre>
<br />This gives you a lot more information from <code>V$SESSION</code> for the one session, broken down into related sets of data, as described before. Note that the final query selects from <code>V$SQL</code>
which is probably the other main Dynamic Performance View, which has an
entry for each SQL statement currently in the library cache in the SGA.
Remember that a SQL statement in the library cache can be shared
between many sessions.<br />
<br />
<h2>
One Session Detailed Drill Down</h2>
If you
really want to drill down more into what a single session or a set of
sessions is doing over a period of time then you could use <a class="externalLink" href="http://blog.tanelpoder.com/files/scripts/snapper.sql" target="_blank" title="External link to http://blog.tanelpoder.com/files/scripts/snapper.sql">Tanel Poder's snapper.sql</a>. This snapshots the contents of <code>V$SESSION</code>
and other Dynamic Performance views over a period of time and reports
on what it captured. Mainly this will be the wait events experienced by
the sessions, but a lot of other activity statistics are reported too.<br /><br />Snapper
is written in PL/SQL and uses two in-memory PL/SQL tables for each
snapshot, and then lists out anything that has changed between them. It
does not write anything to the database, and only needs minimal
permissions - <code>SELECT_CATALOG_ROLE</code> and execute on <code>DBMS_LOCK.SLEEP</code>. Thus it is safe to run on any database.<br />
<br />
<h2>
Conclusion</h2>
I've
shown how to list all the currently active sessions on a database
including the SQL they are executing and what they are waiting on, and
how to get more details on what just one session is doing.<br /><br />If you
want even more information on what a single session is doing then you
can SQL Trace it, which creates a trace file on the database server of
all the SQL statements executed by that session and all the waits it
experienced while tracing was enabled. Nothing is missed out because
every executed SQL statement from that session is traced. You can then
use utilities such as "<code>tkprof</code>" to post process the trace file and report on the SQL statements executed, their elapsed times, and the waits they experienced.<br /><br />If
the slow down is not currently happening but was very recent then you
can look back at the recent past using ASH (Active Session History), but
note that ASH is an extra cost option - you must have the Diagnostic
Pack of the Enterprise Edition of Oracle. I'll write something about
using ASH in a future post.John Bradyhttp://www.blogger.com/profile/08719865814813032071noreply@blogger.com0tag:blogger.com,1999:blog-7302956329008495023.post-70637558033645878572015-12-29T15:15:00.000+00:002015-12-29T15:15:42.261+00:00GRUB, os-prober and Red Hat / Oracle LinuxI've been successfully using VirtualBox to have test environments to run
Oracle Linux and Oracle Database in for some time, but there are
limitations to what you can do. So I decided that I wanted to install
Oracle Linux onto another disk partition on my PC so I could dual boot
into it for some more advanced Oracle Database tests. Well the Oracle
Linux installation itself went ahead trouble free - I just had to
remember to disable GRUB bootloader installation as GRUB was already
installed from my other Linux (Arch Linux) - but then I ran into some
problems trying to get this newly installed Linux added properly to the
main GRUB boot menu I was using. This post describes why this occurred
(after much digging around on my part), and a very quick and simple
solution for it.<br /><br />Detecting other Linux installations on other
disk partitions to add to the GRUB boot menu is done on Arch Linux by
installing the "<code>os-prober</code>" package which adds some extra scripts used by "<code>grub-mkconfig</code>". The specific problem is that while "<code>os-prober</code>"
did detect the Oracle Linux kernels, they were being added to the GRUB
menu in the wrong order - it looked like an alphabetical ordering rather
than a numeric ordering by kernel version number. This meant that the
first Oracle Linux kernel listed in the GRUB menu was not the latest one
installed, and in fact more likely to be the oldest one or even a
rescue kernel.<br /><br />To cut a long story short the problem is due to a combination of the code in the "<code>/usr/lib/linux-boot-probes/mounted/40grub2</code>" detection script and the contents of the "<code>/boot/grub2/grub.cfg</code>" file in the Oracle Linux installation. The "<code>grub.cfg</code>" file in the Oracle Linux installation uses some keywords that are not detected by the "<code>40grub2</code>" script in Arch Linux, so the bootable Linux kernels are not listed in the same order as they are in the source "<code>grub.cfg</code>" file. Instead it is the "<code>90fallback</code>" script that detects the bootable Linux kernels when it is run afterwards by "<code>os-prober</code>". Actually it is run by "<code>linux-boot-prober</code>" and it does a direct listing of Linux kernel files in the "<code>/boot</code>"
directory of the other Linux disk partition, and adds each of these to
the local GRUB configuration file. And the result of this is that the
other Linux kernels are detected and added in alphabetical order.<br /><br />
<h2>
Details on the Problem</h2>
The "<code>40grub2</code>" script works by opening the "<code>/boot/grub2/grub.cfg</code>" file from another Linux installation and looking for the entries for bootable Linux kernels. The idea is that "<code>40grub2</code>" will find Linux kernels in the same order they are in the "<code>grub.cfg</code>" on the other Linux installation, and they will be added to the local "<code>grub.cfg</code>"
file in the same order. The benefit of this method is that the first
Linux kernel listed for this other installation in the main GRUB boot
menu will be the same one as listed by the other Linux installation
itself. Which in turn means that if it sorts the Linux kernels in any
way or puts a specific Linux kernel first as the default in its "<code>grub.cfg</code>" configuration file, then this is also reflected in the local GRUB configuration file of my main Linux installation.<br /><br />The "<code>40grub2</code>" script works by opening the "<code>/boot/grub2/grub.cfg</code>" file of the other Linux installation and then reads each line in turn looking for ones that begin "<code>menuentry</code>", "<code>linux</code>" or "<code>initrd</code>". I believe that these are "standard" keywords that GRUB should be using. Unfortunately Oracle Linux is using keywords of "<code>linux16</code>" and "<code>initrd16</code>" instead, which means that these lines are not matched at all by the "<code>40grub2</code>"
script and no bootable Linux kernels are matched at all. It seems that
Red Hat, on which Oracle Linux is based, uses these keywords,
apparently to load the kernel through its legacy 16-bit boot entry point. Either way, they are used and they
do not match what "<code>40grub2</code>" is looking for.<br /><br />Instead the bootable Linux kernels are detected by the "<code>90fallback</code>" script when it runs afterwards, and they are detected in alphabetical naming order as mentioned before.<br /><br /><h2>
Solutions</h2>
There is a quick, easy and good enough solution you can do yourself, and then there is a more official solution.<br /><br />First, you can just manually edit your local "<code>40grub2</code>" file and change two lines in it. Add a "<code>16</code>" variation to the lines in the "<code>case</code>" block that test for "<code>linux</code>" and "<code>initrd</code>". Here is the output from "<code>diff</code>" showing the before (&lt;) and after (&gt;) versions of the two lines I changed.<br />
<pre class="source-code"><code>
67c67
&lt; linux)
---
&gt; linux | linux16 )
80c80
&lt; initrd)
---
&gt; initrd | initrd16 )
</code></pre>
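The effect of this two-line change can be illustrated with a small Python sketch of the keyword matching (the real script is shell, and the configuration file contents below are made up for the example):<br />

```python
def find_kernels(grub_cfg_lines, accept_16=True):
    # The unpatched 40grub2 script only recognises "linux", so Red Hat
    # style "linux16" entries are silently skipped; the patched version
    # accepts both spellings of the keyword.
    keywords = {'linux', 'linux16'} if accept_16 else {'linux'}
    kernels = []
    for line in grub_cfg_lines:
        parts = line.strip().split()
        if parts and parts[0] in keywords:
            kernels.append(parts[1])  # the kernel image path
    return kernels

# Hypothetical fragment of an Oracle Linux grub.cfg:
cfg = [
    "menuentry 'Oracle Linux Server' {",
    "    linux16 /vmlinuz-3.8.13-118.el7uek ro root=/dev/sda3",
    "    initrd16 /initramfs-3.8.13-118.el7uek.img",
    "}",
]
print(find_kernels(cfg, accept_16=False))  # [] - nothing detected
print(find_kernels(cfg))                   # ['/vmlinuz-3.8.13-118.el7uek']
```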
Once edited run "<code>grub-mkconfig</code>" again to regenerate your "<code>grub.cfg</code>" file, and it should correctly pick up those entries from the other Linux installation now.<br /><br />Second,
it does not look like there is actually an official solution, which can
often be the case with open source software. I found some bug reports
about this problem but there was some finger pointing going on both ways
between the GRUB people and the Red Hat people. It looked like the
GRUB people felt that the official keywords were "<code>linux</code>" and "<code>initrd</code>", so it was a Red Hat problem to solve; while the Red Hat people felt that "<code>linux16</code>" and "<code>initrd16</code>" were valid in a GRUB configuration file and did work so it was a GRUB problem with the "<code>40grub2</code>" script. <br /><br />One person did raise the question on how the main Linux that is adding these entries to its local "<code>grub.cfg</code>" file should be treating these entries with the "<code>16</code>" suffix from the other Linux. Should it ignore them and just use the normal keywords in its own "<code>grub.cfg</code>" file, or should it use exactly the same keywords? The latter solution is a problem because the keywords found in the other "<code>grub.cfg</code>" file are NOT returned back to the "<code>os-prober</code>" script i.e. it is assumed they are only "<code>linux</code>" and "<code>initrd</code>". Making "<code>40grub2</code>" return these extra keywords as extra data fields would need a lot of changes in other places - both "<code>40grub2</code>" and "<code>os-prober</code>" at least, and possibly others too if there is a common format used for passing around information on bootable Linux kernels. <br /><br />So
you can see how something that looks simple can grow into something
much bigger, and could have significant changes to something as
important as GRUB. And GRUB is a very critical piece of software used
at system boot time, so no "obvious solution" should be rushed through
without a lot of extra thought and testing. Unfortunately I don't know
when we will get any kind of "official solution" to this.John Bradyhttp://www.blogger.com/profile/08719865814813032071noreply@blogger.com0tag:blogger.com,1999:blog-7302956329008495023.post-79269274500963692682015-12-28T15:09:00.000+00:002015-12-28T15:09:43.262+00:00Oracle In Memory and CPU Execution Efficiency<a class="externalLink" href="http://blog.tanelpoder.com/" target="_blank" title="External link to http://blog.tanelpoder.com/">Tanel Poder</a> has been doing a series of posts on "<a class="externalLink" href="http://blog.tanelpoder.com/2015/08/09/ram-is-the-new-disk-and-how-to-measure-its-performance-part-1/" target="_blank" title="External link to http://blog.tanelpoder.com/2015/08/09/ram-is-the-new-disk-and-how-to-measure-its-performance-part-1/">RAM is the new disk</a>" trying to show how the new "<a class="externalLink" href="http://www.oracle.com/us/corporate/features/database-in-memory-option/index.html" target="_blank" title="External link to http://www.oracle.com/us/corporate/features/database-in-memory-option/index.html">Oracle Database In-Memory</a>"
feature in Oracle 12c uses memory in a different and more efficient way
than the normal, traditional buffer cache does. He hasn't finished the
series of posts yet, but I started to draw some of my own conclusions
from the data he published in <a class="externalLink" href="http://blog.tanelpoder.com/2015/11/30/ram-is-the-new-disk-and-how-to-measure-its-performance-part-3-cpu-instructions-cycles" target="_blank" title="External link to http://blog.tanelpoder.com/2015/11/30/ram-is-the-new-disk-and-how-to-measure-its-performance-part-3-cpu-instructions-cycles">his last post</a> and I thought I'd publish my thoughts.<br /><br />I
know a reasonable amount about how CPU's work internally, so I assumed
that the main performance improvement of Oracle Database In-Memory would
be from a better memory access pattern that would significantly reduce
the number of execution stalls inside the CPU. Tanel is one of the good
Oracle bloggers, and he has been sharing the details of the tests he
has been doing in this series of blog posts as well as the results from
them so I can see if my hypothesis on the new Database In-Memory holds up.<br />
<br />
<h2>
Tanel's Tests</h2>
The main results are in <a class="externalLink" href="http://blog.tanelpoder.com/2015/11/30/ram-is-the-new-disk-and-how-to-measure-its-performance-part-3-cpu-instructions-cycles" target="_blank" title="External link to http://blog.tanelpoder.com/2015/11/30/ram-is-the-new-disk-and-how-to-measure-its-performance-part-3-cpu-instructions-cycles">this post</a>
where he gives the 6 SQL queries he ran and their results in terms of
elapsed time and CPU execution cycles - some against the traditional
buffer cache and some against the new In-Memory cache. No disk accesses were involved as all the data was already in memory in the Oracle SGA.<br /><br />I'm
only interested in the two main SQL queries, which are identical except
that the first gets its data from the traditional data
block based buffer cache in memory while the second gets its data from
the new In Memory column storage based cache i.e. both execute the same "<code>SELECT</code>" against the same table and all of the data needed is already in memory in the SGA.<br />
<pre class="source-code"><code>3. FULL TABLE SCAN BUFFER CACHE (NO INMEMORY)
SELECT /*+ MONITOR FULL(c) NO_INMEMORY */ COUNT(cust_valid)
FROM customers_nopart c WHERE cust_id > 0;
4. FULL TABLE SCAN IN MEMORY WITH WHERE cust_id > 0
SELECT /*+ MONITOR FULL(c) INMEMORY */ COUNT(cust_valid)
FROM customers_nopart c WHERE cust_id > 0;
</code></pre>
<br />
Tanel has also made all the low level CPU execution measurements available in a <a class="externalLink" href="https://docs.google.com/spreadsheets/d/1ss0rBG8mePAVYP4hlpvjqAAlHnZqmuVmSFbHMLDsjaU/edit?usp=sharing" target="_blank" title="External link to https://docs.google.com/spreadsheets/d/1ss0rBG8mePAVYP4hlpvjqAAlHnZqmuVmSFbHMLDsjaU/edit?usp=sharing">Google online spreadsheet</a>. Here is a cut and paste of the CPU measurements for just the 2 queries I am interested in:<br />
<br />
<table class="twtable"><tbody>
<tr class="evenRow"><th>Metric</th><th>TABLE BUFCACHE</th><th>TABLE INMEMORY PRED</th></tr>
<tr class="oddRow"><td>task-clock-ms</td><td align="right">27374</td><td align="right">1578</td></tr>
<tr class="evenRow"><td>cycles</td><td align="right">86428653040</td><td align="right">4573793724</td></tr>
<tr class="oddRow"><td>instructions</td><td align="right">32115412877</td><td align="right">7080326242</td></tr>
<tr class="evenRow"><td>branches</td><td align="right">7386220210</td><td align="right">940579984</td></tr>
<tr class="oddRow"><td>branch-misses</td><td align="right">22056397</td><td align="right">4637243</td></tr>
<tr class="evenRow"><td>stalled-cycles-frontend</td><td align="right">76697049420</td><td align="right">2251325295</td></tr>
<tr class="oddRow"><td>stalled-cycles-backend</td><td align="right">58627393395</td><td align="right">1328333827</td></tr>
<tr class="evenRow"><td>cache-references</td><td align="right">256440384</td><td align="right">11507915</td></tr>
<tr class="oddRow"><td>cache-misses</td><td align="right">222036981</td><td align="right">7316366</td></tr>
<tr class="evenRow"><td>LLC-loads</td><td align="right">234361189</td><td align="right">9712269</td></tr>
<tr class="oddRow"><td>LLC-load-misses</td><td align="right">218570294</td><td align="right">7272805</td></tr>
<tr class="evenRow"><td>LLC-stores</td><td align="right">18493582</td><td align="right">1697666</td></tr>
<tr class="oddRow"><td>LLC-store-misses</td><td align="right">3233231</td><td align="right">27797</td></tr>
<tr class="evenRow"><td>L1-dcache-loads</td><td align="right">7324946042</td><td align="right">1069917316</td></tr>
<tr class="oddRow"><td>L1-dcache-load-misses</td><td align="right">305276341</td><td align="right">85368159</td></tr>
<tr class="evenRow"><td>L1-dcache-prefetches</td><td align="right">36890302</td><td align="right">25169253</td></tr>
</tbody></table>
<br />
The most obvious observation is that the In-Memory query execution is 17.34 times faster than the traditional buffer cache query execution (elapsed <code>task-clock-ms</code> time ratio of 27374 / 1578). I'm most interested in trying to explain what has caused this difference in elapsed time.<br />
<br />
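Two derived figures make the difference concrete: instructions per cycle (IPC) and the fraction of cycles stalled. This short Python calculation uses the numbers copied from the table above:<br />

```python
# Measurements copied from the table above (Tanel Poder's results).
runs = {
    "TABLE BUFCACHE":      dict(cycles=86_428_653_040,
                                instructions=32_115_412_877,
                                stalled_frontend=76_697_049_420),
    "TABLE INMEMORY PRED": dict(cycles=4_573_793_724,
                                instructions=7_080_326_242,
                                stalled_frontend=2_251_325_295),
}

for name, r in runs.items():
    ipc = r["instructions"] / r["cycles"]          # work done per clock cycle
    stalled = r["stalled_frontend"] / r["cycles"]  # fraction of cycles stalled
    print(f"{name}: {ipc:.2f} IPC, {stalled:.0%} of cycles stalled (front end)")
# Buffer cache run: ~0.37 IPC with ~89% of cycles stalled;
# In-Memory run:    ~1.55 IPC with ~49% of cycles stalled.
```

So the In-Memory scan not only executes about 4.5 times fewer instructions, it also completes roughly four times as many instructions per clock cycle, and together these account for the large elapsed time difference.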
<h2>
My Hypothesis & CPU / Memory Access</h2>
At
the beginning of this post I stated that I assumed that the main
performance improvement would be a result of a reduction in the number
of execution stalls inside the CPU. Let me try and explain what I mean
by that.<br /><br />In the very old days of Intel CPUs (from the original 8086 up to the first '486, I think) the internal clock of the CPU (the rate at which it executed instructions) was the same as the external clock of the system, which also governed memory access times. A read of a memory location by the CPU would take something like 3 or 4 external clock cycles (put the address on the system bus, wait for the memory to look up that address and produce its output, then put the data on the bus back to the CPU). During these clock cycles the CPU cannot proceed with the execution of that instruction i.e. it stalls until the data it needs from memory has been read into the CPU.<br /><br />During the '486 lifetime Intel introduced
technology that allowed the internal clock of the CPU to run faster
than the external clock i.e. at a multiple of the external system clock.
Now we typically have external clocks of 100 MHz and internal CPU clocks of 3 GHz or more i.e. the CPU is running internally at 30 or more times the external system clock rate.<br /><br />A
side effect of this change is that now when a CPU "stalls" to read
needed data from memory, the stall is for much longer in terms of CPU
instruction cycles. As the external clock is now 30 or more times
slower than the internal clock of the CPU, the CPU may end up waiting
doing nothing for over 100 execution cycles. NB. This is a
simplification of what really happens, but it does describe a real
effect.<br /><br />We can see this effect in the separately reported
statistics for "CPU cycles" and "CPU instructions completed" - the first
one will be much higher than the second one. The "efficiency" of how a
CPU executed a given piece of program code can be seen from the
calculated "instructions per cycle" value i.e. "CPU instructions
completed" divided by "CPU cycles". The closer this value is to 1 the
more efficient the execution has been within the CPU. The lower this
value is then the more "stalls" occurred within the CPU during execution
that stopped it completing execution of an instruction every clock
cycle.<br /><br />Of course CPU designers have tried to work around or
reduce the effect of such stalls in various ways, the details of which
are not relevant here. It just means that a) there is various
technology in the CPU to try and minimise the occurrence and impact of
such "memory load stalls", and b) the net impact of such "memory load
stalls" is much less than the 100 or so wasted execution cycles I've
described. In fact, and this is also important, modern CPUs can
actually issue more than one instruction at the same time for execution
within the same clock cycle e.g. an integer operation and a floating
point operation. This can result in an "instructions completed per
clock cycle" value larger than 1, which can occur when executing well
optimised program code.<br /><br />If my hypothesis is correct then we
should see a difference in the number of "stalls" in the CPU execution
statistics between the 2 SQL query executions, and the ratio between
these 2 sets of "stall" statistics should be close to the observed ratio
in execution times.<br />
<br />
<h2>
My Initial Conclusions</h2>
To cut to the
chase a bit, the observed difference in execution times between the 2
SQL queries is not wholly explained by just a reduction in the number of
"stalls" inside the CPU, but the reduction in "stalls" <strong>is</strong>
a significant contributor to this reduction in elapsed time. The
reduction in elapsed time is roughly a 50:50 split between a reduction
in the total number of instructions executed (code efficiency) and CPU
memory access stalls (memory access efficiency).<br /><br />We can see from
the measurements that Tanel has reported that the number of instructions
executed by the CPU decreased by a factor of 4.54 (32115412877 /
7080326242) to process the same amount of data (rows in the table).
This is a significant improvement in code efficiency to process this
volume of data - probably a result of the In-Memory cache using far simpler internal data structures than the traditional buffer cache, which can in turn be processed much more easily.<br /><br />This
leaves an improvement factor of 3.82 (17.34 / 4.54) that needs to be
explained by something else. I propose that this other factor is due to
a reduction in the number of "stalls" within the CPU when executing
these instructions as described earlier. This will have the net effect
of increasing the "instructions per cycle" value during the execution of
the SQL query, and indeed we can see this from the measurements
reported by Tanel:<br /><ul>
<li>Buffer cache SQL query execution has CPU instructions per cycle of 32115412877 / 86428653040 = 0.37</li>
<li>In-Memory cache SQL query execution has CPU instructions per cycle of 7080326242 / 4573793724 = 1.55</li>
<li>This gives an improvement ratio of 1.55 / 0.37 = 4.17 (4.17 comes from dividing the unrounded values; the rounded figures shown give 4.19)</li>
</ul>
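These instructions-per-cycle figures fall straight out of the raw counters in the table at the top of this post; a quick sketch of the arithmetic in Python (the counter values are Tanel's, as reported above):

```python
# Instructions per cycle (IPC) from the raw perf counters reported above.
counters = {
    "buffer_cache": {"instructions": 32115412877, "cycles": 86428653040},
    "in_memory":    {"instructions": 7080326242,  "cycles": 4573793724},
}

# IPC = instructions completed / CPU cycles, for each SQL query execution
ipc = {name: c["instructions"] / c["cycles"] for name, c in counters.items()}
improvement = ipc["in_memory"] / ipc["buffer_cache"]

print(round(ipc["buffer_cache"], 2))  # 0.37
print(round(ipc["in_memory"], 2))     # 1.55
print(round(improvement, 2))          # 4.17
```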
And this 4.17 value is reasonably close to the 3.82 value for the other performance improvement factor.<br /><br />Trying
to drill down further into this difference in "instructions per cycle"
and identifying "CPU memory stalls" due to data access is difficult if
not impossible for a number of reasons:<br /><ul>
<li>Above the L1 cache on
the CPU the other CPU caches are shared by both instructions and data.
We cannot tell which cache reads and which cache misses were due to
fetching an instruction to execute or fetching a piece of data needed by
an instruction being executed. Both lead to CPU stalls but for
different reasons.</li>
<li>The In-Memory cache SQL query is executing a
completely different code path within Oracle with almost 78% fewer instructions being executed in total (100 / 4.54 i.e. only about 22% as many instructions). So we cannot assume
that the instruction execution profile and also instruction cache misses
are similar in both cases.</li>
</ul>
We can however see other evidence
that this other factor for the improvement in performance is due to a
reduction in CPU stalls. Tanel has reported two other CPU execution
statistics - "<code>stalled-cycles-frontend</code>" and "<code>stalled-cycles-backend</code>". Tanel gives descriptions for these in his <a class="externalLink" href="http://blog.tanelpoder.com/2015/09/21/ram-is-the-new-disk-and-how-to-measure-its-performance-part-2-tools/" target="_blank" title="External link to http://blog.tanelpoder.com/2015/09/21/ram-is-the-new-disk-and-how-to-measure-its-performance-part-2-tools/">part 2 post</a>, which I will summarise as follows:<br /><ul>
<li>Stalled
cycles front end occurs before an instruction gets executed by the CPU
and is generally due to fetching an instruction from memory</li>
<li>Stalled
cycles back end occurs during instruction execution and is generally
due to fetching data from memory needed by that instruction</li>
</ul>
So
the most relevant statistic for stalls due to reading data from memory
is "stalled-cycles-backend". We cannot compare these directly between
the two SQL query executions because they executed a different number of
instructions (remember the 4.54 reduction factor). What we can do is
normalise these to a "stalled cycle per cycle" value and then compare
them:<br /><ul>
<li>Buffer cache SQL query execution has backend stalls per cycle of 58627393395 / 86428653040 = 0.68</li>
<li>In-Memory cache SQL query execution has backend stalls per cycle of 1328333827 / 4573793724 = 0.29</li>
<li>This gives an improvement ratio of 0.68 / 0.29 = 2.34</li>
</ul>
Again,
this is a significant improvement and shows that the number of stalls
due to memory accesses by the CPU for data has been reduced.<br />
<br />
<h2>
Why Have Stalls Reduced?</h2>
I hope that I have shown that part of the improvement in the performance of the In-Memory
cache has been the reduction in the number of "memory access stalls" in
the CPU when executing the SQL query i.e. less time wasted waiting to
read data in from main memory with a corresponding increase in the
number of instructions executed per clock cycle and less total elapsed time to execute those instructions. But how has Oracle actually managed to
achieve this? How has it managed to reduce the number of "memory access
stalls" while executing the same SQL query i.e. if it needs to read the
same "set of data" to perform the same SQL query, how has it managed to
read it more efficiently?<br /><br />I am guessing that this is due to two specific factors:<br /><ol>
<li>Data layout in memory - the layout is more "compact" which decreases the total amount of memory that must be read</li>
<li>Sequential
access to that memory, causing the CPU to "prefetch" the next memory
line into the CPU cache, so it is already there when it is needed i.e.
overlapping the processing of one memory line with the fetching of the
next memory line</li>
</ol>
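The difference between the two layouts can be sketched in a few lines of Python (purely an illustration of the data organisation, with made-up data; it obviously says nothing about actual CPU cache line behaviour):

```python
# Row store: each row is a tuple holding every column's value together,
# loosely analogous to rows in a buffer cache data block.
row_store = [(i, "name%d" % i, i * 10) for i in range(1000)]

# Column store: all the values for one column stored contiguously,
# loosely analogous to the In-Memory column store.
col_store = {
    "id":     [row[0] for row in row_store],
    "name":   [row[1] for row in row_store],
    "amount": [row[2] for row in row_store],
}

# Summing one column from the row store has to visit every row tuple,
# skipping over the other columns' data each time...
row_sum = sum(row[2] for row in row_store)

# ...while the column store scans one contiguous list from start to finish.
col_sum = sum(col_store["amount"])

assert row_sum == col_sum  # same answer, very different memory access pattern
```

Scanning <code>col_store["amount"]</code> walks one contiguous list from start to finish, which is exactly the kind of sequential access pattern that lets a CPU prefetcher do its job.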
Note that this is based on what I have read about In-Memory, as I've not yet had a chance to do any real testing of it. Again,
thanks to Tanel for both doing some real testing and for sharing all the
results he got from those tests.<br /><br />In the traditional buffer cache
a data block will span multiple memory lines (assuming a 64 byte memory
cache line, as Tanel states) and this will contain data for all the
columns in a row. When you only need the data from one column it means
that you are skipping about in memory quite a lot, jumping from row to
row, and not using most of the data in each block or memory line. In
the new In-Memory cache the data is stored by column, so all the data
for one column is stored together, next to each other in memory. This
immediately reduces the total amount of memory that must be read to
access the data for one column across all the rows in the table.<br /><br />It
will also increase the cache efficiency within the CPU, because one
memory line in the cache will contain data for multiple rows i.e. the
CPU can process the data for that column from multiple rows before it
needs to read in another memory line into cache to get the next set of
rows' data.<br /><br />If the data for a column is physically laid out
sequentially and contiguously in the system's memory, then it means that
Oracle will be doing a series of sequential memory reads to process all
the data for one column. Modern CPUs can detect this kind of
sequential memory access and have an optimisation to "prefetch" the next
data line from memory while the previous data line is still being
processed. The net result is that stalls are either eliminated because
the next line of data is already in the CPU cache or the impact of
stalls is significantly reduced because the data is already on its way
from memory to the CPU cache.<br /><br />We can see evidence for this occurring in the "<code>L1-dcache-prefetches</code>"
CPU statistic. The "L1 data cache" is the first on chip cache
immediately used by the CPU core, and is small but very fast. All data
needed by the CPU goes through this cache. Again, normalising to a
"prefetch per instruction executed" we can see that the number of data
prefetches by the CPU increased significantly during the In-Memory cache SQL query execution:<br /><ul>
<li>Buffer cache SQL query execution has L1 dcache prefetch per instruction executed of 36890302 / 32115412877 = 0.00115</li>
<li>In-Memory cache SQL query execution has L1 dcache prefetch per instruction executed of 25169253 / 7080326242 = 0.00355</li>
<li>This gives an improvement ratio of 0.00355 / 0.00115 = 3.09</li>
</ul>
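Both of these normalisations (backend stalls per cycle earlier, and prefetches per instruction here) are simple ratios of the raw counters; a sketch in Python using Tanel's reported values:

```python
# Normalised stall and prefetch rates from the raw perf counters above.
buffer_cache = {"cycles": 86428653040, "instructions": 32115412877,
                "stalled_backend": 58627393395, "l1d_prefetches": 36890302}
in_memory    = {"cycles": 4573793724, "instructions": 7080326242,
                "stalled_backend": 1328333827, "l1d_prefetches": 25169253}

def stalls_per_cycle(c):
    # Backend stalls normalised per CPU cycle - lower is better
    return c["stalled_backend"] / c["cycles"]

def prefetches_per_instruction(c):
    # L1 dcache prefetches normalised per instruction executed - higher
    # suggests the sequential-access prefetch optimisation is kicking in
    return c["l1d_prefetches"] / c["instructions"]

print(round(stalls_per_cycle(buffer_cache), 2))            # 0.68
print(round(stalls_per_cycle(in_memory), 2))               # 0.29
print(round(prefetches_per_instruction(buffer_cache), 5))  # 0.00115
print(round(prefetches_per_instruction(in_memory), 5))     # 0.00355
```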
<br />
<h2>
The End</h2>
Well, that's been a longer post than I intended, but I wanted to get everything together in one place, as I'm sure Tanel will finish his series of posts soon and probably make similar observations. I was curious to see whether he would come to exactly the same conclusions or slightly different ones, and writing this post first was the only way to find out. Thanks again to
Tanel for doing the tests and sharing all the data with us.<br /><br /><h2>Python for the DBA (2) - Data Unloading (2015-12-14)</h2>I've mentioned before that I like the <a class="externalLink" href="http://www.python.org/" target="_blank" title="External link to http://www.python.org">Python</a>
programming language. Not only is it a "clean" language for writing
programs in with a rich set of data types and structures for
manipulating data, it also has a standard API for database access making
it "database neutral". This makes it relatively easy to get data that
is in a database into and out of your Python code where you can use the
power of Python for whatever kind of data manipulation you need.<br /><br />To
show how easy it is to get data into a Python program I'll show a
little utility to unload data from one table in a database to a flat,
text file as one record per line with each data field separated by
special delimiters.<br /><br />Assuming the following:<br />
<ul>
<li>Your Python program already has a database connection<ul>
<li>This is relatively easy, but does vary by specific database driver being used</li>
</ul>
</li>
<li>And it has opened the output file to unload to i.e. an open file handle and not the file name</li>
<li>And it has the name of the table, the field delimiter characters and the record terminator characters in local variables</li>
</ul>
Then you can write a simple function to call that will unload the data from that table to the specified file:<br />
<pre class="source-code"><code>def unload_table_to_file (db_conn, table_name, out_data_file, field_separator, record_terminator) :
    # Preparation steps before doing the real work
    # Replace literals of special characters in the separator and terminator
    record_terminator = record_terminator.replace ('\\n', '\n')
    field_separator = field_separator.replace ('\\t', '\t')
    # Simplest SQL to get the data, assume column ordering is correct
    select_SQL = "select * from " + table_name
    # Now do the work - read in all records and output them
    print ("=== Unloading . . .")
    select_cursor = db_conn.cursor ()
    select_cursor.execute (select_SQL)
    for data_row in select_cursor.fetchall () :
        # Convert Python native data types to simple string representations
        output_fields = []
        for data_field in data_row :
            output_fields.append (str(data_field))
        # Join the fields with the separator and terminate the record
        out_data_file.write (field_separator.join (output_fields) + record_terminator)
    select_cursor.close ()
    print ("=== Finished")
</code></pre>
The main lines are:<br /><ul>
<li>First make sure that if any "escaped characters" are in the delimiters then they are the real escape characters<ul>
<li>"<code>\t</code>" means the TAB character, and "<code>\n</code>" means the NEWLINE character</li>
</ul>
</li>
<li>We just use a "<code>select * from table</code>" to get all the data into the Python code<ul>
<li>The String concatenate operator is just the "<code>+</code>" operator i.e. "<code>+</code>" is overloaded depending on data type</li>
</ul>
</li>
<li>And we execute this "select" via a Cursor created from the database connection passed in</li>
<li>The real work happens in 2 nested loops - for each data row, for each data field in each data row<ul>
<li>The fields from one record are put into a list (essentially equivalent to an array)</li>
<li>This
is so we can use a single action at the end to concatenate them all
together with each data field value separated by the "field separator"</li>
<li>Then we "write" out this concatenated list of fields to the output file</li>
</ul>
</li>
<li>The "<code>str</code>" function will return the String representation of a data value e.g. convert an integer to its String representation<ul>
<li>This assumes that there exists a data type specific "<code>str</code>" method for each possible data type</li>
</ul>
</li>
<li>The "<code>join</code>"
method of a String variable iterates over a list of data, and returns
one String of the list contents concatenated together with the value of
the String variable between each one.</li>
</ul>
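The two string operations doing most of the work here can be seen in isolation (the sample values are made up for illustration):

```python
# A field separator argument often arrives as the two-character literal "\t"
# (backslash then 't'); replace turns it into a real TAB character.
field_separator = "\\t".replace('\\t', '\t')
assert field_separator == "\t"

# join concatenates a list of strings with the separator between each pair.
fields = ["42", "Smith", "2015-12-14"]
record = "|__|".join(fields)
print(record)  # 42|__|Smith|__|2015-12-14
```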
I think this shows some
of the power and elegance of Python that reading all the data from a
table can be done with so few lines of code, and that the code itself is
so readable. And this code is flexible enough to deal with different
values for the field separator characters and the record terminator
characters. You could just do a classic comma separated list ('<code>,</code>' as the field separator) or something more complicated to avoid issues where a comma may appear in the data itself e.g. "<code>|__|</code>"
as the field separator (vertical bar, underscore, underscore, vertical
bar). And this flexibility is all handled by this Python code.<br />
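Because the function only relies on the standard DB-API (a connection's <code>cursor()</code>, <code>execute()</code> and <code>fetchall()</code>), the same pattern works unchanged against other databases. Here is a self-contained sketch of the whole flow using the standard library's <code>sqlite3</code> driver and an in-memory <code>StringIO</code> as the output "file" (the table name and data are made up):

```python
import io
import sqlite3

# Build a small throw-away table. sqlite3 is used here purely because it
# ships with Python; any DB-API compliant driver presents the same interface.
db_conn = sqlite3.connect(":memory:")
db_conn.execute("create table emp (emp_id integer, emp_name text, salary integer)")
db_conn.executemany("insert into emp values (?, ?, ?)",
                    [(1, "Smith", 1000), (2, "Jones", 2000)])

# The same unload pattern as the function above: select all, convert each
# field to a string, join with the field separator, terminate the record.
out_data_file = io.StringIO()
select_cursor = db_conn.cursor()
select_cursor.execute("select * from emp")
for data_row in select_cursor.fetchall():
    out_data_file.write("|__|".join(str(field) for field in data_row) + "\n")
select_cursor.close()

print(out_data_file.getvalue())
# 1|__|Smith|__|1000
# 2|__|Jones|__|2000
```

Exactly the same loop would run unchanged against an Oracle connection obtained from a driver such as cx_Oracle.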
<br />
<h2>
Caveats</h2>
This
is probably the simplest Python code to achieve the goal of reading
all the data in a table and writing out to a flat, text file. There are
several things it does not do at all, and other things that could be
enhanced with it.<br /><ul>
<li>It is assumed that all data types being used can be cleanly converted to valid String representations using "<code>str()</code>"<ul>
<li>This may not be true for some data types, or you may wish to handle some data types differently e.g. Dates</li>
</ul>
</li>
<li>Dates should probably be handled differently, to ensure the full date and time values are output<ul>
<li>You could execute an "<code>ALTER SESSION</code>" SQL statement to set the value of <code>NLS_DATE_FORMAT</code> to something suitable</li>
<li>Python has a native Date / Time data type, "<code>datetime</code>", which Oracle <code>DATE</code> columns are mapped to, and you could test for this</li>
</ul>
</li>
<li>You could add extra functionality such as debugging / tracing messages (use the "<code>logging</code>" module) or other progress messages<ul>
<li>You may wish to add extra counters to count the number of records and fields per record</li>
</ul>
</li>
</ul>
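Picking up the date-handling caveat, one option is a small type-aware conversion function used in place of the bare <code>str()</code> call; a sketch (the output format here is just an example choice):

```python
import datetime

def field_to_str(data_field):
    # Type-aware replacement for the bare str() call (sketch only).
    if data_field is None:
        return ""  # or "NULL", depending on what the load tool expects
    if isinstance(data_field, datetime.datetime):
        # Fixed format including the full time component
        return data_field.strftime("%Y-%m-%d %H:%M:%S")
    return str(data_field)

print(field_to_str(datetime.datetime(2015, 12, 14, 11, 6, 0)))  # 2015-12-14 11:06:00
print(field_to_str(None))  # (empty string)
print(field_to_str(42))    # 42
```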