This post is about experiencing "table fetch continued row" in compressed data warehouse tables that receive some level of updates to existing (old) data, and how using the Advanced Compression option of Oracle Enterprise Edition not only does not help in avoiding this but can actually introduce severe performance penalties into your daily load jobs.
Background
Basic Compression is a free
part of Oracle Enterprise Edition, allowing you to compress the data
within a table, and is useful for large databases such as a Data
Warehouse. Not only will compression reduce the total disk storage used
by that data in the table, it can also improve query performance in
various ways - full table scans will read fewer disk blocks, and
potentially more rows can be cached in the buffer cache in the SGA
within the same number of blocks. One downside to Basic Compression is
that it can only be used to compress existing data in a table or
partition (done by "moving" that table or partition within the same
tablespace). Newly inserted rows or updated rows are not compressed
when added to a data block, and are added in their normal uncompressed
form. This can mean that over time a compressed table or partition can
become less compressed due to updates and newly inserted rows.
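As a minimal sketch (the table and index names here are invented for illustration), compressing the existing data in a table with Basic Compression looks something like the following; note that the move leaves the table's indexes unusable, so they need rebuilding afterwards:

-- Compress all existing rows by rebuilding the table in place.
-- (COMPRESS BASIC is the 11gR2 keyword; plain COMPRESS works on earlier versions.)
ALTER TABLE sales MOVE COMPRESS BASIC;

-- The move invalidates the table's indexes, so rebuild them.
ALTER INDEX sales_pk REBUILD;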
Oracle's official solution to this gradual loss of compression is the extra-cost Advanced Compression Option with its OLTP Compression feature. OLTP Compression will compress newly inserted rows in a table where it has been configured. It basically does this by inserting the rows in their normal uncompressed format (no performance penalty for these inserts) until the block becomes full, whereupon it compresses that one block (a performance penalty for that particular insert), which also frees up unused space in that block for yet more rows to be inserted. The assumption is that for an average, normal OLTP application the frequency of block compressions will be low, and the impact on most transactions will be negligible (assuming they SELECT and INSERT on many other tables within the same business transaction).
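A hedged sketch of how this is enabled (the table and columns are invented; COMPRESS FOR OLTP is the 11g syntax, renamed ROW STORE COMPRESS ADVANCED in 12c):

-- Requires the separately licensed Advanced Compression Option.
CREATE TABLE sales_oltp
( sale_id   NUMBER
, sale_date DATE
, amount    NUMBER
)
COMPRESS FOR OLTP;

-- Or change an existing table so that blocks are compressed as future inserts
-- fill them (existing data is not touched until the segment is moved/rebuilt).
ALTER TABLE sales_oltp COMPRESS FOR OLTP;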
A
"perfect ideal" Data Warehouse would only ever load new data into the
current partition of a partitioned table. Existing data would never be
updated, and new data would never be inserted into a previous, old
partition. In such a scenario you have two potentially straightforward
strategies for compressing your data using only Basic Compression:
- Wait until the end of a period when the database will switch to the next partition, and then compress the previous partition
- As above plus re-compressing the current partition at a set frequency during the period in which it is the current partition
The
first strategy means that all old partitions have the data in them
compressed, and so take up minimal disk storage. However, the data in
the current partition is not compressed at all. The old partitions only
ever need to be compressed once, because this is an ideal scenario and
that old data is never updated or extra data inserted into it.
The
second strategy is the same as the first for old partitions, but also
compresses the current partition at scheduled intervals to gain some of
the benefits from compression on the most recently loaded data. How
frequently you compress the current partition depends on various
factors, such as the time to do this compression and the time available
each night for the daily data load into the data warehouse. As the partition grows in size with more data being added to it, the compression will take longer each time.
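A minimal sketch of what either strategy boils down to (the partition names are hypothetical); the second strategy simply repeats the same pair of statements against the current partition on a schedule:

-- Strategy 1: after the period-end switch, compress the partition just closed off.
ALTER TABLE sales MOVE PARTITION sales_2024_01 COMPRESS BASIC;

-- The move marks that partition's local index partitions unusable, so rebuild them
-- (any global indexes would also need rebuilding unless the move maintains them).
ALTER TABLE sales MODIFY PARTITION sales_2024_01 REBUILD UNUSABLE LOCAL INDEXES;

-- Strategy 2: additionally re-compress the current partition at a chosen frequency,
-- e.g. weekly, using exactly the same pair of statements against it.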
The Compressed Row Update Problem
Both
strategies are straightforward and only use the features of Basic
Compression. However, real world data warehouses are not "perfect" or
"ideal" - old data can arrive after a period has ended and must be
loaded into old, existing partitions, and corrections can come through
for old data needing updates to the existing data. If your data volumes
are very high, both within each partition and for the number of rows of
old or corrected data, then your compressed partitions can become less
compressed as a result and can grow in size.
As well as an
increase in the disk storage used, there is also a potential performance
penalty to pay for such data changes in old partitions due to the way
the update physically happens within the data block. When a compressed row is updated it will not fit back into its original slot in the data block, because it will have grown in size one way or another. This means that Oracle will try to move that row into the free, unused space at the end of the data block, as reserved by the "PCTFREE" setting for that table. However, Basic Compression sets "PCTFREE" to zero by default, so there is nowhere for such updated rows to go within their current data block. What Oracle does instead is "insert" that row into a new data block (as used for normal inserts), and leave behind a forward pointer in the block where the row used to be. This is termed "row migration" - the row's physical storage has migrated to another block within the table.
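One way to gauge how widespread this has become in a given table (a sketch; SALES is an invented table name) is the LIST CHAINED ROWS option of ANALYZE, which records both chained and migrated rows into the CHAINED_ROWS table created by the utlchain.sql script shipped with the database:

-- Populate CHAINED_ROWS (created by $ORACLE_HOME/rdbms/admin/utlchain.sql)
-- with the ROWIDs of chained and migrated rows found in the table.
ANALYZE TABLE sales LIST CHAINED ROWS INTO chained_rows;

-- Count how many rows are affected.
SELECT COUNT(*) AS migrated_or_chained
  FROM chained_rows
 WHERE table_name = 'SALES';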
This row migration has a negative impact on index based reads, because the index structure stores the ROWID of each data row in its leaf blocks, and a ROWID is really a map to the physical location of that row - its encoding specifies things such as the data file, the block, and the offset within that block. When a query using such an index goes to read a migrated data row, it first reads the original data block indicated by the ROWID, and rather than the expected data row it instead finds a forward pointer (another ROWID) to where the row is now stored. So the query must issue a second read against the second data block in order to get the data row. Thus migrated updated rows cause extra disk reads when accessed through an index. Such "second, additional disk reads" are visible in the Oracle statistic "table fetch continued row" i.e. you can see how many table row fetches caused an additional fetch due to migrated data rows.
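A simple way to observe this is to snapshot the session-level statistic before and after running an index-based test query and compare the two values - the delta is the number of follow-on row fetches caused by migrated (or chained) rows:

-- Current session's running total of the statistic; run before and after the
-- test query, and the difference is the number of continued-row fetches.
SELECT n.name, s.value
  FROM v$mystat s
  JOIN v$statname n ON n.statistic# = s.statistic#
 WHERE n.name = 'table fetch continued row';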
Advanced Compression
In such scenarios you
might
assume that using the OLTP Compression feature of Advanced Compression
would solve this. You would hope that it would keep the data in the old
partitions compressed for you without any explicit action needed from
you, and avoid or minimize the number of migrated rows. Unfortunately
this is not true, and in fact using OLTP Compression will probably
result in both a severe performance penalty during the daily data load
and have no positive impact on updated rows being migrated. Which
ultimately means that there is little to no point in using OLTP
Compression on most data warehouses. Of course, there may be some
scenarios where it might be of some benefit, but for the vast majority I
believe it will only have a negative impact overall.
The first performance penalty is that it directly slows down the INSERTs being executed by the daily data load jobs, as the blocks are compressed as they fill up. This is to be expected of course, and you would hope that the penalty was relatively small and at least ensured that all blocks in the table were compressed.
Unfortunately there is a second, more substantial performance penalty arising from this, because each block will be compressed multiple times during the data load. This repeated compression of each block multiplies up the total CPU time used during the daily data load, to the point where it can more than double the elapsed time of the data load jobs. This is a consequence of the way OLTP Compression works.
Compression is triggered each time a block "becomes full", which will
happen multiple times during the data load process. After a block is
compressed for the first time, it will now have free unused space in it,
into which new, subsequent rows can be inserted. So the same block
will repeatedly fill up, compress, fill up again, compress, fill up
again, compress, until eventually it is full of only compressed rows and
another row cannot fit in the unused, free space.
And the third performance penalty is that the frequency of such block compressions is much higher with longer, larger data rows, such as are typically found in a data warehouse. Longer rows fill up a block sooner, triggering a compression sooner and more often, as fewer rows are needed each time to consume the remaining free space i.e. the average number of rows inserted between compressions gets lower as the row size increases. And the more compression operations that happen during a data load, the more CPU time is consumed for a given number of rows. See the appendix at the end of this post for the detail of the maths behind this.
This triple whammy - every INSERT slows down, blocks are compressed multiple times during the load, and longer rows lead to more frequent block compressions - can cause data load times to double or triple in elapsed time. The precise level of the performance impact depends on various factors such as the row sizes, the level of compression achievable, and the CPUs in your computer system.
Conversely, Basic Compression only ever compresses each block once, so the total CPU it uses is far lower than for Advanced Compression. And because the compression operation takes place separately from the data load, there is no direct impact on the daily data loads themselves i.e. they continue to take the same elapsed time as normal.
But that is not the end of the negative impacts of using OLTP Compression in a data warehouse. The other big negative is that it does not actually solve the original problem you were experiencing - updated rows being migrated. Under OLTP Compression, only an INSERT will trigger a compression of a block, never an UPDATE. This behaviour has been demonstrated and written up by various people who have tested it.
So OLTP Compression doesn't help at all with the original scenario - UPDATEs causing row migration, with additional "table fetch continued row" block reads and increased storage. Your data warehouse will still have migrated rows from updates, and an increase in storage and disk reads as a result. The only viable way to force migrated rows to move back and be compressed is to use Basic Compression.
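In practice that means repeating the kind of partition-level re-compression sketched earlier against any old partition that has accumulated updates (the partition name here is invented):

-- Re-compressing the partition rewrites every row, which removes the migrated
-- rows' forward pointers as a side effect and restores the compression ratio.
ALTER TABLE sales MOVE PARTITION sales_2023_07 COMPRESS BASIC;
ALTER TABLE sales MODIFY PARTITION sales_2023_07 REBUILD UNUSABLE LOCAL INDEXES;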
And finally, for completeness, OLTP Compression typically achieves a slightly lower compression ratio than Basic Compression. This is because they both use the same compression mechanism, but OLTP Compression leaves the table or partition level "PCTFREE" setting unchanged (10% by default), whereas Basic Compression sets "PCTFREE" to 0 by default during the compression, so fitting more compressed rows per block. The difference in compression level is small, but it is there nevertheless.
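You can check which form of compression (if any) is in force on each partition, and the PCTFREE being applied, from the data dictionary - a sketch against the same invented SALES table:

-- COMPRESSION shows ENABLED/DISABLED, COMPRESS_FOR shows BASIC or OLTP/ADVANCED,
-- and PCT_FREE shows the free space percentage reserved in each block.
SELECT partition_name, compression, compress_for, pct_free
  FROM user_tab_partitions
 WHERE table_name = 'SALES'
 ORDER BY partition_position;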
Conclusion
Although Basic Compression is not "perfect", for a data warehouse it is far and away a better choice than trying to use the OLTP Compression feature of the Advanced Compression option. The performance penalties from the significant increase in CPU usage during data loads when using OLTP Compression are very severe. And ultimately it does not even address the original problem we were concerned about - UPDATEs causing row migration into different data blocks, resulting in additional "table fetch continued row" block reads for queries needing those rows. That is why sticking with the free Basic Compression, applied on a scheduled basis after data loads and combined with a well designed partitioning strategy, can be much more efficient for a data warehouse.
Appendix - Why Longer Rows Lead To More Frequent Compressions
I stated earlier that when using Advanced OLTP Compression, longer rows will trigger block compressions both sooner and more frequently i.e. more compressions on average per row (or conversely, fewer rows inserted into a block on average between compressions). I will assume here a simple 2:1 compression ratio between an uncompressed row and its compressed form, as it keeps the maths a bit simpler but still shows what is going on.
If you can fit 50 uncompressed rows into a
block then you can fit 100 compressed rows into the same block. As the
compression halves the size of the uncompressed rows, the block is
compressed multiple times when data is inserted into it. First 50 rows are inserted, which triggers a compression leaving half the block free, so 25 more rows can be inserted, triggering another compression, then 13 more rows can be inserted, the block is compressed again, and so on, until eventually 100 rows fit in the block but another one will not.
This series of halvings of how many rows can be loaded at a time is really a sum over the series of powers of 2. The last load will be 1 row, the one before that 2 rows, the one before that 4 rows, and so on. This means that the block is "full" once 2 raised to a certain power exceeds the number of compressed rows that can fit in that block i.e. after that many compressions another row will not fit in the block.
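Put as a rough formula (a sketch, under the 2:1 compression assumption above): if a block can hold N compressed rows, then the number of compressions c needed to fill it, and the average number of rows inserted per compression, are approximately

\[
  c = \lceil \log_2 N \rceil ,
  \qquad
  \text{rows per compression} \approx \frac{N}{\lceil \log_2 N \rceil}
\]

As N shrinks (i.e. as rows get longer), the rows-per-compression figure drops, which is exactly the effect worked through in the two examples below.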
For 100 compressed rows in a block, the first power of 2 above 100 is 2^7 = 128, which means 7 compressions. If each block is compressed 7 times as rows are inserted into it to reach 100 rows in the block, then on average there is a compression every 100 / 7 = 14.29 rows.
For a much longer row where only 10 uncompressed rows fit per block, 20 compressed rows will fit in that block. The first power of 2 above 20 is 2^5 = 32, so 5 compressions are needed to fill each block. As 20 rows will have been loaded into the block, a compression occurs on average every 20 / 5 = 4 rows.
This is a lot more frequent than the previous case i.e. over 3 times worse on average! Thus OLTP Compression on a data warehouse with larger rows leads to a much larger CPU penalty from all of the repeated block compressions occurring during data loads. And it is why the alternative of just doing a normal load followed by a one-off compression using Basic Compression can be much more efficient.