2011-11: Index Compression – Which ones are worth it?

Before DB2 9, index spaces were stuck at 4K pages; DB2 9 opened up the full range of page sizes – 8k, 16k, and even 32k. Knowing that the classic answer to any DB2 question is “It Depends”, for index compression that is still the correct answer!

The PROs for index compression are:

    • Less disk space
    • Fewer levels in the index
    • Faster I/O

Of course, the world is not really that simple. There are also CONs for index compression:

    • Increased CPU
    • Worse I/O response time
    • More bufferpools (new page-size pools must be defined, not just existing ones enlarged)
    • Increased real memory usage
    • Standards (more on this below)

Also bear in mind that you do not *have* to use compression to use the larger page sizes. In fact, just switching to a larger page size can reduce index splits and index levels without the CPU and bufferpool tradeoffs that you must accept with compression.
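For instance, moving an existing index to a larger page size (without compression) is just a bufferpool change followed by a rebuild. A minimal sketch – the index and bufferpool names are placeholders, and your site's 8k pool may be named differently:

```sql
-- Hypothetical example: move index OWNER.IX_ORDERS to an 8k page size
-- WITHOUT enabling compression (bufferpool name BP8K0 is site-dependent)
ALTER INDEX OWNER.IX_ORDERS
  BUFFERPOOL BP8K0;

-- The ALTER leaves the index in a pending state; rebuild it to
-- materialize the new page size:
-- REBUILD INDEX(OWNER.IX_ORDERS)
```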

If you’re interested in compressing indexes, here are a few quick facts about how it works:

    • The compression is done purely on disk. In the bufferpool, on the log, etc., the data is always uncompressed.
    • The compression is done at the page level.
    • The compression applies only to the LEAF pages (it uses the old VSAM methodology).
    • The index *must* be defined with an 8k, 16k, or 32k page size, as these are then compressed down to a 4k page on disk.
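In DDL terms, compression is a simple attribute on the index, but it only works together with a larger-than-4k bufferpool. A minimal sketch – all object names here are hypothetical:

```sql
-- Hypothetical example: a compressed index must be assigned to an 8k,
-- 16k, or 32k bufferpool; on disk its leaf pages are compressed to 4k
CREATE INDEX OWNER.IX_CUST_NAME
  ON OWNER.CUSTOMER (LASTNAME, FIRSTNAME)
  BUFFERPOOL BP8K0
  COMPRESS YES;
```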

How do you choose which indexes to compress? Follow these simple rules:

  1. Only compress big indexes, and only when it is worth it. (Rule of Thumb: an index between 48k and 1536k is *not* worth compressing.)
  2. A good candidate is an index whose normal access pattern is sequential, because there is less I/O.
  3. A bad candidate is an index whose normal access pattern is random, with a bad bufferpool hit ratio, when you are CPU limited, as you will bring more data than needed into the bufferpool.
  4. Run the DSN1COMP job and look at its results to choose the best option.

There are two Rules of Thumb that you may want to keep in mind:

  1. Normally 8k is the best choice, as 16k can result in too much bufferpool space being wasted (refer to the 1st example below).
  2. Use 8k if you get 50% or more compression; only move to 16k if you get 75% or more (refer to the 2nd example below).

I have never seen a 32k space worth using at all. However, remember I have not tested every possible key length and data mix in the whole universe!

The Txt File contains typical JCL for DSN1COMP. Note the use of the PARM LEAFLIM(10000). This specifies how many index leaf pages should be evaluated to determine the compression estimate, and it stops DSN1COMP from processing every leaf page of the input data set, which by definition is large.
The range is from 1 to 99000000. If the LEAFLIM PARM is not specified, the entire index will be scanned.
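For reference, such a DSN1COMP job might look like the following sketch. The load-library and index data-set names are placeholders that must be replaced with your site's values:

```jcl
//DSN1COMP EXEC PGM=DSN1COMP,
//         PARM='LEAFLIM(10000)'
//STEPLIB  DD DISP=SHR,DSN=DB2.SDSNLOAD            <-- site-specific
//SYSPRINT DD SYSOUT=*
//* SYSUT1 points at the VSAM data set of the index space to evaluate:
//SYSUT1   DD DISP=SHR,
//         DSN=DB2HLQ.DSNDBD.DBNAME.IXSPACE.I0001.A001
```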

The output from DSN1COMP shows the results from evaluating the compression with different index page sizes. Look for message DSN1940I to see the details of the compression report; see the following two examples:

Example 1:

  8  K Page Buffer Size yields a     
 51  % Reduction in Index Leaf Page Space                          
     The Resulting Index would have approximately
 49  % of the original index's Leaf Page Space
     No Bufferpool Space would be unused 
     ----------------------------------------------
 16  K Page Buffer Size yields a                                 
 68  % Reduction in Index Leaf Page Space
     The Resulting Index would have approximately
 32  % of the original index's Leaf Page Space
 21  % of Bufferpool Space would be unused to
     ensure keys fit into compressed buffers
     ----------------------------------------------
 32  K Page Buffer Size yields a 
 68  % Reduction in Index Leaf Page Space
     The Resulting Index would have approximately
 32  % of the original index's Leaf Page Space
 60  % of Bufferpool Space would be unused to 
     ensure keys fit into compressed buffers

Here you can quickly see that 8k wins even though it saves less space (51% versus 68% for 16k), because no bufferpool space is left unused, a.k.a. wasted. The following output shows a case where the 16k page wins:

Example 2:

  8  K Page Buffer Size yields a
 51  % Reduction in Index Leaf Page Space
     The Resulting Index would have approximately
 49  % of the original index's Leaf Page Space
     No Bufferpool Space would be unused
     ----------------------------------------------
 16  K Page Buffer Size yields a 
 76  % Reduction in Index Leaf Page Space 
     The Resulting Index would have approximately 
 24  % of the original index's Leaf Page Space 
     No Bufferpool Space would be unused
     ---------------------------------------------- 
 32  K Page Buffer Size yields a            
 76  % Reduction in Index Leaf Page Space        
     The Resulting Index would have approximately   
 24  % of the original index's Leaf Page Space 
 47  % of Bufferpool Space would be unused to 
     ensure keys fit into compressed buffers

The key is striking the right balance between space savings, bufferpool usage, and CPU when deciding on the best page size for your indexes. This brings us to the downside of all this – Standards. Agree on, and write down, which index gets what page size, and verify compliance. In shops where the DDL is generated by many DBAs or tools, the “compress index methodology” must be documented and understood by everyone; otherwise, it will go horribly wrong at some point!

Monitoring and tuning of bufferpools must also be done (or redone) once compression is in use, and your real memory requirement may go up, as may your CPU (a little bit).

There is a great IBM Redpaper that describes the whole process in great detail. One nice little snippet of info about indexes that came in with DB2 9 is the asymmetric page split. Before DB2 9, when you were inserting within a range (not just inserting at the end of a table!), page splits were always 50/50, and the first page’s space was then almost never used until the next index REORG came along. In DB2 9 this changed: the engine can detect that this insert pattern is happening and adapt the split, even up to 90/10, thus drastically reducing the number of splits.

In DB2 9, index look-aside was enabled for indexes with CLUSTERRATIOF >= 0.80, not just for the CLUSTERING index. Also in DB2 9, RANDOM indexes were delivered, but I have never seen one in the wild – so to speak.

In DB2 10, we now have the excellent INCLUDE columns to eliminate redundant indexes that were only created for index-only access. The point is that every index that exists on a table adds to the overhead of INSERT and DELETE, as well as disk space. And do not forget that the poor little Optimizer can get a headache trying to work out the best access path with all the data there is to read and process! This overhead can get quite large, and reducing the number of indexes should always be a goal for the DBA group; refer to my earlier newsletter about RTS to see how I go hunting for redundant indexes.
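As a starting point for such a hunt, the Real Time Statistics tables can flag indexes that cost INSERT/DELETE overhead but are apparently never used for access. A hedged sketch, assuming the DB2 9+ RTS catalog table and its LASTUSED column; the one-year threshold is an arbitrary illustration:

```sql
-- Sketch: list populated indexes not used for access in over a year.
-- LASTUSED defaults to '0001-01-01' for never-used indexes.
SELECT S.DBNAME, S.INDEXSPACE, S.TOTALENTRIES, S.LASTUSED
  FROM SYSIBM.SYSINDEXSPACESTATS S
 WHERE S.TOTALENTRIES > 0
   AND S.LASTUSED < CURRENT DATE - 1 YEAR
 ORDER BY S.LASTUSED;
```

Any index this query returns is only a *candidate*: check for uniqueness constraints and foreign-key support before dropping anything.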

Below is a little table from the IBM labs detailing the CPU usage of different INSERTs:

Scope                             9672-Z17 CPU time
No Index                          40 to 80µs
One index with no index read I/O  40 to 140µs 
One index with index read I/O     130 to 230µs 
Five indexes with index read I/O  500 to 800µs

Feel free to send me your questions or comments.
TTFN,
Roy Boxwell
Senior Architect