2011-04: RTS – data for DBA daily usage

IBM brought in the RTS in DB2 V7 and it was a paradigm-shattering event! Well, at least for me…
What it did was finally give us DBAs the ability to “see” what is happening in our DB2 subsystems without having to run an expensive RUNSTATS utility.

Up until this time, shops had to run thousands of RUNSTATS to see if a REORG was needed, and, of course, these RUNSTATS caused massive side effects:

  1. The DSC was invalidated for all RUNSTATSed objects
  2. Automatic REBIND detection software then scheduled a REBIND – which sometimes caused severe performance problems when the access path “went south” or “turned pear-shaped”
  3. Lots of CPU was wasted on non-changed data by RUNSTATS

 

The moment that RTS came into town it changed the whole game. Now you only need a RUNSTATS when you actually need a RUNSTATS and you only need a REORG when you actually need a REORG. (See my list of reasons why you do not actually need a REORG later in this newsletter). Just to complete the picture: You only need a COPY when you actually need a COPY! Lots of shops exploiting the RTS reported huge cost savings from using the real-time data for threshold-based maintenance – if you haven’t yet considered the RTS then please do so, as it is a gold mine!

 

RTS tables from DB2 V8 up until DB2 10.

To start with, let us have a look at the RTS tables from DB2 V8 through DB2 10.
As can readily be seen, IBM keeps adding columns to these tables!
Note that all the counter columns are NULLable. The way that the RTS data has been externalized, and which column(s) are updated – and when – has changed a few times over the years. Now it seems to be very stable and the basic rule is:
When you first create an object (as long as it is not a DEFINE NO one of course!) it gets an entry in the RTS. The counters contain 0, but the relevant utility timestamp is NULL – which stops externalization of the counters (apart from REORGLASTTIME which is set to creation date).
When a RUNSTATS – or a COPY – has been done, then these counters start counting as well.

Now, just looking at the columns gives a great idea of the usefulness of this data. Upfront, one caveat: the use of REORGLEAFNEAR is not recommended for deciding whether or not to REORG – as a data point it is basically worthless. Recently, at a local DB2 user group meeting, it was explained that the numbers in this column are really a waste of time and that “IBM informed us that to use REORGLEAFNEAR in the REORG decision process was incorrect and could cause repetitive, needless, REORGs.” – You have been warned!! But apart from that, the data can help DBAs every day in their work.

 

REORG decisions:

Do not do a REORG
Simple huh? I basically see people all the time doing REORGs of TSs just because “The software told me to do it”. This is no excuse! The only real reasons to REORG a table space are to

a) reclaim space,

b) change attributes or

c) re-sort the data.

Now the last one really only makes sense if you do indeed have sequential access of the TS, otherwise, if the access is through an index and is effectively random, why bother with an expensive REORG???

 

RUNSTATS decisions:

Do not do a RUNSTATS
Equally simple! I see thousands of RUNSTATS conducted for absolutely no good reason. They kill your Dynamic Statement Cache (DSC), they can kill your access paths for REBINDs and BINDs, and what if you ran it when the table was empty but then along comes a LOAD of 2,000,000 rows??? Yup, a RUNSTATS run must be aware of the usage and environment and should be well timed. Basically, the rule is to run a RUNSTATS when the DB2 Optimizer benefits from the fresh statistics and generates a new, better access path. This may be a challenge if you don’t have the right database maintenance solution, but it’s another gold mine if you do!
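
To make that a bit more concrete, here is a minimal sketch of an RTS-based RUNSTATS check – the 20% threshold is purely illustrative, so tailor it to your site and, as always, only act on it when the Optimizer would actually benefit:

SELECT DBNAME
     , NAME
     , PARTITION
     , TOTALROWS
     , COALESCE(STATSINSERTS, 0)
     + COALESCE(STATSUPDATES, 0)
     + COALESCE(STATSDELETES, 0)  AS CHANGED_ROWS
  FROM SYSIBM.SYSTABLESPACESTATS
 WHERE STATSLASTTIME IS NOT NULL
   AND TOTALROWS > 0
   AND ( COALESCE(STATSINSERTS, 0)
       + COALESCE(STATSUPDATES, 0)
       + COALESCE(STATSDELETES, 0) ) * 100 > TOTALROWS * 20
 ORDER BY 5 DESC
 WITH UR
;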

 

COPY decision:

Do COPYs on-the-fly. Use SHRLEVEL CHANGE and run your ICs during the day. At 08:30 a.m. you have copied the world and your SLAs for RECOVERY are great. What happens when you have a problem at 05:30 p.m.? I guess your SLA will *not* be met, right? What you must do is learn not to be afraid of image copies during the day. They can save your life, but don’t do them on a fixed schedule. Do them when they’re needed!
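
As a starting point for “when they’re needed”, here is a minimal sketch using the RTS COPY counters – the page and change thresholds are illustrative only:

SELECT DBNAME
     , NAME
     , PARTITION
     , COPYLASTTIME
     , COPYUPDATEDPAGES
     , COPYCHANGES
  FROM SYSIBM.SYSTABLESPACESTATS
 WHERE COPYLASTTIME IS NULL
    OR COPYUPDATEDPAGES > 5000
    OR COPYCHANGES      > 100000
 ORDER BY COPYUPDATEDPAGES DESC
 WITH UR
;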

At the INDEX level the RTS data can be used for the same set of Utilities as above.

 

IX REORG decisions:

The REORG of an index can bring real plus points! If REORGPSEUDODELETES is greater than 10% of TOTALENTRIES or REORGNUMLEVELS is greater than zero, then it is definitely worth doing an index REORG to clear out the junk and get the index back into tip-top condition. This should not change your access paths, if they were bound when the index was “fit”, but it will improve your elapsed time as the number of IOs will go down dramatically.
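
To make that check concrete, here is a minimal sketch against the RTS – the 10% rule is the one from the paragraph above, everything else is illustrative:

SELECT CREATOR
     , NAME
     , PARTITION
     , TOTALENTRIES
     , REORGPSEUDODELETES
     , REORGNUMLEVELS
  FROM SYSIBM.SYSINDEXSPACESTATS
 WHERE (TOTALENTRIES > 0
        AND COALESCE(REORGPSEUDODELETES, 0) * 100 > TOTALENTRIES * 10)
    OR COALESCE(REORGNUMLEVELS, 0) > 0
 ORDER BY REORGPSEUDODELETES DESC
 WITH UR
;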

 

IX RUNSTATS decisions:

Do not do a RUNSTATS
Same as TS RUNSTATS, but with the added hint of never actually doing just an index RUNSTATS on its own. If you do this, then the data stored in the DB2 catalog will start to “drift” apart and, eventually, even the DB2 Optimizer will “see” this and start to make bad decisions based on inaccurate data. If you can, always try to do a consistent full RUNSTATS at the TP or TS level. BTW, if you’d like to check your DB2 catalog for inconsistency, please try our Statistics HealthCheck Freeware. As the name says, it’s FREE!
Also note that a RUNSTATS at the index level will only invalidate access paths in the DSC that actually currently use this index, so if the optimizer has picked a tablespace scan or another index, then just running a RUNSTATS index will not enable a new access path for existing statements – another reason to run at the TS or TP level!

 

IX COPY decision:

Do not do index COPYs on their own! Personally, I am no fan of COPY YES indexes. The REBUILD INDEX utility is so fast these days (I have examples of a 17,000,000 row table with 17 indexes where the rebuild took three minutes elapsed…), and in a data sharing system you can even bring the system crashing down around you if you do too many ALTERs at the same time, so be very careful! But, if you feel you must have COPY YES indexes to reduce CPU consumption during RECOVERY, then do the TS and the IX at the same time – in the same COPY statement, of course!

In the IBM manual “Performance Monitoring and Tuning Guide”, look at the chapter “Real-time Statistics” to see which thresholds IBM uses to determine when a utility should be run – and, of course, make sure you tailor them to suit your site if you don’t have a set-it-and-forget-it solution like our RealTime DBAExpert! Solutions like that also manage the RTS. Orphans build up in the RTS and can cause problems, so every now and again make sure that only databases and table spaces that actually exist are also in the RTS. Watch out for DSNDB01, work databases and the RTS itself, as the first two are *in* the RTS but the RTS is *not* in the RTS.
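
A minimal sketch of such an orphan check for the table space statistics could look like this – it simply lists RTS rows whose table space no longer exists in the catalog (DSNDB01 is excluded because of the caveat above; treat your work databases the same way):

SELECT R.DBNAME
     , R.NAME
  FROM SYSIBM.SYSTABLESPACESTATS R
 WHERE NOT EXISTS
       (SELECT 1
          FROM SYSIBM.SYSTABLESPACE T
         WHERE T.DBNAME = R.DBNAME
           AND T.NAME   = R.NAME)
   AND R.DBNAME <> 'DSNDB01'      -- see the caveat above
 ORDER BY R.DBNAME, R.NAME
 WITH UR
;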

 

Finally, a quick run through some of my favourite columns in the RTS:

DBNAME & NAME – Remember that the DSNDB01 is in the RTS, but watch out for SYSLGRNX NACTIVE and do not forget that the RTS is not in the RTS, of course!
EXTENTS – This is the number of extents in the *last* linear dataset, not the total over all datasets.
INSTANCE – If using the CLONE system, then each CLONE has two entries in the RTS, so you must always also join to the INSTANCE column in the SYSIBM.SYSTABLESPACE to make sure you are looking at real data and not behind-the-scenes data – also for indexes, of course!
LASTUSED – A great place to begin the “unused index analysis saga”.

And, of course, the new DB2 10 fields, first the TABLESPACE ones:
REORGCLUSTERSENS – The number of times data has been read by SQL statements that are sensitive to the clustering sequence – in other words CLUSTER is important!
REORGSCANACCESS – The number of times data is accessed for SELECT, FETCH, searched UPDATE, or searched DELETE. Great for REORG decisions!
REORGHASHACCESS – The number of times data is accessed using hash access for SELECT, FETCH, searched UPDATE, searched DELETE, or used to enforce referential integrity constraints. Great for checking whether your usage of HASH Access is bringing dividends or not.
HASHLASTUSED – When it was last actually used. The documentation states TIMESTAMP, but I always see a DATE column here, just like the normal index LASTUSED.
DRIVETYPE – Does this object live on a normal disk (HDD) or on a solid state drive (SSD)? Only for the first created object dataset.
LPFACILITY – Does the disk unit that this object lives on support the new High Performance List Prefetch facility, Y or N?
STATS01 – Some internally used data for IBM

 

And now the INDEX ones:

REORGINDEXACCESS – The number of times the index was used for SELECT, FETCH, searched UPDATE, searched DELETE, or used to enforce referential integrity constraints. Great for REORG decisions as, if this number is high, then a REORG may not bring much in performance terms. Extra bonus data is that for hash overflow indexes, this is the number of times DB2 has used the hash overflow index – which is a *BAD* thing and should be controlled!
DRIVETYPE – Does this object live on a normal disk (HDD) or on a solid state drive (SSD)? Only for the first created object dataset.
STATS101 – Some internally used data for IBM. I just hope it is nothing to do with the Orwellian room 101!
Feel free to send me your comments and ask questions.

TTFN,
Roy Boxwell
Senior Architect

2011-02: XML Table Spaces – How they work and what you must take care of, or “LOBs the Next Generation”

 

XML has become more and more interesting for DB2 over the years.  Starting with DB2 9 it has also become much easier to use!

One of the weird things about XML is that the table spaces come in three (yes three!) different flavors depending on the base table space/table definition. Therefore, they are a bit “trickier” than a “normal” table space.

Our newsletter this month covers the relatively straight-forward rules for creating these.

 

Segmented table spaces:

If you create a segmented table space with an XML column type, DB2 will automatically:

  1. Add a column to the base table “DB2_GENERATED_DOC_ID_FOR_XML” BIGINT.
  2. Create an XML Partition-by-Growth (PBG) universal table space. The name will be Xyyynnnn, where yyy is the first three UTF-8 bytes of the base table space name and nnnn is a four-digit number starting at 0000.
  3. Create an XML table in the above space. The name will be Xyyyyyyyyyyyyyyyyyynnn, where the y…y are the first 18 UTF-8 characters of the base table name and nnn is a three-digit number starting at 000, incremented if the name already exists until a unique name is found.
  4. Create a document ID index on the implicit column DB2_GENERATED_DOC_ID_FOR_XML in the base table. The name will be I_DOCIDyyyyyyyyyyyyyyyyyynnn, where the y…y are the first 18 UTF-8 characters of the base table name and nnn is a three-digit number starting at 000, incremented if the name already exists until a unique name is found.
  5. Create a node ID index on each XML column in the base table. The name will be I_NODEIDyyyyyyyyyyyyyyyyyynnn, where the y…y are the first 18 UTF-8 characters of the XML table name and nnn is a three-digit number starting at 000, incremented if the name already exists until a unique name is found.
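
If you want to see which of these implicitly created objects DB2 actually built for a given base table, a minimal sketch against the catalog could look like this (assuming the SYSIBM.SYSXMLRELS columns shown here; BOXWELL.XMLCUSTOMER is just the example name used later in this newsletter):

SELECT X.COLNAME
     , X.XMLTBOWNER
     , X.XMLTBNAME
     , T.DBNAME
     , T.TSNAME
  FROM SYSIBM.SYSXMLRELS X
     , SYSIBM.SYSTABLES  T
 WHERE T.CREATOR = X.XMLTBOWNER
   AND T.NAME    = X.XMLTBNAME
   AND X.TBOWNER = 'BOXWELL'
   AND X.TBNAME  = 'XMLCUSTOMER'
 WITH UR
;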

 

PBG spaces:

If you create a Partitioned by Growth (PBG) table space with an XML column type, DB2 does the same as above, but with a default DSSIZE of 4GB – also for *all* of the indexes.

 

PBR spaces:

If you create a Partitioned-by-Range (PBR) table space with an XML column, DB2 will create as many PBR XML table spaces as needed, with their associated indexes, but with a possibly larger DSSIZE.
This can all lead to a spectacular growth in storage space. Imagine a 4096 partition space, which is then altered to have two XML columns…the mind boggles!

 

What does this boil down to?

  • When using XML spaces make sure your back-up and recovery plans work.
  • Coordinate space management proactively.
  • Use COMPRESS wisely. XML table spaces inherit the COMPRESS attribute or can be ALTERed and then REORGed to enable compression. They compress very well.
  • Monitor your 16k Bufferpool usage because this is always used by XML table spaces.
  • Watch out if you copy data from subsystem to subsystem! XML data tends not to copy across too well, (e.g., DSN1COPY does not support it) because the XML descriptive data in the XSR system tables tends to be “forgotten”…

 

New from IBM:

Take a look at IBM’s RedBook “Extremely pureXML in DB2 10 for z/OS, SG24-7915-00”, which is full of great info about the use and maintenance of XML data.

 

Life of an XML column example:

  • Create segmented table space XMLCUSTO.
  • Create table XMLCUSTOMER (columns CID BIGINT and INFO XML).
  • DB2 adds the column DB2_GENERATED_DOC_ID_FOR_XML BIGINT, and it creates table space XXML0000.
  • DB2 creates the XML table XXMLCUSTOMER containing the three columns:
     DOCID         BIGINT
     MIN_NODE_ID   VARBIN(128)    NOT NULL
     XMLDATA       VARBIN(15850)  NOT NULL
  • DB2 creates the unique index I_DOCIDXMLCUSTOMER on the base table with the following column:
    DB2_GENERATED_DOC_ID_FOR_XML

 

  • Finally, DB2 creates the clustering, not padded, non-unique index I_NODEIDXXMLCUSTOMER on the XML table with columns:
     DOCID
     XMLDATA

 

A table INSERT:

INSERT INTO XMLCUSTOMER VALUES (06001, XMLPARSE(DOCUMENT '
 <customerinfo><name>John Smith</name>
 <addr country="United States"><street>327 Smith Ave</street>
 <city>Glendale</city><prov-state>California</prov-state>
 <pcode-zip>91208-06001</pcode-zip></addr>
 <phone type="work">888-111-0001</phone></customerinfo>'));

 

A SELECT:  

SELECT CID
FROM BOXWELL.XMLCUSTOMER
WHERE XMLEXISTS ('//addr[city="San Jose"]' passing INFO)

You will also find useful information about XML basics, XML integration in DB2 9, 10 and XML index design in our presentation entitled “XML Episode 9 – the return of the hierarchical Empire –“.

 

If you would like to sample SQL to create and populate some example XML spaces, please contact us.

 

Feel free to send me your comments and ask questions.

TTFN,
Roy Boxwell
Senior Architect

2011-03: PBG – Recommended Usage and REORG Strategy

When IBM introduced us to the “Universal Table Space” (UTS), they brought in two distinct flavors:

  1. Range-Partitioned – which finally enabled segmented, partitioned data (hoorah!)
  2. Partition-By-Growth (PBG)

The advantages of UTS are many, including: better space management for VARCHAR style data, improved MASS DELETE performance, TS scans limited to active segments, and immediate data reuse after delete.

Enabling the PBG option with the MAXPARTITIONS clause on the CREATE TABLESPACE was a wonderful addition to the world of DB2. Finally, you never had to “worry” about a data set running out of space. As soon as it got to, say, 2 GB, a new partition was simply “added”, and the world carried on spinning. What a great idea! No more 64 GB limit for a segmented space, since a PBG can get to be 128 TB.
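
For reference, a minimal sketch of such a definition could look like this – database, table space and size values are purely illustrative:

CREATE TABLESPACE PBGTS01
  IN MYDB
  MAXPARTITIONS 256    -- this is what makes it Partition-By-Growth
  DSSIZE 2 G           -- each partition tops out at 2 GB
  SEGSIZE 32
  LOCKSIZE ANY
  BUFFERPOOL BP1
;

As soon as one partition fills up, DB2 simply defines the next one – up to the MAXPARTITIONS limit.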

Then the doubts set in…

How do I REORG this?
How do I recover this?
How do I manage this?

 

How do I REORG PBG?

Now you may be wondering, “What’s the problem with REORG?” Well, take a second to reflect on what a good old partitioned space looked like at the physical level: lots of partitions with varying amounts of data in them and none of them going to a secondary linear data set of course.

Now, in the bright new world of a PBG: lots of partitions absolutely chock-full of data and none of them going to a secondary linear data set of course.

See the problem now?
If you REORG a partition of a PBG, you run the very real risk of not being able to fit your data back in the partition! This is known by the very technical term: “not a good thing”.

To try and get around the REORG problem, IBM then introduced an “opaque” ZPARM called REORG_IGNORE_FREESPACE. If set to YES, the REORG simply uses zero for the PCTFREE and FREEPAGE options when reorganizing a partition. The hope is that, without adding space around newly inserted records, the data will “fit back in the box.” However, we are all aware that it is easy to unpack stuff, but pretty hard to pack stuff back into the original box.

 

How do I recover PBG?

The good news is: RECOVERY and COPY are no problem for PBGs. All you have to do is take care of your current REORG thresholds and *always* REORG a PBG at the table space level – time to check your “home-grown” or 3rd party DB2 database maintenance routines! – unless you are 101% sure that the data will actually fit back inside! Also bear in mind the DBA mantra: “Never REORG a table space unless you have sequential access, or you must resize it”.

 

How do I manage partitioned table spaces?

As with all partitioned table spaces, you must still manage the “limits of DB2”, but no longer the “am I reaching the maximum size of a partition?”.  Instead, you must check “am I reaching the MAXPARTITIONS value?”  I hope you have automatic DB2 space management software that does *both* of these checks for you now.  As always with DB2 – old stuff and problems disappear and new stuff and problems come in!
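
A minimal sketch of that MAXPARTITIONS check could look like this (assuming the PARTITIONS column reflects the partitions grown so far; the 80% warning threshold is illustrative):

SELECT DBNAME
     , NAME
     , PARTITIONS
     , MAXPARTITIONS
  FROM SYSIBM.SYSTABLESPACE
 WHERE TYPE = 'G'                          -- Partition-By-Growth
   AND MAXPARTITIONS > 0
   AND PARTITIONS * 100 >= MAXPARTITIONS * 80
 ORDER BY DBNAME, NAME
 WITH UR
;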

 

What else is not allowed?

Restrictions to partition-by-growth table spaces

Well, a quick look in the IBM documentation provides the following restrictions that apply to partition-by-growth table spaces:

  • The PART option of the LOAD utility is not supported.
  • The REBALANCE option of the REORG utility is not supported.
  • The default SEGSIZE value is 32.
  • Table spaces must be DB2-managed (not user-managed) so that DB2 has the “freedom” to create data sets as partitions become full.
  • Table spaces cannot be created with the MEMBER CLUSTER option.
  • Partitions cannot be explicitly added, rotated, or altered. Therefore, ALTER TABLE ADD PARTITION, ALTER TABLE ROTATE PARTITION, and ALTER TABLE ALTER PARTITION statements cannot target a partition of a partition-by-growth table space.
  • XML spaces are always implicitly defined by DB2.
  • If the partition-by-growth table space is explicitly defined, the LOB table space for the first partition of the partition-by-growth table space is defined based on the SQLRULES(DB2) or SQLRULES(STD) option in effect. Any additional LOB table space for a newly-grown partition in the partition-by-growth table space is always implicitly defined by DB2, regardless of which SQLRULES option is in effect. Specification of the SQLRULES(DB2) or SQLRULES(STD) option does not affect the LOB table spaces for implicitly defined partition-by-growth table spaces.
  • A non-partitioning index (NPI) always uses a 5 byte record identifier (RID).
  • Partitioned indexes are not supported.

Quite a hefty list of restrictions there! But, if you can live with the above and you want exceptionally large segmented table spaces, then PBGs are the way to go. Remember the other DB2 DBA mantra: “It Depends”, since PBG spaces should not be used for every table space – at least not yet anyway!
Feel free to send me your comments and ask questions.

TTFN,
Roy Boxwell
Senior Architect

2011-05: Virtual Indexes – General usage hints and tips

This month’s newsletter addresses a fantastic feature which I bet you’ll exploit – if you don’t already!

Just imagine you could give a new index a “try” before you build it, or you could even pre-check whether your application’s performance will degrade if you drop an index that you suspect isn’t used anymore (refer to the unused IX, but used stats dilemma!). This is exactly what a virtual index can do for you!

 

IBM brought in the DSN_VIRTUAL_INDEXES table in DB2 9 and fitted it into DB2 V8 CM, but with the shorter name, DSN_VIRTUAL_INDEX. Since then it has been studiously ignored by nearly everybody!
This table is actually incredibly useful in the DBA’s job; it is just that IBM made it very difficult to fill, change, and test the effectiveness of what you can actually do with it…

So how do you get one?

 

Check out the DSN910.SDSNSAMP(DSNTESC) member or the DSNTIJOS

First, check out the DSN910.SDSNSAMP(DSNTESC) member or the DSNTIJOS job, where IBM supplies the sample DDL for all the OSC tables.

 

The DSN_VIRTUAL_INDEXES table looks like this:

"TBCREATOR"             VARCHAR(128)  NOT NULL
"TBNAME"                VARCHAR(128)  NOT NULL
"IXCREATOR"             VARCHAR(128)  NOT NULL 
"IXNAME"                VARCHAR(128)  NOT NULL  
"ENABLE"                CHAR(1)       NOT NULL      
                         CHECK("ENABLE" IN('Y','N')) 
"MODE"                  CHAR(1)       NOT NULL 
                          CHECK("MODE" IN('C','D'))
"UNIQUERULE"            CHAR(1)       NOT NULL 
                          CHECK("UNIQUERULE" IN('D','U'))
"COLCOUNT"              SMALLINT      NOT NULL
                          CHECK("COLCOUNT" > 0) 
"CLUSTERING"            CHAR(1)       NOT NULL  
                          CHECK("CLUSTERING" IN('Y','N')) 
"NLEAF"                 INTEGER       NOT NULL  
                          CHECK("NLEAF" >= -1)
"NLEVELS"               SMALLINT      NOT NULL 
                          CHECK("NLEVELS" >= -1) 
"INDEXTYPE"             CHAR(1)       NOT NULL WITH DEFAULT 
                          CHECK("INDEXTYPE" IN('D','2')) 
"PGSIZE"                SMALLINT      NOT NULL 
                          CHECK("PGSIZE" IN(4, 8, 16, 32)) 
"FIRSTKEYCARDF"         FLOAT        NOT NULL WITH DEFAULT -1
                         CHECK("FIRSTKEYCARDF" = -1 
                            OR "FIRSTKEYCARDF" >= 0)
"FULLKEYCARDF"          FLOAT        NOT NULL WITH DEFAULT -1
                          CHECK("FULLKEYCARDF" = -1 
                             OR "FULLKEYCARDF" >= 0)   
"CLUSTERRATIOF"         FLOAT        NOT NULL WITH DEFAULT -1
                          CHECK("CLUSTERRATIOF" = -1 
                             OR "CLUSTERRATIOF" >= 0) 
"PADDED"                CHAR(1)       NOT NULL WITH DEFAULT
                          CHECK("PADDED" IN(' ','Y','N')) 
"COLNO1"                SMALLINT     
                          CHECK("COLNO1" IS NULL 
                             OR "COLNO1" > 0)
"ORDERING1"             CHAR(1)    
                          CHECK("ORDERING1" IS NULL 
                             OR "ORDERING1" IN('A','D')) 
.
.
.
"COLNO64"               SMALLINT                     
                          CHECK("COLNO64" IS NULL  
                             OR "COLNO64" > 0) 
"ORDERING64"            CHAR(1)            
                         CHECK("ORDERING64" IS NULL 
                             OR "ORDERING64" IN('A','D'))

 

As can be seen, there are an awful lot of columns that all have to be filled in correctly to get it to work, and lots of columns have CHECK rules.

The contents of the columns are fully described in the “Performance Monitoring and Tuning Guide”; however, the INSERTs, SELECTs, DELETEs and UPDATEs must be coded by yourselves… (Hint: use the DRAW REXX to generate these statements for you – if you do not have the DRAW REXX, then contact us for a copy.)

 

Here is an IBM example INSERT:

INSERT INTO DSN_VIRTUAL_INDEXES 
 (TBCREATOR, TBNAME, IXCREATOR, IXNAME, ENABLE, MODE, UNIQUERULE, COLCOUNT, CLUSTERING, NLEAF, NLEVELS, INDEXTYPE, PGSIZE, FIRSTKEYCARDF, FULLKEYCARDF, CLUSTERRATIOF, PADDED, COLNO1, ORDERING1, COLNO2, ORDERING2, COLNO3, ORDERING3)
 VALUES
 ('SAPR3', 'AUSP', 'SAPR3', 'AUSPTEST', 'Y', 'C', 'D', 7, 'N', 130000, 5, '2', 4, 1, 3693479, 0.95, 'N', 1, 'A', 6, 'A', 3, 'A');

As the index columns are NULLable, just ignoring them all in the INSERT is “OK”. The trick is getting the NLEAF, NLEVELS, FIRSTKEYCARDF, FULLKEYCARDF and CLUSTERRATIOF “correct”, so that EXPLAIN can actually use it!
Of course, if the index is inserted as ENABLE = ‘Y’ and MODE = ‘D’ then you can simply do an INSERT from SELECT and simulate DROPped indexes – which is quite wonderful!

 

Here is another IBM example INSERT from SELECT:

INSERT INTO DSN_VIRTUAL_INDEXES
(TBCREATOR, TBNAME, IXCREATOR, IXNAME, ENABLE, MODE, UNIQUERULE, COLCOUNT, CLUSTERING, NLEAF, NLEVELS, INDEXTYPE, PGSIZE, FIRSTKEYCARDF, FULLKEYCARDF, CLUSTERRATIOF, PADDED)                       

SELECT TBCREATOR, TBNAME, CREATOR, NAME, 'Y', 'D', UNIQUERULE, COLCOUNT, CLUSTERING, NLEAF, NLEVELS, '2', 4, FIRSTKEYCARDF, FULLKEYCARDF, CLUSTERRATIOF, PADDED
  FROM SYSIBM.SYSINDEXES
WHERE CREATOR = 'SAPR3'
    AND NAME = 'AUSP~Z1';

There was an APAR (PM14223), opened last year (UK60838 for V8 and UK60839 for V9) and closed on 2010-09-27, which corrected various problems with regard to empty tables and check constraints.

 

Set of EXPLAIN tables with the correct owner and the DSN_VIRTUAL_INDEXES

To actually get this feature to work, you will also need a set of EXPLAIN tables with the correct owner and a DSN_VIRTUAL_INDEXES table with the *same* owner. So if you only create the SYSIBM or the OSC creator, then it is unlikely you will get the benefit of this feature. This is the reason I often hear, “It didn’t work at our shop!”

Once everything is in place and you have enabled a virtual index, any BIND, REBIND or EXPLAIN will consider this index. This is good if it’s intentional, but it’s really bad if you just have a record sitting there!

Some golden rules of this table are:

  1. Only leave ENABLEd indexes for the duration of your tests – Never insert Enabled indexes and then go on vacation!
  2. Delete or set “disabled” as soon as you can! (Remember that EXPLAIN will use the virtual indexes, but the actual access paths for BIND/REBIND will not!)
  3. Take care to specify the “correct” values for space and clustering values.

The “correct” values are the hard part…
For space, you can use the space calculation in the DB2 Administration Guide.
For cardinality, you can use estimates or COUNT(*) results.
For CLUSTERRATIOF, you can use 0, 0.5, 0.85, 0.95 and 1.
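
And to put rules 1 and 2 into practice, here is a minimal sketch of the clean-up – the creator and index name are simply the ones from the example INSERT above:

-- disable the virtual index as soon as the tests are done...
UPDATE DSN_VIRTUAL_INDEXES
   SET ENABLE = 'N'
 WHERE IXCREATOR = 'SAPR3'
   AND IXNAME    = 'AUSPTEST'
;
-- ...or get rid of it completely
DELETE FROM DSN_VIRTUAL_INDEXES
 WHERE IXCREATOR = 'SAPR3'
   AND IXNAME    = 'AUSPTEST'
;
-- and a quick check that nothing is left ENABLEd before you go home
SELECT IXCREATOR, IXNAME, MODE
  FROM DSN_VIRTUAL_INDEXES
 WHERE ENABLE = 'Y'
;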

If you manage all these pieces correctly, the capability is just great. However, it all reminds me a bit of the note Terry Purcell made during a presentation at IDUG NA in the first week of May: “We just make it possible; if you want to use it nicely you may want to talk to your tools vendor. If you have some DB2 experts in your shop, you may even figure it out yourself.” – I fully agree. There are tools out there that act as a complete “front end” for this feature. They warn if ENABLEd indexes are found; they allow simple change, copy, insert, update and delete, and also multiple EXPLAIN runs to see if any of your queries will be affected by a created, changed or dropped index – subsystem wide. Of course, I am referring to Bind ImpactExpert and SQL PerformanceExpert from us. (Well, I have to mention our software somewhere in the newsletter!)

 

In a future newsletter I will describe a methodology for redundant index removal that will use this feature – so stay tuned!
I hope that this little bit of data was of interest and/or of use, and, as always, comments or questions are more than welcome!
TTFN,
Roy Boxwell
Senior Architect

2011-06: Unused index analysis

 

Building on what we have read over the last months, there is now a methodology available for checking if all of your indexes are actually being used as you think they should be.

 

Check the LASTUSED field in the RTS tables

First, you can simply check the LASTUSED field in the RTS tables, which gives you a good pointer to “used” or “not used”. But beware: if the index is unique and is there purely to stop duplicate inserts, then this style of usage is *not* recorded in the RTS!
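
A minimal sketch of that first check could look like this – the one-year threshold is purely illustrative, and indexes that have never been used show up with the default date:

SELECT CREATOR
     , NAME
     , DBNAME
     , INDEXSPACE
     , LASTUSED
  FROM SYSIBM.SYSINDEXSPACESTATS
 WHERE LASTUSED < CURRENT DATE - 1 YEAR
 ORDER BY LASTUSED
 WITH UR
;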

You should have full EXPLAIN data for production Static SQL and you should also trap and run snapshot EXPLAINs of all the Dynamic SQL. Over time, you can easily build up a usage table to show which indices are being actively used by the SQL running on your machine.

Now all you have to do is work out which indices are:

  1. Marked as not used in the RTS
  2. Not listed in any EXPLAIN table (for dynamic and static)

 

Analyze candidate indexes in depth

This list of candidate indexes then needs to be analyzed in depth. The way I do it is as follows:

– Pick an index. Start with the largest, longest, nastiest index(es), where the payback is good, or where you have lots of unused indexes on a given table and the INSERT and DELETE ratio is relatively high.

– Then find the table that it is indexing. Use this as a “driver” for the following EXPLAINs. Use EXPLAIN to get all the access paths that use this table (this captures non-use of existing indexes), and store them in a temporary work table. Use DSN_VIRTUAL_INDEXES to mark your candidate index(es) as DROPPED, and then re-run the EXPLAINs into a new table.
Compare the access paths.

If there is no difference in access path, then you can be 95% sure that the index can indeed be dropped. Now all you must do is make sure it is either non-unique, or that it is not being used as a duplication stopper!

If all is ok, drop the index and monitor the table usage for a week or two.

 

Repeat the entire procedure every six months or so…

 

What is important to stress here is that some indexes – even though “not used” – are actually used by the DB2 Optimizer to decide the access path. This is because the SYSCOLDIST data is based on the table and not the index, so any access path choice can use the filter factors and cardinality data of the table. Using virtual indexes does not stop this, as the data has not been deleted from SYSCOLDIST. This means that, to get that last 5% certainty, you must do the EXPLAINs and the compare *again* after physically dropping the index.

I hope that this little bit of data was of interest and/or of use and, as always, comments or questions are more than welcome!
TTFN,
Roy Boxwell
Senior Architect

2011-07: Recursive SQL – Examples (VIEW explode etc)

 

Ahhh, what a topic! The scariest SQL that anyone can write must be a recursive cursor… The ability to do it was introduced in DB2 V8, but I still only see sporadic use, and normally it is my own SQL! The thing is to learn to trust recursive SQL; once you know how it works it can become a great friend, but you must build that trust up over time.

 

Here is a typical example use of recursive SQL for the trivial task of getting a list of numbers:

WITH TEMP(N) AS
 (SELECT 1
  FROM SYSIBM.SYSDUMMY1
  UNION ALL
  SELECT N+1
  FROM TEMP
  WHERE N < 10)
SELECT N FROM TEMP
;

Running this in SPUFI gives you:

---------+---------+---------+---------+---------+
          N                                               
---------+---------+---------+---------+---------+
          1
          2
          3
          4
          5
          6
          7
          8
          9
         10
DSNE610I NUMBER OF ROWS DISPLAYED IS 10
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100

 

So how does it work and what controls are there?

Basically, it was the creation of CTEs (Common Table Expressions) that allowed recursive SQL. The WITH xxx AS is the definition of a CTE. Then, within the parentheses, the SQL has two “stages”: the first is what I call the seeder – it starts the whole process off and generates the “first row”, if you like. Then there is the UNION ALL that is needed to actually join all of the data together. If you try and use just UNION you get a nasty SQL error.

WITH TEMP(N)                
         AS 
(SELECT 1                                    
  FROM SYSIBM.SYSDUMMY1 
UNION                              
SELECT N+1                                                             
FROM TEMP 
WHERE N < 10) 
SELECT N FROM TEMP 
;                                  
---------+---------+---------+---------+---------+---------+---------
DSNT408I SQLCODE= -342, ERROR: THE COMMON TABLE EXPRESSION TEMP MUST NOT USE
SELECT DISTINCT AND MUST USE UNION ALL BECAUSE IT IS RECURSIVE

See? The SQL parser knows what it is doing. So now on to the next stage, and that is the recursion statement and – *very important* – the emergency brake for the whole caboodle.

The recursive part is relatively clear. In this case we wish to select one number higher than the current number in the TEMP table. This could carry on forever, so there is also an emergency brake that will stop any runaway SQL. In this case I wish to stop after reaching 10 (WHERE N < 10).

 

An explain of this query yields this access path:

---+---------+---------+---------+---------+---------+-------+----+
QNO  PNO  SQ  M  TABLE_NAME   A   CS  INDEX  IO  UJOG  P  CE  TYPE
--+---------+---------+---------+---------+---------+--------+----+
01   01   00  0  TEMP         R   00         N   ----  S     SELECT
02   01   00  0                   00             ----        UNIONA
03   01   00  0  SYSDUMMY1    R   00         N   ----  S     NCOSUB
04   01   00  0  TEMP         R   00         N   ----  S     NCOSUB

Not really a great access path, but it is recursive after all!

Now you can write recursive SQL without using an “emergency brake”, but let me show you a sensible usage of recursive SQL first. Imagine you’re interested in extracting DDL for a specific table and you are not sure about the dependencies of the VIEWs or MQTs that are being used (you could always buy our DDL Generator pocket tool of course 😉 <cough> <cough>), but let us assume you do *not* have it. How do you find all of the “exploding VIEWs/MQTs”? Which ones are dependent on which objects, etc., from one level all the way “back up the tree”? How would you write this SQL? You could do it with numerous fetches and probes of the DB2 catalog, but you could also do it with one single SQL statement (note that this SQL statement has been surgically killed – if you are interested in getting an actual running version then please contact us).

 

Here’s the outline SQL

WITH VIVLIST                          
       (MAX                          
       ,BCREATOR                      
       ,BNAME                        
       ,BTYPE                        
       ,DCREATOR                      
       ,DNAME                        
       ,DTYPE) AS                    
 (SELECT 1                            
        ,STRIP(A.BCREATOR)            
        ,STRIP(A.BNAME)              
        ,A.BTYPE                      
        ,STRIP(A.DCREATOR)            
        ,STRIP(A.DNAME)              
        ,A.DTYPE                      
  FROM SYSIBM.SYSVIEWDEP A            
  WHERE A.DCREATOR = :WS-CREATOR      
    AND A.DNAME    = :WS-NAME        
  UNION ALL                          
  SELECT B.MAX + 1                    
        ,STRIP(A.BCREATOR)            
        ,STRIP(A.BNAME)              
        ,A.BTYPE                      
        ,STRIP(A.DCREATOR)           
        ,STRIP(A.DNAME)              
        ,A.DTYPE                      
  FROM SYSIBM.SYSVIEWDEP A            
       ,VIVLIST B                      
   WHERE A.DNAME    = B.BNAME          
     AND A.DCREATOR = B.BCREATOR      
     AND B.MAX       < 256            
  )                                    
 .
 .
 .  
;

Now, in this cursor my emergency brake is the B.MAX < 256, which stops things after 256 levels of “view in view” definitions have been found. In practice a loop is impossible, as DB2 guarantees that you cannot generate cyclic views, and I am pretty sure that no-one has more than 255 dependencies on a single view or MQT. Anyway, I still like the brake, because when you do not have one (for example, if you comment out the “AND B.MAX < 256” line) you get this “worrying” warning at BIND time:

 DSNX105I  -S810 BIND SQL WARNING
            USING MDB2 AUTHORITY
            PLAN=(NOT APPLICABLE)
            DBRM=SQLDDLD
            STATEMENT=2476
            SQLCODE=347
            SQLSTATE=01605
            TOKENS=VIVLIST

For details refer to DB2 for z/OS messages.

A quick look in the DB2 for z/OS codes leads you to
>>
+347 THE RECURSIVE COMMON TABLE EXPRESSION name MAY CONTAIN AN INFINITE LOOP

Explanation: The recursive common table expression called name may not complete. This warning is based on not finding specific syntax as part of the iterative portion of the recursive common table expression. The expected syntax includes:

  • incrementing an INTEGER column in the iterative select list by 1.
  • a predicate in the where clause of the iterative portion of the form “counter_col < constant” or “counter_col < :hostvar”.

The absence of this syntax in the recursive common table expression may result in an infinite loop. The data or some other characteristic of the recursive common table expression may allow the successful completion of the statement anyway.
<<

So now you know why I use that brake!

 

Finally here are two examples

The first is copied and slightly changed from the IDUG Code Place, and does maths with recursion:

--- PRINTS THE FACTORIAL OF NUMBERS FROM 1 TO N
WITH TEMP( N , FACT) AS
( SELECT
        1
,CAST (1 AS DECIMAL(31 , 0))
  FROM SYSIBM.SYSDUMMY1
UNION ALL
  SELECT N + 1 , FACT * ( N + 1 )
  FROM TEMP
  WHERE N < 21
)
SELECT *
FROM TEMP
;

Running this query gives you:

---------+---------+---------+---------+-------  
          N                             FACT                        
---------+---------+---------+---------+-------                           
          1                              1.                       
          2                              2. 
          3                              6. 
          4                             24. 
          5                            120.
          6                            720.
          7                           5040.  
          8                          40320. 
          9                         362880.
         10                        3628800.  
         11                       39916800. 
         12                      479001600.                
         13                     6227020800.                   
         14                    87178291200.                 
         15                  1307674368000.     
         16                 20922789888000. 
         17                355687428096000. 
         18               6402373705728000. 
         19             121645100408832000. 
         20            2432902008176640000. 
         21           51090942171709440000. 
DSNE610I NUMBER OF ROWS DISPLAYED IS 21

And now something new from the List Serv (again slightly changed)
One of the posters posted this request:

Given a table A1:

Number          Asset
1               AAAA
1               BBBB
1               CCCC
2               DDDD
2               EEEE

The result should be:

1      AAAA,BBBB,CCCC
2      DDDD,EEEE

“Does that work? I am not sure, if recursive SQL will do it.”

 

A short time later the same poster posted this extremely elegant piece of SQL code

WITH S (LEVEL, NUMBER, ASSET, ASSET_LIST) AS
     (SELECT 1 , A1.NUMBER, MIN(A1.ASSET) AS ASSET,
             MIN(A1.ASSET CONCAT SPACE(3000)) AS ASSET_LIST 
       FROM A1 
       GROUP BY A1.NUMBER 
       UNION ALL 
       SELECT LEVEL + 1 , S.NUMBER, A1.ASSET,
              STRIP(S.ASSET_LIST) CONCAT ',' CONCAT A1.ASSET
       FROM S                 
       INNER JOIN TABLE 
              (SELECT A1.NUMBER, MIN(A1.ASSET) AS ASSET 
                FROM A1           
                WHERE S.NUMBER = A1.NUMBER
                AND S.ASSET  < A1.ASSET
                GROUP BY A1.NUMBER 
               ) AS A1 
             ON S.NUMBER = A1.NUMBER
     )                                      
SELECT S.NUMBER, MAX(S.ASSET_LIST)  AS LIST 
  FROM S 
  GROUP BY S.NUMBER
;

I hope that this little bit of data was of interest and/or of use and as always comments or questions are more than welcome!
TTFN,
Roy Boxwell
Senior Architect

2011-09: CRUD – How to find which program is doing what to which table

 

Now it might sound a bit rude, but CRUD is one of my favourite SQL queries and I use it a lot!

 

CRUD. Create, Read, Update, and Delete

However, this newsletter topic probably created the most fun for me in advance, because I have a couple of colleagues – including non-technicians – reading the stuff before it gets published, and the feedback on this content is definitely worth a mention. It’s all about this nice acronym that doesn’t leave any open questions for me – it actually stands for Create, Read, Update, and Delete. I guess that Insert, Select, Update, and Delete (ISUD) just doesn’t have a ring to it! Anyway, it looks like some others never heard of it before, or associate something totally different… a couple of funny prompts for clarification later led me to do a quick search in Wikipedia, which opened up some other associations besides my topic…

  • A sticky substance, especially when dirty and/or encrusted
  • Crud (game), played on a billiard table
  • Chalk river unidentified deposits, corroded products containing radioactive nuclear species
  • Neil Crud, Welsh rock musician, journalist and DJ
  • Crud Puppy, a cartoon character from User Friendly
  • Crud, minced oath for crap, meaning feces
  • Crud, a radio station in Rochdale College

All right folks, we’ll skip those for today, cause what I associate with CRUD is “Create, read, update and delete” summarized by Wiki as basic functions of a computer database.

 

Which static SQL program does what to which of my tables?

Anyway this topic helps you guys with that age old problem of Table usage or even, horror of horrors, the Auditors asking questions…
The problem is “Which static SQL program does what to which of my tables?”
So my first exercise for this is: I wish to see all packages in the TEST collection that have anything whatsoever to do with the SYSIBM.SYSDATABASE table.
Now you could trawl through the SYSxxxxAUTH tables etc. with GRANTEE and GRANTOR and with PUBLIC etc., or you could run this (make use of the exclude/include placeholders if you like):

SELECT SUBSTR(TCREATOR, 1 , 8) AS CREATOR                  
     , SUBSTR(TTNAME, 1 , 18)   AS NAME                                 
     , SUBSTR(GRANTEE, 1 , 8)  AS PROGRAM 
     , CASE WHEN INSERTAUTH = 'Y' THEN 'C' 
            ELSE '-'  
       END      AS C                                  
     , CASE WHEN SELECTAUTH = 'Y' THEN 'R' 
            ELSE '-'  
       END      AS R  
     , CASE WHEN UPDATEAUTH = 'Y' THEN 'U' 
            ELSE '-'
       END      AS U
     , CASE WHEN DELETEAUTH = 'Y' THEN 'D'                    
            ELSE '-'  
       END      AS D  
     , CASE WHEN COLLID = ' ' THEN '** PLAN **' 
            ELSE COLLID                                 
       END      AS "PLAN/COLLECTION" 
  FROM SYSIBM.SYSTABAUTH 
 WHERE GRANTEETYPE = 'P' 
   AND COLLID      =    'TEST' 
-- AND NOT GRANTEE =    '<exclude>'
-- AND GRANTEE     =    '<include>' 
   AND TCREATOR    =    'SYSIBM' 
   AND TTNAME      =    'SYSDATABASE'                           
--ORDER BY 1 , 2
ORDER BY 3 , 2 
WITH UR  
;

 

The output looks like this:

---------+---------+---------+---------+---------+---------+--------
CREATOR   NAME                PROGRAM   C  R  U  D  PLAN/COLLECTION     
---------+---------+---------+---------+---------+---------+--------
SYSIBM    SYSDATABASE         M2DBIN09  -  R  -  -  TEST 
SYSIBM    SYSDATABASE         M2DBRI09  -  R  -  -  TEST
SYSIBM    SYSDATABASE         M2DBRT09  -  R  -  -  TEST 
SYSIBM    SYSDATABASE         M2DBTS09  -  R  -  -  TEST 
SYSIBM    SYSDATABASE         MDB2DB41  -  R  -  -  TEST 
SYSIBM    SYSDATABASE         MDB2DB45  -  R  -  -  TEST  
SYSIBM    SYSDATABASE         MDB2DBPD  -  R  -  -  TEST                  
SYSIBM    SYSDATABASE         MDB2DBTS  -  R  -  -  TEST 
SYSIBM    SYSDATABASE         MDB2SY8T  -  R  -  -  TEST 
SYSIBM    SYSDATABASE         MDB2SY9C  -  R  -  -  TEST 
SYSIBM    SYSDATABASE         MDB2SYSC  -  R  -  -  TEST 
SYSIBM    SYSDATABASE         MDB2SYST  -  R  -  -  TEST           
SYSIBM    SYSDATABASE         O2RTS030  -  R  -  -  TEST 
SYSIBM    SYSDATABASE         SQLDDLD   -  R  -  -  TEST       
SYSIBM    SYSDATABASE         SQLDDLS   -  R  -  -  TEST 
SYSIBM    SYSDATABASE         SQLDV9DB  -  R  -  -  TEST 
SYSIBM    SYSDATABASE         SQLDV9TS  -  R  -  -  TEST 
DSNE610I NUMBER OF ROWS DISPLAYED IS 17                                   
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100

 

The data shown from this simple example lists 17 programs with SELECT – or let’s say read – authority on SYSDATABASE. Now, let’s have a look at a scenario where I have a program that updates SYSTABLESPACESTATS. I use the same SQL statement and change the last line of the WHERE clause to

  AND TTNAME      =    'SYSTABLESPACESTATS'
---------+---------+---------+---------+---------+---------+-------
CREATOR  NAME                PROGRAM   C  R  U  D  PLAN/COLLECTION
---------+---------+---------+---------+---------+---------+------- 
SYSIBM   SYSTABLESPACESTATS  M2DBRT09  -  R  U  -  TEST 
SYSIBM   SYSTABLESPACESTATS  MDB2DB46  -  R  -  -  TEST 
SYSIBM   SYSTABLESPACESTATS  MDB2T003  -  -  U  -  TEST 
SYSIBM   SYSTABLESPACESTATS  MOD#CTLG  C  R  U  D  TEST 
DSNE610I NUMBER OF ROWS DISPLAYED IS 4

A lot fewer programs, and you can see that not just read access, but updates are also done. There is even a MOD#CTLG program that additionally does inserts and deletes.

Simple, easy and very straightforward if you need to determine what your static SQL applications are shooting against your DB2 tables.
If you’d like to check which programs in your shop update the salary table, or who has authority to access sensitive financial data, this may be the easiest way. There are lots of alternatives that you can think of yourselves! Very handy for housekeeping, documentation, or checking out which programs will be affected by a table change, etc.

 

I hope that this little bit of data was of interest and/or of use and as always comments or questions are more than welcome!
TTFN,
Roy Boxwell
Senior Architect

2011-10: PLAN to PACKAGE Migration – Do you have any “bad guys”?

 

Now we all know that DB2 10 is here – You do know that don’t you?? 🙂

Do away with DBRMs being bound directly to a PLAN

Anyway one of the points of DB2 10 is to finally do away with DBRMs being bound directly to a PLAN. Now you absolutely must go to PACKAGES and COLLECTIONS. I have seen no real need of having PLAN-based programs for years, but some people/sites/software still seem to insist upon having these things so I know that some shops are now facing the following questions:

 

  1. Let DB2 10 “do the magic” – In other words let the migration create all the collections for you?
  2. Bite the Bullet – Do it now in DB2 V8 or DB2 9; thus allowing *you* to decide on naming conventions, etc? Of course I would be failing in my marketing skills if I forgot to plug our DBRM Reconstruct feature of Bind ImpactExpert that can also do this for you…

 

Guess what I would choose?? I have two reasons for choosing answer number 2 and they are:

  1. I like to control my own naming standards.
  2. I need to create these PACKAGES *before* the Migration.

 

Use PLAN MANAGEMENT during a migration

Why (2)? You may well ask. Quite simple – if you rebind/bind now with current statistics, and you happen to be on DB2 9, you can use PLAN MANAGEMENT during migration to keep your (new) current access path “locked down”, so in case of regression you can fall back to a good access path. Additionally, if you are still on DB2 V8, then a rebind before skip-level migration is basically a must, so you have a chance to find any problems before landing at the door of DB2 10 with unknown, weird, or downright strange access paths where you simply have no idea where they came from. For example, could they have come from V8, or is it a new special-rule access path from DB2 9, or even perhaps an artefact from bad statistics somewhere down the chain?

At this point, I should also mention that you need good consistent statistics for *any* migration. Make sure that your SYSCOLDIST and SYSKEYTGTDIST contain good trustworthy data! (See my earlier newsletter about our Statistics HealthCheck Freeware for more details there.)

 

What is the worst thing that could happen?

So back to this newsletter – what is the worst thing that could happen? Well, what if you happen to have two PLANs that both have a DBRM called ROYBOY, but both ROYBOYs have different pre-compile timestamps? This is nasty – really, really nasty – and should of course “never happen”, but it does happen, and far too often for my taste! So now to the meat and veg of the newsletter – here is a nice little query that will show which DBRMs you have on your system that have the same name, but different pre-compile timestamps. These all need to be “sorted out” (that’s a euphemism for FREE of course 🙂 ) before beginning the journey to DB2 10.

 

SELECT SUBSTR(B.NAME , 1 , 8) AS DBRM_NAME 
     , COUNT(*)               AS NUMBER_DIFF_PRECOMPTS 
  FROM                                               
      (SELECT NAME 
            , PRECOMPTS 
         FROM SYSIBM.SYSDBRM 
        GROUP BY NAME , PRECOMPTS) AS B 
 GROUP BY B.NAME
HAVING COUNT(*) > 1

 

The output looks like this:

---------+---------+---------+---
DBRM_NAME  NUMBER_DIFF_PRECOMPTS                
---------+---------+---------+---  
PLMEM1                         2 
DSNE610I NUMBER OF ROWS DISPLAYED IS 1

Using this output you can select the needed data, or you could embed this SELECT in another SELECT to get the needed data like this:

SELECT SUBSTR(A.NAME , 1 , 8 )      AS DBRM_NAME 
      ,SUBSTR(A.PLNAME , 1 , 8 )    AS PLAN_NAME 
      ,SUBSTR(A.PLCREATOR , 1 , 8 ) AS CREATOR  
      ,A.PRECOMPTS                  
  FROM SYSIBM.SYSDBRM A 
     ,(SELECT SUBSTR(B.NAME , 1 , 8) AS DBRM_NAME 
            , COUNT(*)               AS NUMBER_DIFF_PRECOMPTS 
         FROM   
            (SELECT NAME 
                  , PRECOMPTS
               FROM SYSIBM.SYSDBRM 
              GROUP BY NAME , PRECOMPTS) AS B 
        GROUP BY B.NAME
       HAVING COUNT(*) > 1
      ) AS C  
 WHERE C.DBRM_NAME = A.NAME                 
 ORDER BY 1 , 2                                            
;

The output then looks like this:

---------+---------+---------+---------+---------+---------+-             
DBRM_NAME  PLAN_NAME  CREATOR   PRECOMPTS      
---------+---------+---------+---------+---------+---------+-             
PLMEM1     TESTP4     MDU       2007-11-26-19.48.41.982926      
PLMEM1     TESTP5     MDU       2009-10-26-10.22.12.362131  
PLMEM1     TESTP6     MDU       2007-11-26-19.48.41.982926    
DSNE610I NUMBER OF ROWS DISPLAYED IS 3

This now gives the “overview” of what is possibly wrong. In this case, you can see the different timestamps and could deduce what action(s) must be taken.

I hope that this little bit of data was of interest and/or of use and, as always, comments or questions are more than welcome!
TTFN,
Roy Boxwell
Senior Architect

2012-01: Native SQL procedures

 

These were introduced years and years ago in DB2 9 – remember that version?? Anyway, the basic idea was to streamline and simplify the DB2 side of things as well as the DBA work (creation, control and performance). The central change was the elimination of the need to initialize a Language Environment where the compiled program will execute. The creation of this environment, and the need to pass all of the SQL calls back and forth, added a large amount of overhead and hindered the mass acceptance of Stored Procedures (on the mainframe). As an added bonus it also gave us the possible usage of zIIP – only from DRDA at the moment, but any usage is pretty good!
Now this newsletter is *not* going to get into the coding aspects, as there are tons of examples and redbooks out there, but more into the technical realization (how do I EXPLAIN the darn things?) and the “replace all, none or some” methodology.

 

How to create a Native SQL procedure?

To create a Native SQL Procedure you must *not* use the FENCED or EXTERNAL NAME keywords in the PROCEDURE creation body. An extra tip for the SPUFI users amongst us is to switch the SPUFI SQL format default to be SQLPL; it makes it a lot easier to build and test (unless you use Data Studio to do it all for you, of course!). In case you are wondering, the default panel looks like this:

                     CURRENT SPUFI DEFAULTS             SSID: S91A      
 ===>             
             
1  SQL TERMINATOR .. ===> ;    (SQL Statement Terminator)              
2  ISOLATION LEVEL   ===> CS   (RR=Repeatable Read, CS=Cursor
                                Stability,UR=Uncommitted Read)         
3  MAX SELECT LINES  ===> 9999 (Max lines to be return from SELECT)
4  ALLOW SQL WARNINGS===> YES  (Continue fetching after sqlwarning)
5  CHANGE PLAN NAMES ===> NO   (Change the plan names used by SPUFI)  
6  SQL FORMAT....... ===> SQL  (SQL, SQLCOMNT, or SQLPL)

 

This then “enables” the line number of the SQL to be used as the QUERYNO automatically, thus enabling you to EXPLAIN and *find* the results! A very good idea these days!
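
Just to show the shape of the beast, here is a minimal sketch of a native SQL procedure – no FENCED, no EXTERNAL NAME – with purely illustrative names and logic:

CREATE PROCEDURE BOXWELL.GET_DEPT_COUNT
  (IN  P_DEPT  CHAR(3)
  ,OUT P_COUNT INTEGER)
  VERSION V1
  LANGUAGE SQL
BEGIN
  SELECT COUNT(*)
    INTO P_COUNT
    FROM BOXWELL.EMPLOYEE
   WHERE WORKDEPT = P_DEPT;
END

The VERSION keyword is what later lets you ALTER … ADD VERSION and DEPLOY without disturbing the currently active version.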

 

The question now is: have you migrated all your “old” (generated C program) SQL procedures across yet? If not – why not?

Now we all know the saying “if it ain’t broke, don’t fix it!”. However, in this case the time and effort involved in choosing the stored procedures that get migrated is well worth it. First you should get yourself some knowledge from the documentation and the web all about building, versioning (a very cool feature of Native SQL procedures, by the way!) and DEPLOYing these things (an even cooler feature, as the DEPLOY option stops the BIND on production and therefore the access path currently in use in test is simply “copied over” to the remote production system. Of course you must have access to the remote production system from test to do this, and that is normally a no-no – sadly!).
As always, it is better to start out with a couple of “the usual suspects”. Once these have been migrated across and monitored, *and* the savings calculated, then you should roll out a general plan for all of the current “heavy hitters”.

For further reading I can recommend the redbook “DB2 9 Stored Procedures: Through the CALL and Beyond” (SG24-7604), which was last updated in February 2011, so it is still “fresh”!

There are also two compact and concise technical articles: first a blog entry from Peggy Zagelow, and then a very nice technote. Last, but not least, have a look at this two-part article from Linda Claussen.

Feel free to send me your comments and ask questions.

TTFN,
Roy Boxwell
Senior Architect

2012-02: HASH Index – What is it good for?

Oh dear! I seem to have made a HASH of it!

Many thanks to Jeff for requesting information about HASH indexing and thus causing the terrible pun in the first line!

 

A brand new access path for DB2 10

Hash access is one of the new goodies that arrived in DB2 10 NFM, and it is a brand new access path for DB2. (Anyone remember HDAM?) First the good news – when hash access is used, it is pretty fast! If you are lucky, one I/O is all that is needed, unless you get your hash space definition hopelessly wrong – in which case, it gets messy!

 

HASH Access. What is it?

Simply put, hash access takes your given key (max 255 bytes) and computes a 64-bit number from it. DB2 then computes the relative page number as the MOD of this 64-bit number and a prime number derived from the size of the table. Next, DB2 computes the hash anchor point as the MOD of this 64-bit number and another prime number between 17 and 53. The relative page number and the hash anchor point then act like a cross-reference directly to the row in question. Of course, life is a bit more complicated than that, and you can easily get collisions where two keys end up with the same numbers. If this happens, the second one gets written to the ominous-sounding “hash overflow area” and its associated index (remember for later that this is also a sparse index!). By the way, the optimal setting is to have between 0% and 8% of rows in the overflow area.

 

Any physical limitations?

Yes, there are a whole bunch! The table must be in a UTS, and it must be in RRF format. The hash key is *not* updateable (déjà vu here with partitioned key changes from aeons ago!). DROP a hash, and your table space is put into REORGP, which of course needs a SHRLEVEL NONE REORG to repair it and that is not a recoverable scenario either. No CLUSTER indexes are allowed of course. Hash index *must* be implicitly created. No LOB or XML columns and it cannot be an MQT. <phew!>

Which all reminds me of a certain Monty Python sketch: “But apart from that, what have the Romans done for us?”. Just replace Romans with Hash Access. 🙂

So that is the list of physical limitations – What about logical limitations?

 

What about logical limitations?

No parallel access or star/hybrid joins are possible. Hash access is not used for sensitive dynamic scrollable cursor with multi-element in-list. RUNSTATS is a waste of time as it is a sparse index and so, by definition, only has a few percent of the table rows in it. My advice is to simply exclude hash indexes from your DB2 database maintenance during RUNSTATS processing. However, remember you must still monitor these indexes for REORG and for re-sizing as well as hash usage.

For which data does it actually make sense to change to hash access? Straight off the bat: single-row unique equals and IN predicates. No range scans. Access is random. The row that you are searching for is normally found. The size of the table is static or well known. The columns all have the same size (no large variants please!), which should ideally be between 30 and 2000 bytes. Finally, it would be perfect if 20 or more rows fit on a page.

Good candidates should have four or more index levels in order to save tons of I/O.

Bad candidates have high insert rates or updates that change row length.
So you can see that you have to do a fair bit of work to find the candidates!

Once you have decided to change a table to be hash-based, you can see from EXPLAIN whether or not a given query will attempt to use the hash method. Access types of H, HN, and MH will confirm usage – and you can use the RTS to see when and how often hash access actually occurred (the HASHLASTUSED and REORGHASHACCESS columns).
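
A minimal sketch of that RTS check could look like this – the filtering is illustrative; the columns are the DB2 10 additions described in an earlier newsletter:

SELECT DBNAME
     , NAME
     , PARTITION
     , REORGHASHACCESS
     , HASHLASTUSED
     , REORGSCANACCESS
  FROM SYSIBM.SYSTABLESPACESTATS
 WHERE COALESCE(REORGHASHACCESS, 0) > 0
    OR HASHLASTUSED IS NOT NULL
 ORDER BY REORGHASHACCESS DESC
 WITH UR
;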

 

If you decide to stop using the hash organization and drop it, remember the REORGP.

LOADing is still a bit of trauma of course. The current best method is:
Create table, Load Data, Alter to Hash, and REORG.
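
A minimal sketch of the “Alter to Hash, and REORG” steps could look like this – table, column and the 64 MB hash space are purely illustrative:

ALTER TABLE BOXWELL.CUSTOMER
  ADD ORGANIZE BY HASH UNIQUE (CUST_ID)
  HASH SPACE 64 M
;
-- the hash organization only becomes real after a REORG;
-- AUTOESTSPACE YES lets the REORG size the hash space for you:
-- REORG TABLESPACE MYDB.CUSTTS AUTOESTSPACE YES SHRLEVEL REFERENCE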

Mass Inserts are also pretty bad.

I did some testing here loading up 90,000 rows with an eight byte (repeatable) random key and found the elapsed time varies greatly with the size of the HASHSPACE and with the bufferpool definition of the table space.  (Remember that basic BP tuning is always a good idea!) Here is what I found:

1 MB Hash Space leads to 73,183 overflow records, 36 secs and 251 hash pages
Then I went high with the HASH SPACE:
256 MB Hash Space leads to zero overflow records, 96 secs and 65,327 hash pages
Then I stepped down (in binary of course!):
128 MB Hash Space leads to zero overflow records, 82 secs and 32,573 hash pages
64 MB Hash Space leads to zero overflow records, 44 secs and 16,381 hash pages
32 MB Hash Space leads to zero overflow records, 23 secs and 8,191 hash pages
16 MB Hash Space leads to zero overflow records, 22 secs and 4,093 hash pages
8 MB Hash Space leads to one overflow record, 22 secs and 2,039 hash pages
4 MB Hash Space leads to 21,620 overflow records, 36 secs and 1,021 hash pages
2 MB Hash Space leads to 55,897 overflow records, 38 secs and 509 hash pages

And then finally the optimum size for my little test:
9 MB Hash Space leads to zero overflow records, 19 secs and 2,297 hash pages

Then I decided to test out the AUTOESTSPACE YES function to see what changes it made.
The following is a list of results from my test data:
256MB Hash Space LOADed 90,000 using 32,719 Hash pages with 0 overflows
REORG of above changed the hash pages down to 653 with 3,762 overflow records
1MB Hash Space with no data REORGed to 31 pages
1MB Hash Space LOADed 90,000 using 127 Hash pages with 72,728 overflows
REORG of above changed the hash pages up to 653 with 3,762 overflow records
8MB Hash space LOADed 90,000 using 1,021 Hash pages with 0 overflows
REORG of above changed the hash pages down to 653 with 3,762 overflow records

So you can see that REORG aims for about 5% of rows, which is OK.
Finally, also check out “Chapter 20. Organizing tables by hash for fast access to individual rows” in the Managing Performance book for some great details about how to squeeze the last bytes out of your hash space!

Feel free to send me your comments and ask questions.

TTFN,
Roy Boxwell
Senior Architect