2012-11: EXPLAIN table maintenance and clean-up

Remember the good old days when there was a <userid>.PLAN_TABLE with all your access path data in it?
When the package was dropped or changed by some program maintenance you could simply do a DELETE WHERE NOT EXISTS style SQL to keep your data up to date and pristine?

Of course those days have *long* gone…
Nowadays everyone on the block has got DataStudio installed and it installs a whole bunch of “new” hidden tables to enable Query Tuning. This is OK of course, but do you actually take care of this data? If it is the production EXPLAIN tables – Are you image copying, runstating and reorging when you should? Do you have all the indexes you need? Are you deleting too much or not enough data? Can you, in fact, delete anything these days without the risk that you will delete an access path that may “come back from the dead”?

First, a quick review of which tables you may well have in existence at the moment:

 

DB2 V8

PLAN_TABLEThe Good old original, now with 58 columns
DSN_STATEMNT_TABLEAlso available as a published API with 12 columns used for Costs
DSN_FUNCTION_TABLEAs above with 15 columns for checking which function is called
DSN_DETCOST_TABLEThe Hidden tables first officially externalized by DataStudio
DSN_FILTER_TABLE
DSN_PGRANGE_TABLE
DSN_PGROUP_TABLE
DSN_PREDICAT_TABLE
DSN_PTASK_TABLE
DSN_QUERY_TABLENot populated by EXPLAIN(YES) for BIND/REBIND
DSN_SORT_TABLE
DSN_SORTKEY_TABLE
DSN_STRUCT_TABLE
DSN_VIEWREF_TABLE

Now, most of the descriptive text for the new ones is “IBM internal use only” but some of them are pretty cool! Please remember that all this data is asis, and will probably *not* help you in your day to day tuning one little bit. That being said, I do use the DSN_PREDICAT_TABLE quite a lot.
The first “problem” is: the DSN_QUERY_TABLE because it contains a CLOB(2M) column, which means that it can grow very very quickly – well, not the DSN_QUERY_TABLE itself of course, rather its auxiliary table that holds the LOB data.

 

DB2 9

PLAN_TABLENow with 59 columns
DSN_STATEMNT_TABLE
DSN_FUNCTION_TABLE
DSN_DETCOST_TABLE
DSN_FILTER_TABLE
DSN_PGRANGE_TABLE
DSN_PGROUP_TABLE
DSN_PREDICAT_TABLE
DSN_PTASK_TABLE
DSN_QUERYINFO_TABLENew for query rewrite & accelerator usage and with two LOBs
DSN_QUERY_TABLE
DSN_SORT_TABLE
DSN_SORTKEY_TABLE
DSN_STATEMENT_CACHE_TABLENew with a LOB column
DSN_STRUCT_TABLE
DSN_VIEWREF_TABLE

The DSN_STATEMENT_CACHE_TABLE must be separately maintained as it has nothing to do with the PLAN_TABLE of course!

 

DB2 10

PLAN_TABLENow with 64 columns
DSN_STATEMNT_TABLE
DSN_FUNCTION_TABLE
DSN_COLDIST_TABLENew and contains the SYSCOLDIST data
DSN_DETCOST_TABLE
DSN_FILTER_TABLE
DSN_KEYTGTDIST_TABLENew and contains the SYSKEYTGTDIST data
DSN_PGRANGE_TABLE
DSN_PGROUP_TABLE
DSN_PREDICAT_TABLE
DSN_PTASK_TABLE
DSN_QUERYINFO_TABLE
DSN_QUERY_TABLE
DSN_SORT_TABLE
DSN_SORTKEY_TABLE
DSN_STATEMENT_CACHE_TABLE
DSN_STRUCT_TABLE
DSN_VIEWREF_TABLE

 

Now it must be apparent that the amount and size of this data is getting out of hand. If someone, by accident, creates all of these tables in production, then every BIND or REBIND with EXPLAIN(YES) will be doing a vast amount of I/O possibly up to the point, that you run out of space in one or more of the tablespaces (Especially the LOB ones really hurt!). I actually filled two packs with data…cost me a pizza…

In DB2 10, IBM aligned all of the EXPLAIN tables to have a “common” key: QUERYNO, APPLNAME, PROGNAME, COLLID, GROUP_MEMBER, SECTNOI, VERSION and EXPLAIN_TIME. Before this, some or nearly all of these columns were not there (DSN_PGRANGE_TABLE and DSN_QUERY_TABLE being the worst with only QUERYNO, GROUP_MEMBER, and EXPLAIN_TIME as a “key”)

This leads to some minor problems in determining which rows can be deleted, but nothing disastrous!

To clean-up in DB2 10, the simplest and best is a DELETE WHERE NOT EXISTS from the child tables to the PLAN_TABLE and as these are nowadays all the same; it is very simple:

 

DELETE
FROM <userid>.DSN_<name>_TABLE A 
WHERE NOT EXISTS (SELECT 1
                  FROM <userid>.PLAN_TABLE B
                  WHERE A.QUERYNO = B.QUERYNO
                    AND A.APPLNAME = B.APPLNAME
                    AND A.PROGNAME = B.PROGNAME
                    AND A.COLLID = B.COLLID
                    AND A.GROUP_MEMBER = B.GROUP_MEMBER
                    AND A.SECTNOI = B.SECTNOI
                    AND A.VERSION = B.VERSION
                    AND A.EXPLAIN_TIME = B.EXPLAIN_TIME)
;

And you are done once you have done this on all 16 tables…

DSN_STATEMNT_TABLE
DSN_FUNCTION_TABLE
DSN_COLDIST_TABLE
DSN_DETCOST_TABLE
DSN_FILTER_TABLE
DSN_KEYTGTDIST_TABLE
DSN_PGRANGE_TABLE
DSN_PGROUP_TABLE
DSN_PREDICAT_TABLE
DSN_PTASK_TABLE
DSN_QUERYINFO_TABLE
DSN_QUERY_TABLE
DSN_SORT_TABLE
DSN_SORTKEY_TABLE
DSN_STRUCT_TABLE
DSN_VIEWREF_TABLE

Of course the first real work is to find out which data you no longer need in the <userid>.PLAN_TABLE; and here the new catalog table SYSIBM.SYSPACKCOPY can help you – once you have bound everything in NFM; of course!

  
DELETE
FROM <u serid>.PLAN_TABLE A
WHERE NOT EXISTS (SELECT 1
FROM (SELECT COLLID
, NAME
, BINDTIME
FROM SYSIBM.SYSPACKCOPY
UNION ALL
SELECT COLLID
, NAME
, BINDTIME
FROM SYSIBM.SYSPACKAGE
) B
WHERE A.COLLID = B.COLLID
  AND A.PROGNAME = B.NAME
  AND A.BIND_TIME = B.BINDTIME)
 ;

This finds all of the currently “in use” (or possibly in use) collections and packages and then simply deletes from the PLAN_TABLE any that are not there. Having run this, you can then run the 16 other tidy up queries before finally doing a REORG with inline RUNSTATS and COPY, as this data is critical for your business! Now as it is critical… where have you got it stored? Does it also have the same stringent standards as other production data? One table per tablespace with a correct SEGSIZE, Bufferpool etc. Now is the time to review where your EXPLAIN data actually lives and take corrective action – it will speed up BIND and REBIND as deadlocks will reduce, and finally add all of the tablespaces to your DB2 Database Maintenance system! Just doing INSERTs is all well and good but when you try and SELECT the data, the performance can be terrible! For this the RealTime DBAExpert (RTDX) is your friend.

At this point it is also worth remembering that indexes also can be a great help! You should have two indexes on the PLAN_TABLE on BIND_TIME, COLLID, NAME and also BIND_TIME, QUERYNO, PROGNAME. All of the other EXPLAIN tables need at least EXPLAIN_TIME, QUERYNO and PROGNAME. You can happily add or alter these, of course, as it might even make sense to have indexes on QUERYNO, APPLNAME, PROGNAME, COLLID, GROUP_MEMBER, SECTNOI, VERSION, and EXPLAIN_TIME. Feel free to mix’n’ match for what you are doing with your tables! For example if using HINTs, it is recommended to have QUERYNO, APPLNAME, PROGNAME, VERSION, COLLID, OPTHINT or the performance will be pretty bad.
Bottom line is – Just make sure that you *only* create the EXPLAIN tables you actually *need*
and you *only* create the indexes you actually *need*

 

Happy EXPLAINing! As usual  comments and queries are always welcome!

TTFN,
Roy Boxwell
Senior Architect