2018-02 Db2 Catalog Statistics revisited

Db2 Optimizer & access path selection for Db2 11 & Db2 12 :

Db2 Catalog dataProblematic default values | Correlations in the Db2 Catalog 

It has been six years since the last update so I thought, after Terry Purcell’s excellent presentation in January 2018, it would be a good point in time to go over and rake the coals again—especially as a couple of things have changed for Db2 12!


Terry Purcell – Db2 12 for z/OS Optimizer and RUNSTATS improvements
Webcast replay          Abstract

Are you a RUNSTATS Master?

Every now and again, I hold a little presentation called “Are you a RUNSTATS Master?” Actually these days it’s called “Db2 z/OS Lies, Damn Lies, and Statistics…” where I describe in detail, what the Db2 Optimizer uses for access path selection in relation to the Db2 Catalog data.

Surprised? You will be!

Personally, I am always surprised at how often people say “just that data?” or “is that it?” (the various other reasons for access path selection like CP speed, RID Pool size, Sort Pool size, Max data caching size, and, of course, the 80 bufferpools are also mentioned, but these have nothing to do with RUNSTATS).

So generally the answer is “Yes.” However, the permutations and combinations make the devil in the detail – The Db2 Optimizer’s algorithms are top secret, but the input data it uses is fully described in the documentation.

Just the facts ma’am

What I want to do, is show :

  • the Db2 Catalog data that is used
  • the default values that can cause surprising things to happen
  • the problem of correlations in the Db2 Catalog

Which data are used by the Db2 Optimizer and which are updated by RUNSTATS?

Here is a complete list of the eleven tables used by the Db2 Optimizer:

  1. SYSIBM.SYSCOLDIST
  2. SYSIBM.SYSCOLSTATS *
  3. SYSIBM.SYSCOLUMNS
  4. SYSIBM.SYSINDEXES
  5. SYSIBM.SYSINDEXPART
  6. SYSIBM.SYSKEYTARGETS (same as SYSCOLUMNS)
  7. SYSIBM.SYSKEYTGTDIST (same as SYSCOLDIST)
  8. SYSIBM.SYSROUTINES
  9. SYSIBM.SYSTABLES
  10. SYSIBM.SYSTABLESPACE
  11. SYSIBM.SYSTABSTATS

* degree of parallelism only and, after APAR PK62804, also „sometimes“ used to bound filter factor estimates…

By the Columns

Now we can also list out all of the columns (obviously not including the key columns) which are used by the Db2 Optimizer:


SYSCOLDIST
CARDF, COLGROUPCOLNO, COLVALUE, FREQUENCYF, HIGHVALUE, LOWVALUE, NUMCOLUMNS, QUANTILENO, STATSTIME


SYSCOLSTATS
COLCARD, HIGHKEY, LOWKEY


SYSCOLUMNS
COLCARDF, HIGH2KEY, LOW2KEY


SYSINDEXES
CLUSTERING*, CLUSTERRATIO, CLUSTERRATIOF, DATAREPEATFACTORF, FIRSTKEYCARDF, FULLKEYCARDF, NLEAF, NLEVELS


SYSINDEXPART
LIMITKEY*


SYSKEYTARGETS
CARDF, HIGH2KEY, LOW2KEY, STATS_FORMAT


SYSKEYTGTDIST
CARDF, KEYGROUPKEYNO, KEYVALUE, FREQUENCYF, HIGHVALUE, LOWVALUE, NUMKEYS, QUANTILENO, STATSTIME


SYSROUTINES
CARDINALITY*, INITIAL_INSTS*, INITIAL_IOS*, INSTS_PER_INVOC*, IOS_PER_INVOC*


SYSTABLES
CARDF, EDPROC*, NPAGES, NPAGESF, PCTROWCOMP


SYSTABLESPACE
NACTIVE, NACTIVEF


SYSTABSTATS
CARD, CARDF, NPAGES


Notes: * Columns are not updated by RUNSTATS and _ Columns are not updatable at all. The column STATSTIME is used only if there are duplicates in the SYSCOLDIST table, and then the Db2 Optimizer will use the “newer” data that was probably inserted by a User.

Know your defaults

Which default column values trigger the Db2 Optimizer to use its own internal default values?


SYSCOLUMNS


If COLCARDF                       = -1 then use 25


SYSINDEXES


If CLUSTERRATIOF            = 0 then use CLUSTERRATIO


If CLUSTERRATIO              = 0 then use 0.95 if the index is CLUSTERing = ‘Y’ otherwise 0.00


DATAREPEATFACTORF    = -1 then is ignored


If FIRSTKEYCARDF            = -1 then use 25


If FULLKEYCARDF             = -1 then use 25


If NLEAF                              = -1 then use 33 (Which is SYSTABLES.CARDF / 300)


If NLEVELS                         = -1 then use 2


SYSROUTINES


If CARDINALITY                  = -1 then use 10,000  


If INITIAL_INSTS                 = -1 then use 40,000


If INITIAL_IOS                     = -1 then use 0


If INSTS_PER_INVOC        = -1 then use 4,000


If IOS_PER_INVOC            = -1 then use 0


If IOS_PER_INVOC            = -1 then use 0


SYSTABLES


If CARDF                             = -1 then use 10,000


If NPAGESF                      <= 0 then use NPAGES


If NPAGES                          = -1 then use 501 (Which is CEILING (1 + SYSTABLES.CARDF / 20))

Here you must be very careful if using NPGTHRSH ZPARM as 501 is more than the default value in most shops. This is one of the little changes in Db2 12 where the value -1 is treated as -1 for the NPGTHRSH check.


SYSTABLESPACE


If NACTIVEF                     = 0 then use NACTIVE


If NACTIVE                       = 0 then use 501 (Which is CEILING (1 + SYSTABLES.CARDF / 20))


SYSTABSTATS


If CARDF                         = -1 then use 10,000


If NPAGES                       = -1 then use 501 (Which is CEILING (1 + SYSTABSTATS.CARDF / 20))


So now you can see that non-floating point “old” data, may still be used today and this may cause access path headaches!

Never ever say never

Now to top it all, the data in the SYSCOLDIST and SYSKEYTGTDIST never gets simply “deleted”.

Well, actually, in Db2 12 you can now do a RUNSTATS xxx.yyy RESET ACCESSPATH to delete all SYSCOLDIST and SYSKEYTGTDIST data and set all other relevant columns to their respective defaults, but you must time this RUNSTATS very wisely! If you run it and then forget to do a normal full RUNSTATS…

Oldie but a goldie

Once the data are inserted, they stay there, until they are overwritten by new data, a RUNSTATS RESET, or the object is dropped. This all leads to some very old data in these two tables that can and do cause the Db2 Optimizer a ton of grief! One of the first things I do is to simply select the MIN(STATSTIME) from these tables just to see how old the data really is. Do it yourself and be surprised! I have seen sites with eight-year old data in the SYSCOLDIST and that cannot be good!

Correlate the world

Now onto correlations… There are lots of little tricks that DBAs use to “massage” access path choice. One of these, is to just set NLEVELS to 15 for a given index. Then lots of queries simply refuse to touch it as it would appear to be HUGE. Now, just simply updating columns can cause the Db2 Optimizer, in the best case, to ignore the updates or, perhaps, makes things even worse! So here is a list of the correlations (In other words, if you change xxx remember to change yyy and zzz as well):

Relationships exist among certain columns of certain tables:

  •       Columns within SYSCOLUMNS
  •       Columns in the tables SYSCOLUMNS and SYSINDEXES
  •       Columns in the tables SYSCOLUMNS and SYSCOLDIST
  •       Columns in the tables SYSCOLUMNS, SYSCOLDIST, and SYSINDEXES

 If you plan to update some values, keep in mind the following correlations:

  •  COLCARDF and FIRSTKEYCARDF/FULLKEYCARDF (They must be equal for the 1st column and full, if a single column index)
  •  COLCARDF, LOW2KEY and HIGH2KEY. (For non-default COLCARDF LOW2KEY and HIGH2KEY key must be filled with data) and if the COLCARDF is 1 or 2 Db2 uses LOW2KEY and HIGH2KEY as domain statistics to generate frequencies.
  • CARDF in SYSCOLDIST.  CARDF is related to COLCARDF and FIRSTKEYCARDF and FULLKEYCARDF. It must be at a minimum
  • A value between FIRSTKEYCARDF and FULLKEYCARDF if the index contains the same set of columns
  • A value between MAX(colcardf of each col) and the product of all the columns COLCARDFs in the group
  • CARDF in SYSTABLES. CARDF must be equal or larger than any other cardinalities, such as COLCARDF, FIRSTKEYCARDF, FULLKEYCARDF, and CARDF in SYSCOLDIST
  • FREQUENCYF and COLCARDF or CARDF. The number of frequencies collected must be less than or equal to COLCARDF for the column or CARDF for the column group
  • FREQUENCYF. The sum of frequencies collected for a column or column group must be less than or equal to 1

New in Db2 11

In Db2 11, the table SYSSTATFEEDBACK was introduced giving us the first chance to see what the optimizer thinks is missing. This is truly awesome, as then we can tailor our RUNSTATS to generate exactly what the optimizer needs to really validate and generate a good, stable access path. Of course, you should be a little bit careful with this data as too much of a good thing can be bad for you!

New in Db2 12

(Not just the lowercase b!)

In Db2 12, the SYSSTATFEEDBACK was made even more interesting by now externalizing the required RUNSTATS options *directly* into the already existing RUNSTATS profile or, indeed, actually creating a RUNSTATS profile for you.

I think that is really dangerous, as then you could easily flood your system with bogus stats for end user QMF/SPUFI queries that were run “by accident,” or so called “boss queries” where someone with *no* idea of SQL clicks together a highly complex and badly written SQL before letting it run for a weekend. Naturally the SQL gets rewritten by a helpful ever present DBA, but the statistics recommendations have now landed in the profile and will be updated and kept from this point on.

My personal recommendation is to switch off this feature as it is sadly *on* by default!

Here are the ZPARMs of interest

ZPARM STATFDBK_SCOPE set to ALL by default.
ZPARM STATFDBK_PROFILE set to YES by default.
Plus, in table SYSIBM.SYSTABLES column STATS_FEEDBACK is set to Y by default.

Out-of-the box it starts automatically creating (for TYPE=’C‘ with NUMCOLS > 1 and TYPE=’F‘ or ‚H‘) profiles and updating existing profiles…Here you must manually check the size of your profiles every now and again just to make sure everything is ok!

One other new thing in Db2 12, is that XML columns can get statistics now to help XMLEXISTS get a better access path.

and finally

Do not forget that our little Freeware tool StatisticsHealthCheck will find all bad correlations, old data and badly updated data for you and it is FREE!

So I hope this little round-up of Db2 Catalog Statistics data was interesting, and, as usual, if you have any comments or questions, then please, feel free to mail me!

TTFN

Roy Boxwell

 

 

2017-11 Db2 APAR list: An APAR a day keeps the bugs at bay

Db2 12 APAR – time saving list

With all the talk about “agile” going on, and referring to one of my older 2017-09 newsletters* on APARs, I think it is time to tell you all about another little service that our company offers.

(completely free, simple and no marketing spam)


* Db2 12 Agile & APAR previous newsletter:
2017-09: Db2 12 SQL Access path: Death by APAR :  How many APARs really can affect access paths?

APAR Database

The APAR database can be accessed by anyone with an IBM Userid and you can merrily search to see if the problem that you have hit is already found and fixed, or a fix is in the works.

The problem is: What about the bugs that you do not *know* you have hit?

Get someone else to do the donkey work

The answer is to get someone else to do all the research for you, and deliver the answer by e-mail every month for all current releases of Db2 going back two years.

Who is that donkey? You will never guess…  😉

Three is the key

There are three Excel spread sheets here.


  1. SQL Performance
    The first is RTS. This lists any and all APARs to do with the Real-Time Statistics tables. If you rely on these tables to decide when to run REORG, COPY and RUNSTATS, then you want to make sure that they are being correctly updated, don’t you?


  2. SQL Access Path
    The second is RUNSTATS. RUNSTATS is a critical utility program and contains bugs like any other non-trivial program. I count RUNSTATS bugs as *always* a personal HIPER. I rely on statistics and so does the Db2 Optimizer. If there is bad data here, then your access paths have no chance!


  3. Performance PTFs
    Finally, and the biggest list, is those APARs that have anything to do with SQL Performance and SQL Access paths (If not already in the RUNSTATS list of course!) Performance PTFs are pretty important!

Going hyper over HIPER

All the APARs have their related PTF. A handy little HIPER column tells you whether or not this is a really important fix. As I mentioned, I treat the RUNSTATS ones as personal HIPERS. Also added is a PE indicator when a PTF goes bad on you (PTF in Error) so you can see if you introduced a problem by correcting another.

One Excel Example: The SQL Performance spread sheet

Db2 12 APAR list free for Db2 z/OS: SQL Performance (RTS) - Performance PTFs - SQL Access Path (RUNSTATS) -

This is from the SQL Performance spread sheet and you can see how it looks.

The same table in HTML:

APARCLOSEDSTATUSDb2 10 Db2 11Db2 12HYPERDescription
PI85305
2017-11-01
Closed
N/A
UI51606
UI51601
INEFFICIENT INDEX CHOSEN WHEN INDEX CAN DO INDEX SKIPPING…
 PI85418
 2017-19-25
Modified
UI50098
UI50099
UI50582
PREPARE TAKES LONG TIME AND HIGH CPU IF THE QUERY CONTAINS MA…
 PI85463
New & Closed
 N/A
 N/A
UI51342
TABLESPACE SCAN INSTEAD OF INDEX ACCESS – WITH INDEX HAVIN…

 …

 …

 …

 …

 …

 …

 …

 …

Interested in making your life a little bit easier and safer?

If you are interested in Db2 APARs and you have *no* time to go and scrabble around the internet trying to find out which APARs are needed, worthwhile, or dangerous, then

just email our technical support techsupport@seg.de and ask to be added to the APAR Mailing list.

It is completely free, you will not get marketing spam and it really is as simple as that!

 

Get AGILE now!

With these handy lists, you can quickly and easily review the state of your Db2, and react quickly and in a timely manner when you have to.

 

As usual, if you have any comments or queries please feel free to drop me a line!

TTFN

 

Roy Boxwell