Rotten Results from RUNSTATS Require Rescue

Do you know the basic rules to ensure access path stability when using RUNSTATS?

Time for another of my “I noticed something strange at a customer site recently” newsletters. Enjoy!

 

RUNSTATS are good aren’t they?

At this particular site, the RUNSTATS methodology of RUNSTATS was, shall we say, “sub-optimal.” They use an ancient system to decide when to RUNSTATS, and they do tablespace’s and index’s *never* at the same time. Just to complicate matters even more, they never use inline RUNSTATS because “if the REORG abends, the statistics in the DB2 catalog are dead”. Now you are all probably well aware of the scale of the disaster at this site?

 

Daily fire fighting

Nearly every day, some access path somewhere goes horribly wrong… the under- manned and over-worked DBA group are tasked to find and fix ASAP. Cures range from a quick INDEX create or change, or perhaps even a really needed RUNSTATS or REORG.

 

Why do the Access Paths go “wrong”?

The real goal is to stop firefighting and to investigate the root cause. Why do so many access paths go wrong on such a regular basis? The answer is the systemic horribleness of RUNSTATS collection. Dynamic SQL is, obviously, very very sensitive to RUNSTATS. For one thing, the statements are kicked out of the cache! The very next time they come back, the DB2 Optimizer redrives the cost calculations and “Hey Presto!” you have a bad access path. Terry Purcell and Pat Bossmann have often said that about 90% of DB2 performance problems stem from bad RUNSTATS. The old adage “garbage in – garbage out” is still true!

 

Timing is everything

The timing of the RUNSTATS is critical for stable access paths.

Basic rules are:

 1 Only do a RUNSTATS if you really really need to!
a. RUNSTATS are not cheap!
b. The Dynamic Statement cache gets wiped
c. Locks on the Catalog can occur
2Avoid doing RUNSTATS even if RTS says to run one!
a. Lots of people use the incorrect counters to trigger a RUNSTATS. Use the correct ones for the correct Object type
b. Never RUNSTATS LOB spaces – completely pointless work!
c. Even if a MASSDELETE has occurred do you really want to “reset” the DB2 catalog statistics?
d. VOLATILE tables must be handled with *extreme* care!
3Choose your RUNSTATS parameters wisely!
a. Doing a blind “RUNSTATS the world” is just as bad as running an empty RUNSTATS!
b. HISTOGRAM should be used with caution
c. More than a hundred COLGROUPs should start alarm bells ringing

Quite a list here, and it really only shows some “Rules of Thumb”. I’ll bet you all have you own?

Is there a way back from the abyss?

But what happens if you have 1000’s of partitions with terabytes of data and the RUNSTATS was, shall we say, ill-advised or badly timed? Can you go back in time? Hands up those who wants to do a PiT recovery on the production catalog! No takers???

Yes! There is a way back from the abyss

I’ll bet you are all well ahead of me here, but the way to do this is pretty straightforward. You simply acquire our latest tool, RUNSTATS Rescue, to handle it all for you. Or, you could try and reset the data in the DB2 catalog from off-line backups that you happen to have taken before the RUNSTATS that is now killing you. …You did do that, right?

Why a tool?

Apart from the fact that this tool is from us, my firm, just trying to “roll your own” can be a real nightmare. Why?

  • Because you must first find out all of the objects that were touched by the badly performing SQL.
  • Then you must get all of the DB2 Optimizer relevant data back from a point in time before the RUNSTATS executed, and/or the last REBIND(s),
  • and then you must flush the dynamic statement cache and REBIND any static SQL.

Sounds like a lot of work.

What else must you do?

You also have to be transparent and so log what you do. You must allow for the ability to back-out your changes as perhaps you make another access path even worse. And it would be really cool if you could do “on the fly” explains to check that the RUNSTATS really *is* the root of all that evil. Remember that ZPARMS and BUFFERPOOLS also have a major influence on access paths. Even the speed of your machine! It is also a must to then be able to go even further back in time – perhaps as much as a year?

Hang on – What about PLAN STABILITY?

Doesn’t plan stability save you? I hear you all cry. Well, “No” is the short answer! If your package is invalidated by a Schema change (the classics are index drop and recreate or VIEW change), then plan stability does not work anymore. Further, in DB2 12, Dynamic Plan Stability has been announced. Sadly it *also* fails right here as there is no SWITCH PREVIOUS/ORIGINAL support!

It all works together

So, for the static SQL case where the package is not invalided, Plan Stability is good. If not: – RUNSTATS Rescue to the rescue. For Dynamic SQL – RUNSTATS Rescue is the answer.

As always, any questions or comments would be most welcome!

TTFN,

Roy Boxwell