2013-10: DB2 11 Expanded RBA/LRSN

Now if you were a good attendee at the IOD (and paid attention to the freebies!), you probably got a free copy of the great little book „DB2 11: “The Database for Big Data & Analytics”.

If you did not get a copy at the IOD you can download a free electronic copy here.

 

On page 30 there is the following text:

   “Implementation of the expanded RBA/LRSN formats is optional but highly recommended as DB2 11 will use 10-byte values internally when in NFM, and there will be a small performance penalty for converting these to old Format”

Italics added by me of course!

Now this caused a couple of questions at the IOD as how to manage a REORG of the whole world. So my newsletter this month is to suggest a way to do this without just REORGing everything from DB 0 to DB Z!

 

How to manage a REORG without just REORGing everything from DB0 to DBZ ?

1) Be on *usage* metric
Start with your “known suspects” list – Every shop has a top ten list of SQLs normally based on CPU. What you must do is change this list to not be on CPU but on *usage*. The trick is how to actually do this! Well, I would start with the average executes per hour for all objects (To get this metric you must EXPLAIN or have the EXPLAIN data). Obviously the objects at the top of this list should be REORGed first!

2) 50 Reorgs a night
Get your system automation or DB2 Database Maintenance system to use the above data to drive, say, 50 Reorgs a night *purely* based on usage metrics

3) Wait

4) You’re done when no more REORGs are triggered for this metric

Of course it would be easier if all the data was there in front of you but then you would need our SQL WorkLoadExpert (WLX) to get the usage metrics which is, of course, linked to our RealTime DBAExpert (RTDX) which takes care of the REORGs but I digress…

 

SYSCOPY’s handling

What is also “interesting” is the way SYSCOPY handles all of this… If you select the LENGTH of the START_RBA before the CATENFM job has REORGed SYSCOPY you will get 6 back. After the CATENFM has done SYSCOPY you then get 10 back. This way your “home grown” systems can “know” if the RBA is 6 or 10 Bytes…

SELECT LENGTH(START_RBA)
FROM SYSIBM.SYSCOPY 
FETCH FIRST 1 ROW ONLY 
;

RBA & LRSN extensions

The next interesting point is that a RBA has been extended on the left and a LRSN has been extended on the left *and* the right. So you need to be aware of whether or not the row was written in a data-sharing environment. Probably not that interesting to most people as their shops are one or the other and not normally a mix, but for our software we must know what is in the START_RBA so we use a little CASE to do the work for us if running in DB2 V8, 9 or 10:

CASE WHEN GROUP_MEMBER > ' ' 
     THEN 
          SUBSTR(X'00' CONCAT START_RBA CONCAT X'000000' , 1 , 10)
     ELSE 
          SUBSTR(X'00000000' CONCAT START_RBA , 1 , 10)
END

 

The last thing I will mention this month is the SYSLGRNX format has changed which might catch a few people out if you are using CI style programs to read it! The DB2 11 NFM version is now in a segmented space and has the extended trailer so beware!

As usual any questions or comments are welcome,
TTFN Roy Boxwell
Senior Software Architect

2014-01: Complementing IBMs ACCESS PLAN Stability

 

Part 1: Dynamic SQL

with SEGs complete RUNSTATS Rescue package:
Why it is a good idea but why it fails in certain cases?

The story

This month I want to tell you a story. This story is true, but the names have been changed to protect the innocent!

A big company regularly scheduled production staging on Thursday nights. One night last year everything went as normal – until the Friday morning…

… then the telephones started ringing, and people were complaining about slow or non-existent response times for a rather critical business application.

The problem quickly escalated from the standard “Help desk”-level answers of “Switch it off and on, reboot”, and “Have you changed anything?” to senior managers demanding to know why things were not working anymore. 

At this point, the DBA group was not actually involved, as it was first thought that a “bad” package had been promoted “by accident” on the night before.

  • The production group backed out the staged packages, but it didn’t help…
  • The delays got worse. They then stopped nearly all of the WebSphere Servers to at least allow *some* work through the over loaded system. Now in full panic mode, the DBA and the JAVA teams got involved.They both quickly found the “culprit”, which was an extremely large and complex dynamic SQL statement that had worked fine until some time Thursday night, and was now behaving *very* badly indeed.
  • The DBA team REORGed the “big” tables involved, in the hope that it would then all get better…
    It didn’t.
  • Finally the DBA team proposed creating a new index which was quickly done and  RUNSTATed in production.

The SQL then switched the access path back to a good one.
The WebSphere servers were all re-started, and gradually everything returned to normal.
This whole process actually took two days!
The company involved relies upon its logistics chains and Just-In-Time delivery, so this outage had some serious repercussions, of course…    Get the full story

 

What really happened?

The DBA team then investigated further and found out that what had really happened, was that a RUNSTATS, on just one small table, had been run on the Thursday night at “an inappropriate time” thus causing the statement’s access path to go “pear-shaped”(aka Belly Up) all day Friday and half of Saturday…

 

How to fix the problem quickly and easily?

The DBA team then thought about ways that such a problem – should it reoccur in the future – could be quickly and easily fixed. Now my part of the story begins…

This company uses our software and had a license to run the Enterprise Statistics Distribution (ESD) component of Bind ImpactExpert, which extracts, and optionally converts, all of the DB2 Catalog data that the DB2 Optimizer needs to do its job. Normally, customers use this to copy all the production statistics over to a sand box style system, to see if a DB2 APAR or DB2 Migration will cause unforeseen problems. For this, they use the Early-PreCheck component of our tool Bind ImpactExpert for Dynamic or Static SQL. Now we do have another scenario, called DSC (Dynamic Statement Cache) Protection, that would nearly do what they wanted, but also does a lot more and, of course, costs more to use!
And so arose the idea for our new PocketTool called RUNSTATS Rescue. “Why is it a PocketTool?” I hear you ask “Because it only costs pocket money!” (aka Pin Money in the USA or an “allowance” if you prefer) These tools are inexpensive to use – really!

Now, before you stop reading at this point and start complaining about the fact this Newsletter is just 100% Marketing, please bear in mind that what I describe here could also be written by you – then you just need to give me credit for sharing the idea…

 

RUNSTATS Rescue

The idea is to use EXPLAIN in any way, shape or form, either in SPUFI, or directly in any monitor, to simply EXPLAIN the “culprit” SQL, and to remember the PLAN_TABLE owner you are using, as well as the QUERYNO you just used.

Using these two inputs, RUNSTATS Rescue analyses the EXPLAIN output to build a list of extract and update control cards for our ESD, for all of the tables used and *all* of the indexes – even those *not* used, of course! Finally a DSC flush RUNSTATS is also generated for all Tablespaces involved in the query to make sure that the next time this “culprit” SQL comes into the system, it will then use the correct statistics.

Now, of course, the question is: “How do I know which statistics to use as the Rescue statistics?”

The answer: “The ones that were there before you did a REORG with inline RUNSTATS or a stand-alone RUNSTATS”. This is the key point to bear in mind: You must simply run the ESD extract before any normal DB2 Database maintenance jobs run.

Most shops have days, or weekends, when they run these, and it is not a problem to extract the data and then copy it, for example, to a GENGROUP, to enable easily finding the date and time of the last extract when the statistics were “good”, thus enabling the RUNSTATS Rescue job to revert the required statistics very quickly. This gives the DBA group much-needed time to find out what really happened and take any appropriate action – almost – at their leisure.

 

10 Rescue Steps

  1. Select the new scenario RUNSTATS Rescue
  2. Generate some JCL
  3. Optionally copy to a GENGROUP dataset
  4. Insert the EXPLAIN TABLE-CREATOR and QUERYNO
  5. Automatic launch of our catalog browser
  6. Drill down to the Index level
  7. Ask “new” file name for the extracted “rescue” statistics
  8. Perform the RUNSTATS Rescue extraction
  9. Reset the statistics and executes the RUNSTATS
  10. The “Rescued” Statistics

Top of page

 

Here’s a walkthrough example of how it looks in real life:

News 2014-01 Bild1

Near the bottom, you can see the new scenario RUNSTATS Rescue – select it to get a little pop-up window with the three steps. The first step must only be done once and then simply be plugged into an existing production job. I would recommend the first job of the normal DB2 Database Maintenance job stream.

 

1 – Select the new scenario RUNSTATS Rescue in the little pop-up window with the three steps

 

2 – Generate some JCL

The first option just generates some JCL looking like:

News 2014-01 bild3

 

3 – Optionally copy to a GENGROUP dataset

At the end the optional step to copy to a GENGROUP dataset

News 2014-01 bild4

 

4 – Insert the EXPLAIN TABLE-CREATOR and QUERYNO

Selecting the second option then requests the required input data as discussed earlier

News 2014-01 bild5

 

5 – Automatic launch of our catalog browser

Hitting “enter” then launches our catalog browser to enable you to see which objects were being used by that SQL…

News 2014-01 bild6

 

6 – Drill down to the Index Level

News 2014-01 bild7

 

7 – Ask “new” file name for the extracted “rescue” statistics

PF3-ing out of the tool then asks for your original Production Statistics datasets, as extracted by the job in the first step, and a “new” file name for the extracted “rescue” statistics:

News 2014-01 bild8

 

8 – Perform the RUNSTATS Rescue extraction

The next JCL appears that performs the RUNSTATS Rescue extraction, including the optional steps for GENGROUP support, as seen here:

News 2014-01 bild9

 

9 – Reset the statistics and executes the RUNSTATS

Finally, the third option is selected, which actually resets the statistics and executes the RUNSTATS to flush the DSC

News 2014-01 bild10

 

10 – The “Rescued” Statistics

Now the next time that statement appears, it will use the “rescued” statistics and get back its old Access Path.

 

Top of page

 

Next Month

Next month I wish to expand upon this topic with the capability of doing the same for Static SQL.

The month after that, I will go into detail about the DSC Protection scenario I mentioned earlier. That is not a pocket tool, of course, but it *is* very interesting!

 

As usual any questions or comments are welcome,
TTFN Roy Boxwell
Senior Software Architect

2014-09: Bad Data Day

Bad Data Day

– Duplicates found in a migration Precheck
– The “bug”
– FIX the “bug”

 

One of my favorite topics is STATISTICS and RUNSTATS.
This month I have a short newsletter involving both of them!

Some time ago we were helping one of our customers to perform an Early Precheck (Going from DB2 9 to 10). To do so, we requested a copy of their entire DB2 production statistics so the optimizer could work here at our labs in Dusseldorf – just like at the customer site.

 

Duplicates found in a migration Precheck

We loaded up the data and I noticed something that just jumped right out…

SELECT TYPE 
      ,FREQUENCYF 
      ,NUMCOLUMNS 
      ,SUBSTR(COLVALUE , 1 , 11) AS COLVALUE 
      ,COLGROUPCOLNO 
       FROM SYSIBM.SYSCOLDIST 
       WHERE TBOWNER = 'aaaaaaa' 
         AND TBNAME  = 'bbbbbb' 
         AND NAME    = 'cccccc' 
ORDER BY 1 , 3 , 2 
; 
---------+---------+---------+---------+---------+---------+---------
TYPE             FREQUENCYF   NUMCOLUMNS   COLVALUE      COLGROUPCOLNO
---------+---------+---------+---------+---------+---------+---------
F    +0.6066539624818615E-02            1 . xxxxxxxxxx .. 
F    +0.6066539624818615E-02            1 . xxxxxxxxxx 
F    +0.6287988717751475E-02            1 . xxxxxxxxxx .. 
F    +0.6287988717751475E-02            1 . xxxxxxxxxx 
F    +0.8554928116458912E-02            1 . xxxxxxxxxx .. 
F    +0.8554928116458912E-02            1 . xxxxxxxxxx 
F    +0.8578238547293950E-02            1 . xxxxxxxxxx .. 
F    +0.8578238547293950E-02            1 . xxxxxxxxxx 
F    +0.8852136109605646E-02            1 . xxxxxxxxxx 
F    +0.8852136109605646E-02            1 . xxxxxxxxxx .. 
F    +0.1229042465777374E-01            1 . xxxxxxxxxx 
F    +0.1229042465777374E-01            1 . xxxxxxxxxx .. 
F    +0.1331608361451540E-01            1 . xxxxxxxxxx .. 
F    +0.1331608361451540E-01            1 . xxxxxxxxxx 
F    +0.1342098055327308E-01            1 . xxxxxxxxxx .. 
F    +0.1342098055327308E-01            1 . xxxxxxxxxx 
F    +0.1633478440765281E-01            1 . xxxxxxxxxx 
F    +0.1633478440765281E-01            1 . xxxxxxxxxx .. 
F    +0.8439891140288000E+00            1 . xxxxxxxxxx .. 
F    +0.8439891140288000E+00            1 . xxxxxxxxxx 
DSNE610I NUMBER OF ROWS DISPLAYED IS 20 
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100 

There were “duplicates”!
This is just one of many examples. Also above, you can see the problem – Notice the data in COLGROUPCOLNO? Sometimes there is a hex value and sometimes not!

 

The “bug”

Now, believe this or not, all of this goes back to a bug in RUNSTATS in DB2 V8 which got this APAR:

PK33517: COLGROUPCOLNO ASSOCIATED WITH SINGLE COLUMN CONTAINS A NUMERIC VALUE INSTEAD OF ZERO LENGTH FIELD ACCORDING TO SQL REF

What’s interesting here, is this APAR is marked as FIN so the “bug” disappeared in DB2 9!

As you can see, the bug caused an erroneous value in the COLGROUPCOLNO to be set for single column frequency rows. The bug was then fixed in DB2 9, but the “bad” data that had been inserted was not automatically cleaned up and, as I hope you all know, the SYSCOLDIST data is never automatically deleted – it is only ever updated.

So when the bug was fixed, the low-value or hexadecimal column number in that field was no longer EQUAL to a zero length field, and so an insert was done.
Since then these rows have just stayed there…

 

 

FIX the “bug”

I wrote a little query just to show the count of how many of these bad rows existed in the SYSCOLDIST:

SELECT COUNT(*) AS BAD_GUYS 
FROM SYSIBM.SYSCOLDIST 
WHERE NUMCOLUMNS                = 1 
  AND TYPE                      = 'F' 
  AND NOT LENGTH(COLGROUPCOLNO) = 0 
; 
---------+---------+---------+---------+------
    BAD_GUYS 
---------+---------+---------+---------+------
       8680 
DSNE610I NUMBER OF ROWS DISPLAYED IS 1

 

All of these entries should be deleted from SYSCOLDIST to help the optimizer pick the right access paths!

I first noticed this problem during a test of our Statistics HealthCheck product, which flagged over 1,200 critical problems in the DB2 catalog and, thinking I had broken something, I checked all of the checks and found the above bad data.
Now is as good a time as any to check your stats *and* download our Statistics Healthcheck Freeware!!

As usual any questions or  comments are welcome,
TTFN Roy Boxwell
Senior Software Architect

2014-02: Complementing IBMs ACCESS PLAN Stability

 

Part 2: the Story continues for Static SQL…

The Story continues
Rescue Dynamic SQL & Static SQL
Rescue “pure” Static SQL
An errant Access path onto the good path
Next month

 

Welcome back to the second part of this newsletter.
Remember that last month I talked about a nasty real-world event that caused pain and grief to a large number of fellow DBAs?

 

Short reminder of part 1: rescue Dynamic SQL

A big company regularly scheduled production staging on Thursday nights. One night last year everything went as normal – until the Friday morning……then the telephones started ringing, and people were complaining about slow or non-existent response times for a rather critical business application…

See the previous newsletter here and the  walk through” example for Dynamic SQL rescue.

 

The Story continues for Static SQL

Well that was “only” Dynamic SQL that was killing them.

SOFTWARE ENGINEERING GmbH created the new Pocket Tool: RUNSTATS Rescue to gallop over the hill and rescue them from bad statistics timing problems.

…Everyone then lamented “But my Static SQL *also* goes horribly wrong if the statistics are badly timed. What can SEG do there?”

 

Rescue Static & Dynamic SQL

Naturally we have the ability to interface with Static SQL as well as Dynamic. Now, as all roads lead to Rome, or as I like to say “There is more than one way to skin a cat” – my co-workers don’t like that saying much for some reason… anyway, I digress – the interface to the dynamic RUNSTATS Rescue is the PLAN_TABLE owner and the QUERYNO used for the EXPLAIN. RUNSTATS Rescue does the rest –

so for static, there is a “simple” entry point. Just cut–and–paste the static SQL statement that is “causing grief” into SPUFI and EXPLAIN it, just as if it were dynamic SQL. However, the problem here, is that you must then manually generate all required REBINDs.

Top of page

Rescue “pure” Static SQL

What we decided to do, is to create a new Pocket Tool extension for “pure” Static SQL. If you only have Dynamic SQL or if you only have Static SQL, then you will only need a single licence! The data that the new extension requires is all currently available in your SQL monitor, Trace etc. which tells you who the bad guy is, namely: Collection ID, Package name, Version and Statement Id. If you have this data *and* you have done your last BIND or REBIND with EXPLAIN(YES) – and I hope that’s true for 100% of production packages! – you are ready to rock’n’roll!

 Two prepare options

So this is how it will look. First we have a new option in the Menu selection list:

News 2014-02 Bild1

 

 

 

 

 

Here you can see that we now have two Prepare options: one is for Dynamic, and is the same as in last month’s newsletter, and the other is a new Static one which then leads you to the next pop–up:

 

  A REBIND step instead of a RUNSTATS step

News 2014-02 Bild2

Here you simply enter the required data and then, after it has been verified, you generate and run the RUNSTATS Rescue job, just like last time.

However, now you do not get a RUNSTATS step at the end of the job. Instead,  you’ll get a REBIND step for either the single PACKAGE or, if REBIND ALL is Y, for all affected PACKAGES in any COLLECTION. Obviously this option must be used “with care,” however, if the statistics are bad for one package why shouldn’t they be bad for all packages?

 

An errant Access path onto the good path

That’s it!

Now you have three ways to get an errant Access Path back onto the good path. (Remember that you can *always* cut–and–paste Static SQL into an EXPLAIN SPUFI – which means there are actually two ways to do Static).

 

Top of page

Next month

Next month I will go into detail about the DSC Protection scenario from our Bind ImpactExpert tool that is *really* cool and helps deal with the same type of problems but at a much higher and fully-integrated manner. It also completely handles the tricky question of :

“OK, so this query gets better when I reset my statistics but *this* query gets worse!”

 

As usual any questions or comments are welcome,
TTFN Roy Boxwell
Senior Software Architect

2014-03 Complementing IBMs ACCESS PLAN Stability

 

Part 3: DSC Protection

In the last Newsletter of this series about protecting your assets and/or access paths, I want to tell you how SOFTWARE ENGINEERING GmbH’s Bind Impact Expert DSC Protection works.

In the  previous Newsletter of the series (2014-01 Dynamic SQL  2014-02 Static SQL), we have seen how with RUNSTATS Rescue you can quickly and cheaply get back your old DB2 Catalog Access Path Statistics.

 

This is all well and good but then come the questions:
  1. How will a RUNSTATS affect my Dynamic SQL?

  2. What happens if I reset the statistics for this statement? What are the side effects?

 

DSC Protection was created in order to handle this. In a nut shell, it will:

  1. Extract all needed DDL from production (Optional if it is already on test of course!)
  2. Extract all needed catalog statistics from production
  3. Run a “special” RUNSTATS (More on this baby later)
  4. Extract all needed catalog statistics from the _HIST tables in production
  5. Capture and Explain the DSC from production
  6. Export DSC from production
  7. Transfer all data to test
  8. Now logoff from production and logon to test
  9. Create the DDL on test (Optional)
  10. Import the dynamic SQL
  11. Update all statistics in test from the extracted _HIST data
  12. Explain all SQL from production DSC
  13. Compare the “new” Access Paths
  14. If for a given DB.TS the access is only “improved”,create a DB.TS pair ready to send back to production for actual RUNSTATS
  15. If for a given access path the result is “worsened” or “changed” then reset the statistics back to the actual production statistics and re-analyse
  16. Keep going until no more access paths are worsened or changed
  17. Transfer the RUNSTATS control cards to production
  18. Logoff from test and logon to production
  19. Run a “normal” RUNSTATS using the transferred control cards
  20. You are done! Simple, huh?

 

We start at the Main Menu:

 

Select the DSC Protection scenario:

News 2014-03 DSC bild2

 

To begin use the primary command X to start the eXport chain of jobs:

News 2014-03 DSCbild3

Once you get to the blue line,

you must then run the “special” RUNSTATS.

What this does, is run the RUNSTATS but does not invalidate the DSC. We do not want to kill the good access paths, we want to see if a RUNSTATS will help us first *before* we really run the RUNSTATS! That is why the scenario is called DSC protection after all! It is there to protect your DSC from a mistimed RUNSTATS.

 

The HISTORY option *must* be ALL or ACCESSPATH

What is very important, is that the HISTORY option *must* be ALL or ACCESSPATH. This then fills the _HIST tables with data that we need later in the analysis. Here’s another “little” problem in the fact that there is no SYSTABLESPACE_HIST table! IBM “forgot it” years ago and it has never been created… This is why there is a “prepare extract” step, as we must create a VIEW which gives us a “fake” SYSTABLESPACE_HIST table as NACTIVEF is very good for the DB2 Optimizer if you ask me!

Once all of these steps have been done and you have file transferred all the needed files across to test, you can logoff from production and log onto the test machine.

 

Top of page

 

Here you select the DSC Protection scenario again:

News 2014-03 DSCbild4

Now you use I to Import the data:

Again you just run through the jobs a step at a time; DDL create is, of course, optional. Note that here we just apply the _HIST data – not the “current” statistics but the “future” statistics. When all of these steps are done you are then ready to use primary command N for a “new” run. Submit the job and wait for it to complete.

Once it has finished use Refresh to update the display:

News 2014-03 DSCbild6

and then you can use line command S to view the statements:

News 2014-03 DSCbild7

Here you see the overview and you can drill down to the statement level:

News 2014-03 DSCbild8

Note that on both these panels there is the R command for Reset statistics that simply adds the affected objects to an internal RUNSTATS reset table which at the entry panel can also be selected with an R:

News 2014-03 DSCbild9

Top of page

It then displays another pop-up with a short list of options:

News 2014-03 DSCbild10

Using the first option just shows you a list of the internal RUNSTATS table where you can see the Production and Test names of the objects (Of course you can rename all the extracted data)
and you may choose whether or not specific DB.TS pairs should be added to or removed from the table:

News 2014-03 DSCbild11

 

Once you have finished selecting your tables the next option should then look a little familiar to you all by now!

 

It uses the extracted statistics to build an update of the production data:

When it has finished and reset various statistics, you naturally want to re-run the EXPLAINs to see if the bad paths are now all gone, or perhaps even new worse ones have appeared! In my experience this never actually happens, but just to make absolutely sure, use the next option to re-generate the JCL and re-do the analysis. Now you can start all over again or you can see that all is ok. At this point you will now have a small list of DB.TS in the internal table that would actually give you guaranteed performance improvements if run on production.

Then you select the last option to see this:

News 2014-03 DSCbild13

Top of page

And these RUNSTATS should simply be executed on Production using the normal RUNSTATS utility JCL and with UPDATE ALL or ACCESSPATH. History no longer matters of course!

Now you are done! Easy peasy lemon squeezy! (Or Easy as pie if you prefer!) That, my dear readers, is why this is *not* a pocket tool and does *not* cost pocket money (Pin money).

 

As usual any questions or comments are welcome,
TTFN Roy Boxwell
Senior Software Architect

2014-04: Are you going to PIECEs?

This month I have a really nice – and rather long – technical newsletter to sink your teeth into!

The problem: Too low internal PIECE limit in DB2

The solution: Set of SQL queries for my newsletter readers to really find the bad guys

 Create DBD diagnose cards
Cut and paste the control cards into some DB2 Utility JCL
o “F OBDINNUM” in the REPAIR job output
Automatic parse the millions of lines of REPAIR output and *only* outputs OBIDs

 Whole SQL.txt           JCL.txt              REXX.txt

Reach your real maximum number of PIECEs!

 

The Problem:  Too low internal PIECE limit in DB2

We start with a little APAR that IBM did way back in 2013:

2013 IBM APAR

PM75150: ALLOW MAXIMUM NUMBER OF PIECES FOR AN NPI  OF LARGE TABLE SPACE, INDEPENDENT OF TABLE SPACE  MAXIMUM NUMBER OF DATASETS 

Reported release     A10
StatusCLOSED UR1
PENoPE
HIPERNoHIPER
Special AttentionNoSpecatt
Submitted date2012-10-16
Closed date2013-02-25
Last modified date2013-04-02

So far, so good. Looks like any run of the mill standard DB2 10 fix of “no real importance”… until you read the “small print” that is!

 


******************************************************************************************************************
 USERS AFFECTED: All DB2 users of Non-Partitioned Index  (NPI) on a large table space.
******************************************************************************************************************

PROBLEM DESCRIPTION:
For a large table space, the maximum   
number of pieces for an NPI is  incorrectly inherited from the
maximum number of partitions of the table space. When inserting data to a table or creating the NPI,
the customer 
may receive the following message,

DSNT501I  DMBC DSNKINSL RESOURCE UNAVAILABLE  CORRELATION-ID=QFSBTC036
CONNECTION-ID=SERVER  LUW-ID=  GAF263F9.C741.120818101839=561408
REASON 00C9009D
TYPE 00000201                          

****************************************************************************************************************
RECOMMENDATION:
****************************************************************************************************************
For an NPI on a LARGE(5-byte RID) table space, the index maximum number of pieces is independent of the table space maximum number of partitions, it is calculated from index page size and piece size as the following:

Maximum NPI number of pieces = MINIMUM( 4096, 2^32 / (index piece size/index page size) )


 

In other words, DB2 set the internal piece limit way too low!
This is very bad news if you think you have 4,096 pieces but you really have only 256!

Now the APAR carries on with helpful hints and tips (Quoting here from the APAR text):

 

Top of page

 

1. run a query, such as the sample query below, to identify the databases, table spaces and indexes,

SELECT TS.DBNAME AS DB_NAME,
       TS.NAME AS TS_NAME,
       IX.NAME AS INDEX_NAME,
       IX.CREATOR AS IX_CREATOR,
       HEX(IX.OBID) AS INDEX_OBID,
       IX.CLOSERULE AS CLOSERULE
FROM   SYSIBM.SYSINDEXES IX,
       SYSIBM.SYSTABLES TB,
       SYSIBM.SYSTABLESPACE TS
WHERE  TB.NAME = IX.TBNAME AND
       TB.CREATOR = IX.TBCREATOR AND
       TB.TSNAME = TS.NAME AND
       TB.DBNAME = TS.DBNAME AND
       (TS.PARTITIONS > 254 OR
        TS.TYPE = 'L' OR
        TS.TYPE = 'K' OR
        TS.DSSIZE > 0) AND
        TS.PARTITIONS <> 0 AND
        (TS.STATUS = 'C' OR
        NOT ((IX.INDEXTYPE = 'P' OR
        IX.INDEXTYPE = 'D') AND
        TB.PARTKEYCOLNUM <>0)) AND
        TS.DBNAME <> 'DSNDB06'
ORDER BY IX.OBID;
+-------------------------------------------------------------+
|DB_NAME |TS_NAME |INDEX_NAME|IX_CREATOR|INDEX_OBID|CLOSERULE |
+-------------------------------------------------------------+
|DB161410|XTB10000|I_NODEIDXT|SC161410  |0056      |N         |
|DB161410|TU161410|I_DOCIDTB1|SC161410  |0058      |N         |
 ...
 
2. run REPAIR DBD DIAGNOSE against all databases from the above query.
 
For example REPAIR DBD DB161410. By searching OBDINNUM in the REPAIR DBD output, it has the following mismatch,
 
DSNU916I DSNUEDFT - OBDS DIFFER AT OFFSET X'00000052'
OBDINNUM - INFORMATION NOT IN DB2 CATALOG
DSNU904I DSNUEDFT - EXISTING VALUE  X'0100'
DSNU905I DSNUEDFT - REBUILT  VALUE  X'1000'
DSNU913I DSNUEDDR - OBD X'0056' COMPARISON COMPLETE -- 0 ERRORS
 
which confirms the index index I_NODEIDXT with OBID X'0056' has incorrect maximum number of index pieces.
 
3. fix these indexes found in the above repair DBD output by altering the CLOSE attribute of the identified indexes from their current value and then altering the CLOSE attribute back to their original value. The index name and index attribute information can be found in the query result at step 1.
 
for example, the index SC161410.I_NODEIDXT has the CLOSE attribute CLOSERULE = N. To correct the maximum number of index pieces for I_NODEIDXT, run the following DDL
 
   ALTER INDEX index SC161410.I_NODEIDXT CLOSE YES
   ALTER INDEX index SC161410.I_NODEIDXT CLOSE NO

 

So what I decided to do, was to create a set of queries for my newsletter readers to make this process a tad easier to really find the bad guys!

 

Top of page

 

The solution: Set of SQL queries for my newsletter readers to really find the bad guys

Whole SQL.txt           JCL.txt              REXX.txt

 

Create DBD diagnose cards

First step is nearly the same SQL as in 1. Above:

-- CREATE DBD DIAGNOSE CARDS FOR ANY DATABASES THAT ARE CANDIDATES  
SELECT DISTINCT SUBSTR('  REPAIR OBJECT LOG YES DBD DIAGNOSE DATABASE '
CONCAT STRIP(TS.DBNAME) , 1 , 54 )              
FROM SYSIBM.SYSINDEXES    IX,                                        
     SYSIBM.SYSTABLES     TB,
     SYSIBM.SYSTABLESPACE TS                                       
WHERE TB.NAME    = IX.TBNAME                                         
   AND TB.CREATOR = IX.TBCREATOR                                      
   AND TB.TSNAME  = TS.NAME                                           
   AND TB.DBNAME  = TS.DBNAME                                         
   AND (TS.PARTITIONS > 254                                           
     OR TS.TYPE       = 'L'                                           
     OR TS.TYPE       = 'K'                                           
     OR TS.DSSIZE     > 0)                                            
   AND TS.PARTITIONS <> 0                                             
   AND (TS.STATUS = 'C'                                               
     OR NOT ((IX.INDEXTYPE = 'P'                                      
           OR IX.INDEXTYPE = 'D')                                     
        AND TB.PARTKEYCOLNUM <> 0))                                  
   AND TS.DBNAME <> 'DSNDB06'                                         

;

It returns this style of output (If you have *no* output then congratulations you have *no* problem!):

REPAIR OBJECT LOG YES DBD DIAGNOSE DATABASE A140XO82
REPAIR OBJECT LOG YES DBD DIAGNOSE DATABASE ANKDB1  
REPAIR OBJECT LOG YES DBD DIAGNOSE DATABASE ANKDB3
.
.
.

Top of page

 

Cut and paste these control cards into some DB2 Utility JCL

Now you must cut and paste these control cards into some DB2 Utility JCL and get them all run.

The next query lists out all the index data you need from the above databases:


-- NOW LIST INDEX OBID'S ON THE ABOVE DATABASES THAT MUST BE SEARCHED
-- FOR IN THE REPAIR SYSOUT BY USING "F OBDINNUM" AND CHECKING IF THE
-- OBID'S MATCH                                                 
SELECT SUBSTR(TS.DBNAME , 1 , 8)       AS DB_NAME,                
       HEX(IX.OBID)                    AS INDEX_OBID,                
       SUBSTR(STRIP(IX.CREATOR) CONCAT '.' CONCAT                    
              STRIP(IX.NAME) , 1 , 71) AS INDEX                       
       FROM SYSIBM.SYSINDEXES    IX,                                      
       SYSIBM.SYSTABLES     TB,                                      
       SYSIBM.SYSTABLESPACE TS                                       
  WHERE TB.NAME    = IX.TBNAME                                       
    AND TB.CREATOR = IX.TBCREATOR                                    
    AND TB.TSNAME  = TS.NAME                                         
    AND TB.DBNAME  = TS.DBNAME                                       
    AND (TS.PARTITIONS > 254                                         
      OR TS.TYPE       = 'L'                                         
      OR TS.TYPE       = 'K'                                         
      OR TS.DSSIZE     > 0)                                          
    AND TS.PARTITIONS <> 0                                           
    AND (TS.STATUS = 'C'                                             
      OR NOT ((IX.INDEXTYPE = 'P'                                    
            OR IX.INDEXTYPE = 'D')                                   
          AND TB.PARTKEYCOLNUM <> 0))                                
    AND TS.DBNAME <> 'DSNDB06'                                       
  ORDER BY 1 , 2 
;

The output looks like:

---------+---------+---------+---------+----------
 DB_NAME   INDEX_OBID  INDEX 
---------+---------+---------+---------+---------
 A140XO82  000C        USER001.RESB~0
 ANKDB1    000A        ANKDB1.ANKIX11
 ANKDB3    0008        ANKDB3.ANKIX3 . . .

Do “F OBDINNUM” in the REPAIR job output

Now comes the really horrible bit – you must now do “F OBDINNUM” in the REPAIR job output and see if you have any matches for the above indexes in the relevant database. Again, if you have none then congratulations, you have no problem!

If you find data like this (just as in the IBM APAR description):

DSNU916I DSNUEDFT - OBDS DIFFER AT OFFSET X'00000052'
OBDINNUM - INFORMATION NOT IN DB2 CATALOG
DSNU904I DSNUEDFT - EXISTING VALUE  X'0100'
DSNU905I DSNUEDFT - REBUILT  VALUE  X'1000'
DSNU913I DSNUEDDR - OBD X'0056' COMPARISON COMPLETE -- 0 ERRORS

Top of page

 

then you must now do the next step to generate the corrective ALTER flip-flops:

-- CREATE DRIVER TABLE FOR CARTESIAN JOIN PROCESSING     
DECLARE GLOBAL TEMPORARY TABLE DRIVER (NUMBER SMALLINT) ;
INSERT INTO SESSION.DRIVER VALUES 1 ;                    
INSERT INTO SESSION.DRIVER VALUES 2 ;                    
INSERT INTO SESSION.DRIVER VALUES 3 ;                    
INSERT INTO SESSION.DRIVER VALUES 4 ;                    
INSERT INTO SESSION.DRIVER VALUES 5 ;                    
INSERT INTO SESSION.DRIVER VALUES 6 ;                    
-- NOW GENERATE CORRECTIVE CLOSE FLIPS ONE FOR NO -> YES           
-- AND ONE FOR YES -> NO                                           
-- NO -> YES CLOSERULE FLIP CONTROL CARDS                          
WITH T1 ( IXNAME                                                   
        , DBNAME                                                   
        , OBID)                                                    
     AS ( SELECT STRIP(IX.CREATOR) CONCAT '.' CONCAT STRIP(IX.NAME)
               , TS.DBNAME                                         
               , IX.OBID                                           
          FROM SYSIBM.SYSINDEXES IX,                               
               SYSIBM.SYSTABLES TB,                                
               SYSIBM.SYSTABLESPACE TS                             
          WHERE TB.NAME    = IX.TBNAME                             
            AND TB.CREATOR = IX.TBCREATOR                          
            AND TB.TSNAME  = TS.NAME                               
            AND TB.DBNAME  = TS.DBNAME                             
            AND (TS.PARTITIONS > 254                               
              OR TS.TYPE       = 'L'                               
              OR TS.TYPE       = 'K'                               
              OR TS.DSSIZE     > 0)                                
            AND TS.PARTITIONS <> 0                                 
            AND (TS.STATUS = 'C'                                   
              OR NOT ((IX.INDEXTYPE = 'P'                          
                    OR IX.INDEXTYPE = 'D')                         
                  AND TB.PARTKEYCOLNUM <> 0))                      
            AND TS.DBNAME <> 'DSNDB06'                             
            AND IX.CLOSERULE = 'N'                                 
          ORDER BY TS.DBNAME , HEX(IX.OBID)                        
        )                                                          
SELECT CAST(CASE NUMBER              
            WHEN 1 THEN 'ALTER INDEX'
            WHEN 2 THEN IXNAME       
            WHEN 3 THEN 'CLOSE YES $'
            WHEN 4 THEN 'ALTER INDEX'
            WHEN 5 THEN IXNAME       
            WHEN 6 THEN 'CLOSE NO  $'
            END AS CHAR(72))         
FROM T1, SESSION.DRIVER              
ORDER BY DBNAME, OBID, NUMBER        
;                                

Which generates the first set of flip-flops from YES -> NO:

ALTER INDEX                               
D9999TOP.BADBIGIX                         
CLOSE YES $                               
ALTER INDEX                               
D9999TOP.BADBIGIX                         
CLOSE NO  $                               
ALTER INDEX                               
D9999TOP.BADBIGIB                         
CLOSE YES $                               
ALTER INDEX                               
D9999TOP.BADBIGIB                         
CLOSE NO  $      
And then the next flip-flop:
-- CREATE DRIVER TABLE FOR CARTESIAN JOIN PROCESSING     
DECLARE GLOBAL TEMPORARY TABLE DRIVER (NUMBER SMALLINT) ;
INSERT INTO SESSION.DRIVER VALUES 1 ;                    
INSERT INTO SESSION.DRIVER VALUES 2 ;                    
INSERT INTO SESSION.DRIVER VALUES 3 ;                    
INSERT INTO SESSION.DRIVER VALUES 4 ;                    
INSERT INTO SESSION.DRIVER VALUES 5 ;                    
INSERT INTO SESSION.DRIVER VALUES 6 ;                    
-- YES -> NO CLOSERULE FLIP CONTROL CARDS                          
WITH T1 ( IXNAME                                                   
        , DBNAME                                                   
        , OBID)                                                    
     AS ( SELECT STRIP(IX.CREATOR) CONCAT '.' CONCAT STRIP(IX.NAME)
               , TS.DBNAME                                         
               , IX.OBID                                           
          FROM SYSIBM.SYSINDEXES IX,                               
               SYSIBM.SYSTABLES TB,                                
               SYSIBM.SYSTABLESPACE TS                             
          WHERE TB.NAME    = IX.TBNAME                             
            AND TB.CREATOR = IX.TBCREATOR                          
            AND TB.TSNAME  = TS.NAME                               
            AND TB.DBNAME  = TS.DBNAME                             
            AND (TS.PARTITIONS > 254                               
              OR TS.TYPE       = 'L'                               
              OR TS.TYPE       = 'K'                               
              OR TS.DSSIZE     > 0)                                
            AND TS.PARTITIONS <> 0                                 
            AND (TS.STATUS = 'C'                                   
              OR NOT ((IX.INDEXTYPE = 'P'                          
                    OR IX.INDEXTYPE = 'D')                         
                  AND TB.PARTKEYCOLNUM <> 0))                      
            AND TS.DBNAME <> 'DSNDB06'                             
            AND IX.CLOSERULE = 'Y'                                 
          ORDER BY TS.DBNAME , HEX(IX.OBID)                        
        )                                                          
SELECT CAST(CASE NUMBER              
            WHEN 1 THEN 'ALTER INDEX'
            WHEN 2 THEN IXNAME       
            WHEN 3 THEN 'CLOSE NO  $'
            WHEN 4 THEN 'ALTER INDEX'
            WHEN 5 THEN IXNAME       
            WHEN 6 THEN 'CLOSE YES $'
            END AS CHAR(72))         
FROM T1, SESSION.DRIVER              
ORDER BY DBNAME, OBID, NUMBER        
;
Which generates NO -> YES format:
ALTER INDEX                               
USER001.RESB~0                            
CLOSE NO  $                               
ALTER INDEX                               
USER001.RESB~0                            
CLOSE YES $  

 

Top of page

 

Note that the above ALTERs use a $ as terminator, so you must either use a C ALL $ ; style ISPF command or just use a –#SET TERMINATOR $ line in the SPUFI.

Automatic parse of the millions of lines of REPAIR output and *only* outputs OBIDs

At this point, when even I had a HUGE list of candidates, I decided to automate it even further so I wrote a little REXX that parses the millions of lines of REPAIR output and *only* outputs OBIDs for real bad guys. That way you can see what you really have to do very quickly. Here’s how my output looks:

Began checking at: 2014-02-11-12.34.08.752641
Database A140XO82 started.                   
Database A140XO82 ended.                     
Database ANKDB1   started.                   
Database ANKDB1   ended.                     
Database ANKDB3   started.                   
Database ANKDB3   ended.                     
Database D9999TOP started.                   
OBID X'0004' is X'0100' rebuilt X'1000'      
OBID X'0006' is X'0100' rebuilt X'1000'      
OBID X'0008' is X'0100' rebuilt X'1000'      
OBID X'000A' is X'0100' rebuilt X'1000'      
OBID X'000C' is X'0100' rebuilt X'1000'      
OBID X'000E' is X'0100' rebuilt X'1000'      
OBID X'0010' is X'0100' rebuilt X'1000'      
Database D9999TOP ended.                     
Database DATAPBG2 started.                   
Database DATAPBG2 ended.                     
.
.
.

Here you can see that my first actual problem database is the D9999TOP. When I then plug that into my flip flop generator by using cut-and-paste, for both cases, like this:

AND TB.DBNAME  = 'D9999TOP' 
  AND HEX(IX.OBID) IN (       
                  '0004'      
                , '0006'      
                , '0008'      
                , '000A'      
                , '000C'      
                , '000E'      
                , '0010'      
                 )   

I get the following output:

ALTER INDEX      
D9999TOP.BADBIGIA
CLOSE YES $      
ALTER INDEX      
D9999TOP.BADBIGIA 
CLOSE NO  $      
ALTER INDEX      
D9999TOP.BADBIGIB 
CLOSE YES $      
ALTER INDEX      
D9999TOP.BADBIGIB 
CLOSE NO  $      
ALTER INDEX      
D9999TOP.BADBIGIC
CLOSE YES $      
ALTER INDEX      
D9999TOP.BADBIGIC
CLOSE NO  $      
ALTER INDEX      
D9999TOP.BADBIGID
CLOSE YES $      
ALTER INDEX      
D9999TOP.BADBIGID
CLOSE NO  $      
ALTER INDEX      
D9999TOP.BADBIGIE
CLOSE YES $      
ALTER INDEX      
D9999TOP.BADBIGIE
CLOSE NO  $      
ALTER INDEX      
D9999TOP.BADBIGIF
CLOSE YES $      
ALTER INDEX      
D9999TOP.BADBIGIF
CLOSE NO  $      
ALTER INDEX      
D9999TOP.BADBIGIG
CLOSE YES $      
ALTER INDEX      
D9999TOP.BADBIGIG
CLOSE NO  $

Top of page

 

Reach your real maximum number of pieces!

There was no output from the other query, so once these ALTERs ran my work on this database was done and I could happily move onto the next.

This is much quicker than searching through REPAIR sysout! The REXX should be copied to a PROC library and called NPI. Then using the JCL (which must tailored for your site, of course!) to execute it – and Bob’s your uncle!

Once all of the ALTERs have been executed then, finally, all of your NPIs can actually reach their real maximum number of pieces!
This month I also used (stole!) the neat Cartesian Join idea from Marcus Davage’s blog: http://spufidoo.wordpress.com/2013/02/28/judicious-cartesian-joins/

that was posted on Listserv and everyone loved it instantly!

 

As usual any questions or comments are welcome,
TTFN Roy Boxwell
Senior Software Architect

2014-05: Why SIZE matters for DB2

“Are your space management and monitoring tools up-to-date, or are they still DB2 V7?”

IFCIDs  : Instant monitoring of secondary allocations

DB2 V7 : Space management without “the IBM algorithm”
DB2 V8 : Running out of datasets & Degenerated extents
DB2 9    : UTS spaces introduction & PBG’s problem
DB2 10  : ALTER PBGs
DB2 11  : Remove any empty parts in PBGs

SAX       : Space management processing

Since DB2 V8, the DBAs of the world have all had the ability to forget about PQTY and SQTY in the DDL for Tablespaces and Indexes.

At first, nearly no-one trusted the sliding scale algorithm, and SOFTWARE ENGINEERING’s product Space AssuranceExpert (aka SAX) monitored and reacted instantly to secondary allocations.

However, we now have DB2 11 and I thought it would be interesting to review what was done in DB2 V7 (when our SAX was launched), and the difference nowadays in the DB2 11 world.

IFCIDs: Instant monitoring of secondary allocations

In DB2, every time a secondary allocation is done, DB2 can be made to spit out an IFCID. SAX runs as a started task, active 24×7, from DB2 start up until just before DB2 shut down.

It catches all of these IFCIDs thrown by DB2 and performs an analysis with five basic questions:

  1. Can this dataset reach its maximum physical size *before* running out of physical extents? (The actual size is dependent on the “geometry” of the object of course!)
  2. Will this object run out of datasets? (The number of datasets an object can have is, once again, dependent on the “geometry” of the object)
  3. Is this partition nearing its maximum size?
  4. Did DB2 ask for one extent but got more back?
  5. Are any of my SMS disk storage pools running out of space?

DB2 V7 : Space management without “the IBM algorithm”

Remember, back in those old days of DB2 V7, we only had 255 extents and 254 partitions, but datasets could still get pretty big pretty fast.

The problem lots of shops had, was that an important dataset would “hit the buffers” of maximum number of extents *way* before it ever ran out of physical space. Thus causing grief, wailing and gnashing of teeth!

SAX stopped all this by giving WTO “heads-up” style messages in two flavors.

– First a warning message,
– and then a critical message.

This gave DBAs and space managers much needed time to plan for the outage and the, inevitably, long running REORG to actually action the required ALTER, or perhaps even any DROP/RECREATE that had to be done.

DB2 V8 : Running out of datasets & Degenerated extents

IBM also noticed this problem and so introduced in DB2 V8 the “sliding scale” of secondary allocations, as long as the ZPARM  OPTIMIZE EXTENT SIZING field (MGEXTSZ) was set to YES (this is the default from DB2 9, by the way).

Of course, to really use this, you then had to ALTER all of the existing spaces PQTY and SQTY to be -1, and then remember to delete all PRIQTY and SECQTY lines in your DDL and also rely on the TSQTY and IXQTY ZPARMs giving a big enough “first default”. (By the way, defaults for these two ZPARMS are 0, which is actually translated to be 720k or one cylinder for normal spaces and 7200k or 10 cylinders for LOB spaces). This all probably explains why the take up of this great feature has not been that spectacular and, in fact, Listserv *still* gets questions about “How good is this feature?”

This also explains why the primary reason for having SAX is still valid at most shops today!

Running out of datasets

Now the problem of running out of datasets is very, very evil indeed… For a non-partitioned space, you can have up to 32 datasets, and DB2 will happily allocate away and you will never know, or even be informed, if, and when, the last possible dataset has just been allocated and, of course, you will not know that the 33rd one cannot be allocated until you get a -904 unavailable resource error!

By definition this is “not good”, as you must do a HUGE REORG with a bunch of managers breathing down your neck and *not* make any mistakes with the new allocations. (Again, this is a very good reason to have SAX doing all the monitoring and triggering early warning “heads-up” style messages!)

A partition running out of space is rare, but when it does happen it is, of course, a disaster! The idea in SAX, is to warn when the end is near for any of the partitions in use and thus, as before, allow time for the ALTER etc.

Top of page

Degenerated extents

Degenerated extents are annoying as well. You have 255 extents, DB2 requests one but gets up to five back! This is “wasting” your precious supply of extents and so SAX can also warn you if this starts happening, and remedial action can again be planned to correct the problem, (normally a volume defrag in this case).

Now in z/OS 1.7 “Extent Constraint Removal” was introduced for the DATACLAS which, if set to “Y”, allows 7,257 extents but still limits you to 123 extents per volume and 59 volumes, so watch out if you are using huge “virtual” disks (E.g. MOD 54 or EAV), as you can end up wasting space as you still cannot exceed 123 extents per volume.

Finally, when an SMS Pool runs out of space, either for sort/work or image copy, it is *not* good! The idea here, is to also give a “heads-up” style alert so that the DBA can trigger the space management people to have a look at the state and size of the SMS storage groups this time alerted by percentage used or GBs of space free.

DB2 V8 introduced a big change – Partitions went up to a maximum of 4,096, and the calculation about how many pieces your NPI can have got “a little bit complex” (see also my previous newsletter: “Are you going to PIECEs” here!).

DB2 9: UTS spaces introduction & PBG’s problem

In DB2 9 the next major advance came with UTS spaces, and the one that caused the most grief was, of course, PBG. Why?

Well, the first problem was that some people went mad and used MAXPARTITIONS 4096 right from the get-go which they then found out could not simply be changed and ended up being a huge problem.

IBM came out with a bunch of fixes for these people, but the recommendation is still true today: “Use the number you expect to use!”

PBGs, however, came with a new set of space management Problems:

  1. By definition every partition is full, and so a TP REORG is “dangerous” -especially if you have VARCHAR, and even more so if compressed.
  2. ALTER at TP level is not supported for PBG.
  3. Getting rid of empty partitions was not supported.
  4. Adding partitions dynamically (by command) was not supported.
  5. What to do if the partition that is “in use” is growing and is
  6. The last allowed partition
  7. MAXPARTITIONS is set to one?

Now these are “non trivial” because the DB2 catalog is so defined and you would not want an alert every time someone created a table or index!
The trick here, is to treat these conditions as if it was a normal space and so, instead of warning that you are using the last part, it waits until you are using e.g. 80% of that part and then e.g. at 90% comes the critical threshold warning.

 

DB2 10: ALTER PBGs

With DB2 10 came the ability to ALTER PBGs to add parts which made using DSN1COPY to clone data around a lot better!

DB2 11: Remove any empty parts in PBGs

Further, in DB2 11, the REORG utility can be used to remove any empty parts in PBGs by the use of the ZPARM REORG_DROP_PBG_PARTS being set to ENABLE (DISABLE is the default).

Top of page

Space Management processing

The SAX tool way of processing all this info is neatly summarised in the help panel of the tool itself:

AUDIT SMS STOGROUPS
Should the Space AssuranceExpert audit SMS stogroups. Y/N
If Y is entered, a pop-up window will allow you to enter up
to 24 SMS storage groups which will be audited.
If WARN IF % ALLOC > or WARN IF GB FREE < is specified and
exceeded, a warning (WTO) will be issued.

AUDIT DEGENERATED XTS
Audit secondary quantity for de-generated extents. If
specified, a warning is issued in case of the last extent
does not reach this percentage of the SECQTY specified
in the DB2 catalog. If this field is left blank, no
auditing is performed.

SUPERVISE LPS
Supervise linear pagesets. If specified, a warning is issued
in case of high allocated reaches this percentage of the
maximum data set size for partitioned objects.

For non-partitioned objects, a warning is issued for every
newly allocated data set as soon as the data set number
reaches this percentage of the maximum number of data sets:

Two different values may be entered for warning and critical
values with different message ids. This may be useful for
automation reasons (see below).

Object type: TABLESPACE      ! Maximum number of data sets
-----------------------------+----------------------------
LOB tablespaces              ! 254
-----------------------------+----------------------------
Non-partitioned tablespaces  ! 32
-----------------------------+----------------------------
Partitioned tablespaces      ! 1 (Percent used check)
-----------------------------+----------------------------
Partitioned By Growth        ! MAXPARTITIONS. LPS check if
tablespaces                  ! more than one. If on last
                             ! partition then percent used.
-----------------------------+----------------------------

Object type: INDEX           ! Maximum number of data sets
-----------------------------+----------------------------
Non-partitioned indexes on   ! MIN ( 4096 , 2 power 32 /
tablespace with LARGE,       ! ( DSSIZE / TS PGSIZE))
DSSIZE, or more than 64      ! Eg: 128 GB DSSIZE with
Partitions                   ! 8 KB Tablespace Page
                             ! gives 256 Pieces (datasets)
                             ! Or 4 GB DSSIZE with
                             ! 4 KB Tablespace Page
                             ! gives 4096 Pieces (datasets)
-----------------------------+----------------------------
Non-partitioned indexes      ! 32
otherwise                    !
-----------------------------+----------------------------
Partitioned indexes          ! 1 (Percent used check)
-----------------------------+----------------------------

To support automation based on WTO ids two different
thresholds may be specified:
Field (1) specifies a warning threshold using WTO ids
O2RTSU04 - 12W (non-partitioned spaces)
O2RTSU04 - 14W (partitioned spaces)
O2RTSU04 - 16W (partition by growth spaces)
Field (2) specifies a critical threshold using WTO ids
O2RTSU04 - 13W (non-partitioned spaces)
O2RTSU04 - 15W (partitioned spaces)
O2RTSU04 - 17W (partition by growth spaces)

So now you know why size matters for DB2!

The big question now is:
“Are your space management and monitoring tools up-to-date, or are they still DB2 V7?”

Top of page

As usual any questions or comments are welcome,
TTFN Roy Boxwell
Senior Software Architect

2012-12: APAR list reprint “Back by popular demand”

A few years ago,
I started keeping a little list of APARs that documented the “interesting” problems in DB2,
so that I did not have to go through hours of searching every month or two.

Over time it evolved into three distinct groups of corrections.

  • RTS:

For any fixes relevant to Real-Time Statistics (As one of our prime products RealTimeDatabaseExpert relies on this data it behooves me to make sure it is as valid and good as can be!)

  • RUNSTATS:

As the age old adage “garbage in = garbage out” is still correct when it comes to the DB2 Optimizer. Over the years, it has become very clever and at the same time very sensitive to dodgy data or correlations or, heaven forbid, manual well-minded but incorrect updates (To help here : we even wrote our own freeware tool Statistics HealthCheck to find the bad data – before it bites you!)

  • SQL Performance group:

Here all of the tweaks and bugs of the Optimizer usually surface, but I have had to expand its remit over the years to now also include SQL Access Path and SQL Stats problems as these groups then cover all possible interesting things!

 

APAR list what’s new?

I have also changed how I store the data as it used to be in a PDS member for DB2 V6 and is now in three Excel spreadsheets for DB2 V8, 9 and 10. Soon I will drop V8 and add Vnext as V8 has now “gone” from the APAR point of view. I have also added useful columns like “last modified” so you can see if the APAR has moved since you last blinked, and the latest addition is a HIPER column. It used to be part of the description but could easily get lost and so now there is a Y for HIPER or Blank for normal.

So every month I trawl through the tech support database of DB2 z/OS and look for changed, updated, new, or closed APARs and then get all the data, pack it into my Excels and then mail it out to anyone who wants it. Now I know you can subscribe to lots of services that send out email alerts but I still find three Excels or comma separated files a *lot* easier to work with than an email and it is DB2 version self-documenting. No need to follow links to the bitter end of the IBM web site!

As a final thought on this topic, I also write some notes about which APARs I think are of real interest including HIPERs etc.

 

Get the three APAR lists:

So here is how the last set of data looked. The columns are always the same:

APAR, CLOSED date, STATUS, V8 PTF, 9 PTF, 10 PTF, HIPER, Description

The Status column can be blank (nothing to report), New, Closed, New & Closed (If IBM is quick!), Modified (If a closed APAR has been changed by IBM) or Fixed in Next (that you do not want to see!)

The PTF columns are either the PTF number, N/A for not applicable for this release or OPEN for, guess what?, open.

 

First is the RTS list:

2012-12_RTS list 

Now the RUNSTATS list:

2012-12_RUNSTATS list

 

Here you can see one HIPER – and note that any typos come from IBM not me J(INDFINITELY…).

 

And finally the SQL Performance list:

2012-12_SQL Performance

 

If you are interested in getting these updates simply email me or techsupport@seg.de and ask!

As usual comments and queries are always welcome!

 

TTFN,
Roy Boxwell
Senior Architect