2014-02: Complementing IBMs ACCESS PLAN Stability

 

Part 2: the Story continues for Static SQL…

 

Welcome back to the second part of this newsletter.
Remember that last month I talked about a nasty real-world event that caused pain and grief to a large number of fellow DBAs?

 

Short reminder of part 1: rescue Dynamic SQL

A big company regularly scheduled production staging on Thursday nights. One night last year everything went as normal – until the Friday morning……then the telephones started ringing, and people were complaining about slow or non-existent response times for a rather critical business application…

See the previous newsletter here and the  walk through” example for Dynamic SQL rescue.

 

The Story continues for Static SQL

Well that was “only” Dynamic SQL that was killing them.

SOFTWARE ENGINEERING GmbH created the new Pocket Tool: RUNSTATS Rescue to gallop over the hill and rescue them from bad statistics timing problems.

…Everyone then lamented “But my Static SQL *also* goes horribly wrong if the statistics are badly timed. What can SEG do there?“

 

Rescue Static & Dynamic SQL

Naturally we have the ability to interface with Static SQL as well as Dynamic. Now, as all roads lead to Rome, or as I like to say “There is more than one way to skin a cat” – my co-workers don’t like that saying much for some reason… anyway, I digress – the interface to the dynamic RUNSTATS Rescue is the PLAN_TABLE owner and the QUERYNO used for the EXPLAIN. RUNSTATS Rescue does the rest –

so for static, there is a “simple” entry point. Just cut–and–paste the static SQL statement that is “causing grief” into SPUFI and EXPLAIN it, just as if it were dynamic SQL. However, the problem here, is that you must then manually generate all required REBINDs.

Top of page

Rescue „pure“ Static SQL

What we decided to do, is to create a new Pocket Tool extension for “pure” Static SQL. If you only have Dynamic SQL or if you only have Static SQL, then you will only need a single licence! The data that the new extension requires is all currently available in your SQL monitor, Trace etc. which tells you who the bad guy is, namely: Collection ID, Package name, Version and Statement Id. If you have this data *and* you have done your last BIND or REBIND with EXPLAIN(YES) – and I hope that’s true for 100% of production packages! – you are ready to rock’n’roll!

 Two prepare options

So this is how it will look. First we have a new option in the Menu selection list:

News 2014-02 Bild1

 

 

 

 

 

Here you can see that we now have two Prepare options: one is for Dynamic, and is the same as in last month’s newsletter, and the other is a new Static one which then leads you to the next pop–up:

 

  A REBIND step instead of a RUNSTATS step

News 2014-02 Bild2

Here you simply enter the required data and then, after it has been verified, you generate and run the RUNSTATS Rescue job, just like last time.

However, now you do not get a RUNSTATS step at the end of the job. Instead,  you’ll get a REBIND step for either the single PACKAGE or, if REBIND ALL is Y, for all affected PACKAGES in any COLLECTION. Obviously this option must be used “with care,” however, if the statistics are bad for one package why shouldn’t they be bad for all packages?

 

An errant Access path onto the good path

That’s it!

Now you have three ways to get an errant Access Path back onto the good path. (Remember that you can *always* cut–and–paste Static SQL into an EXPLAIN SPUFI – which means there are actually two ways to do Static).

 

Top of page

Next month

Next month I will go into detail about the DSC Protection scenario from our Bind ImpactExpert tool that is *really* cool and helps deal with the same type of problems but at a much higher and fully-integrated manner. It also completely handles the tricky question of :

“OK, so this query gets better when I reset my statistics but *this* query gets worse!”

 

As usual any questions or comments are welcome,
TTFN Roy Boxwell
Senior Software Architect

2014-03 Complementing IBMs ACCESS PLAN Stability

 

Part 3: DSC Protection

In the last Newsletter of this series about protecting your assets and/or access paths, I want to tell you how SOFTWARE ENGINEERING GmbH’s Bind Impact Expert DSC Protection works.

In the  previous Newsletter of the series (2014-01 Dynamic SQL  2014-02 Static SQL), we have seen how with RUNSTATS Rescue you can quickly and cheaply get back your old DB2 Catalog Access Path Statistics.

 

This is all well and good but then come the questions:
  1. How will a RUNSTATS affect my Dynamic SQL?

  2. What happens if I reset the statistics for this statement? What are the side effects?

 

DSC Protection was created in order to handle this. In a nut shell, it will:

  1. Extract all needed DDL from production (Optional if it is already on test of course!)
  2. Extract all needed catalog statistics from production
  3. Run a “special” RUNSTATS (More on this baby later)
  4. Extract all needed catalog statistics from the _HIST tables in production
  5. Capture and Explain the DSC from production
  6. Export DSC from production
  7. Transfer all data to test
  8. Now logoff from production and logon to test
  9. Create the DDL on test (Optional)
  10. Import the dynamic SQL
  11. Update all statistics in test from the extracted _HIST data
  12. Explain all SQL from production DSC
  13. Compare the “new” Access Paths
  14. If for a given DB.TS the access is only “improved”,create a DB.TS pair ready to send back to production for actual RUNSTATS
  15. If for a given access path the result is “worsened” or “changed” then reset the statistics back to the actual production statistics and re-analyse
  16. Keep going until no more access paths are worsened or changed
  17. Transfer the RUNSTATS control cards to production
  18. Logoff from test and logon to production
  19. Run a “normal” RUNSTATS using the transferred control cards
  20. You are done! Simple, huh?

 

We start at the Main Menu:

 

Select the DSC Protection scenario:

News 2014-03 DSC bild2

 

To begin use the primary command X to start the eXport chain of jobs:

News 2014-03 DSCbild3

Once you get to the blue line,

you must then run the “special” RUNSTATS.

What this does, is run the RUNSTATS but does not invalidate the DSC. We do not want to kill the good access paths, we want to see if a RUNSTATS will help us first *before* we really run the RUNSTATS! That is why the scenario is called DSC protection after all! It is there to protect your DSC from a mistimed RUNSTATS.

 

The HISTORY option *must* be ALL or ACCESSPATH

What is very important, is that the HISTORY option *must* be ALL or ACCESSPATH. This then fills the _HIST tables with data that we need later in the analysis. Here’s another “little” problem in the fact that there is no SYSTABLESPACE_HIST table! IBM “forgot it” years ago and it has never been created… This is why there is a “prepare extract” step, as we must create a VIEW which gives us a “fake” SYSTABLESPACE_HIST table as NACTIVEF is very good for the DB2 Optimizer if you ask me!

Once all of these steps have been done and you have file transferred all the needed files across to test, you can logoff from production and log onto the test machine.

 

Top of page

 

Here you select the DSC Protection scenario again:

News 2014-03 DSCbild4

Now you use I to Import the data:

Again you just run through the jobs a step at a time; DDL create is, of course, optional. Note that here we just apply the _HIST data – not the “current” statistics but the “future” statistics. When all of these steps are done you are then ready to use primary command N for a “new” run. Submit the job and wait for it to complete.

Once it has finished use Refresh to update the display:

News 2014-03 DSCbild6

and then you can use line command S to view the statements:

News 2014-03 DSCbild7

Here you see the overview and you can drill down to the statement level:

News 2014-03 DSCbild8

Note that on both these panels there is the R command for Reset statistics that simply adds the affected objects to an internal RUNSTATS reset table which at the entry panel can also be selected with an R:

News 2014-03 DSCbild9

Top of page

It then displays another pop-up with a short list of options:

News 2014-03 DSCbild10

Using the first option just shows you a list of the internal RUNSTATS table where you can see the Production and Test names of the objects (Of course you can rename all the extracted data)
and you may choose whether or not specific DB.TS pairs should be added to or removed from the table:

News 2014-03 DSCbild11

 

Once you have finished selecting your tables the next option should then look a little familiar to you all by now!

 

It uses the extracted statistics to build an update of the production data:

When it has finished and reset various statistics, you naturally want to re-run the EXPLAINs to see if the bad paths are now all gone, or perhaps even new worse ones have appeared! In my experience this never actually happens, but just to make absolutely sure, use the next option to re-generate the JCL and re-do the analysis. Now you can start all over again or you can see that all is ok. At this point you will now have a small list of DB.TS in the internal table that would actually give you guaranteed performance improvements if run on production.

Then you select the last option to see this:

News 2014-03 DSCbild13

Top of page

And these RUNSTATS should simply be executed on Production using the normal RUNSTATS utility JCL and with UPDATE ALL or ACCESSPATH. History no longer matters of course!

Now you are done! Easy peasy lemon squeezy! (Or Easy as pie if you prefer!) That, my dear readers, is why this is *not* a pocket tool and does *not* cost pocket money (Pin money).

 

As usual any questions or comments are welcome,
TTFN Roy Boxwell
Senior Software Architect

2014-04: Are you going to PIECEs?

This month I have a really nice – and rather long – technical newsletter to sink your teeth into!

The problem: Too low internal PIECE limit in DB2

The solution: Set of SQL queries for my newsletter readers to really find the bad guys

Create DBD diagnose cards
– Cut and paste the control cards into some DB2 Utility JCL
Do “F OBDINNUM” in the REPAIR job output
– Automatic parse the millions of lines of REPAIR output and *only* outputs OBIDs

 Whole SQL.txt           JCL.txt              REXX.txt

Reach your real maximum number of PIECEs!

The Problem:  Too low internal PIECE limit in DB2

We start with a little APAR that IBM did way back in 2013:

2013 IBM APAR

PM75150: ALLOW MAXIMUM NUMBER OF PIECES FOR AN NPI  OF LARGE TABLE SPACE, INDEPENDENT OF TABLE SPACE  MAXIMUM NUMBER OF DATASETS 

Reported release     A10
StatusCLOSED UR1
PENoPE
HIPERNoHIPER
Special AttentionNoSpecatt
Submitted date2012-10-16
Closed date2013-02-25
Last modified date2013-04-02

So far, so good. Looks like any run of the mill standard DB2 10 fix of “no real importance”… until you read the “small print” that is!

 


******************************************************************************************************************
 USERS AFFECTED: All DB2 users of Non-Partitioned Index  (NPI) on a large table space.
******************************************************************************************************************

PROBLEM DESCRIPTION:
For a large table space, the maximum   
number of pieces for an NPI is  incorrectly inherited from the
maximum number of partitions of the table space. When inserting data to a table or creating the NPI,
the customer 
may receive the following message,

DSNT501I  DMBC DSNKINSL RESOURCE UNAVAILABLE  CORRELATION-ID=QFSBTC036
CONNECTION-ID=SERVER  LUW-ID=  GAF263F9.C741.120818101839=561408
REASON 00C9009D
TYPE 00000201                          

****************************************************************************************************************
RECOMMENDATION:
****************************************************************************************************************
For an NPI on a LARGE(5-byte RID) table space, the index maximum number of pieces is independent of the table space maximum number of partitions, it is calculated from index page size and piece size as the following:

Maximum NPI number of pieces = MINIMUM( 4096, 2^32 / (index piece size/index page size) )


 

In other words, DB2 set the internal piece limit way too low!
This is very bad news if you think you have 4,096 pieces but you really have only 256!

Now the APAR carries on with helpful hints and tips (Quoting here from the APAR text):

 

Top of page

 

1. run a query, such as the sample query below, to identify the databases, table spaces and indexes,

SELECT TS.DBNAME AS DB_NAME,
       TS.NAME AS TS_NAME,
       IX.NAME AS INDEX_NAME,
       IX.CREATOR AS IX_CREATOR,
       HEX(IX.OBID) AS INDEX_OBID,
       IX.CLOSERULE AS CLOSERULE
FROM   SYSIBM.SYSINDEXES IX,
       SYSIBM.SYSTABLES TB,
       SYSIBM.SYSTABLESPACE TS
WHERE  TB.NAME = IX.TBNAME AND
       TB.CREATOR = IX.TBCREATOR AND
       TB.TSNAME = TS.NAME AND
       TB.DBNAME = TS.DBNAME AND
       (TS.PARTITIONS > 254 OR
        TS.TYPE = 'L' OR
        TS.TYPE = 'K' OR
        TS.DSSIZE > 0) AND
        TS.PARTITIONS <> 0 AND
        (TS.STATUS = 'C' OR
        NOT ((IX.INDEXTYPE = 'P' OR
        IX.INDEXTYPE = 'D') AND
        TB.PARTKEYCOLNUM <>0)) AND
        TS.DBNAME <> 'DSNDB06'
ORDER BY IX.OBID;
+-------------------------------------------------------------+
|DB_NAME |TS_NAME |INDEX_NAME|IX_CREATOR|INDEX_OBID|CLOSERULE |
+-------------------------------------------------------------+
|DB161410|XTB10000|I_NODEIDXT|SC161410  |0056      |N         |
|DB161410|TU161410|I_DOCIDTB1|SC161410  |0058      |N         |
 ...
 
2. run REPAIR DBD DIAGNOSE against all databases from the above query.
 
For example REPAIR DBD DB161410. By searching OBDINNUM in the REPAIR DBD output, it has the following mismatch,
 
DSNU916I DSNUEDFT - OBDS DIFFER AT OFFSET X'00000052'
OBDINNUM - INFORMATION NOT IN DB2 CATALOG
DSNU904I DSNUEDFT - EXISTING VALUE  X'0100'
DSNU905I DSNUEDFT - REBUILT  VALUE  X'1000'
DSNU913I DSNUEDDR - OBD X'0056' COMPARISON COMPLETE -- 0 ERRORS
 
which confirms the index index I_NODEIDXT with OBID X'0056' has incorrect maximum number of index pieces.
 
3. fix these indexes found in the above repair DBD output by altering the CLOSE attribute of the identified indexes from their current value and then altering the CLOSE attribute back to their original value. The index name and index attribute information can be found in the query result at step 1.
 
for example, the index SC161410.I_NODEIDXT has the CLOSE attribute CLOSERULE = N. To correct the maximum number of index pieces for I_NODEIDXT, run the following DDL
 
   ALTER INDEX index SC161410.I_NODEIDXT CLOSE YES
   ALTER INDEX index SC161410.I_NODEIDXT CLOSE NO

 

So what I decided to do, was to create a set of queries for my newsletter readers to make this process a tad easier to really find the bad guys!

 

Top of page

The solution: Set of SQL queries for my newsletter readers to really find the bad guys

Whole SQL.txt           JCL.txt              REXX.txt

 

Create DBD diagnose cards

First step is nearly the same SQL as in 1. Above:

-- CREATE DBD DIAGNOSE CARDS FOR ANY DATABASES THAT ARE CANDIDATES  
SELECT DISTINCT SUBSTR('  REPAIR OBJECT LOG YES DBD DIAGNOSE DATABASE '
CONCAT STRIP(TS.DBNAME) , 1 , 54 )              
FROM SYSIBM.SYSINDEXES    IX,                                        
     SYSIBM.SYSTABLES     TB,
     SYSIBM.SYSTABLESPACE TS                                       
WHERE TB.NAME    = IX.TBNAME                                         
   AND TB.CREATOR = IX.TBCREATOR                                      
   AND TB.TSNAME  = TS.NAME                                           
   AND TB.DBNAME  = TS.DBNAME                                         
   AND (TS.PARTITIONS > 254                                           
     OR TS.TYPE       = 'L'                                           
     OR TS.TYPE       = 'K'                                           
     OR TS.DSSIZE     > 0)                                            
   AND TS.PARTITIONS <> 0                                             
   AND (TS.STATUS = 'C'                                               
     OR NOT ((IX.INDEXTYPE = 'P'                                      
           OR IX.INDEXTYPE = 'D')                                     
        AND TB.PARTKEYCOLNUM <> 0))                                  
   AND TS.DBNAME <> 'DSNDB06'                                         

;

It returns this style of output (If you have *no* output then congratulations you have *no* problem!):

REPAIR OBJECT LOG YES DBD DIAGNOSE DATABASE A140XO82
REPAIR OBJECT LOG YES DBD DIAGNOSE DATABASE ANKDB1  
REPAIR OBJECT LOG YES DBD DIAGNOSE DATABASE ANKDB3
.
.
.

Top of page

 

Cut and paste these control cards into some DB2 Utility JCL

Now you must cut and paste these control cards into some DB2 Utility JCL and get them all run.

The next query lists out all the index data you need from the above databases:


-- NOW LIST INDEX OBID'S ON THE ABOVE DATABASES THAT MUST BE SEARCHED
-- FOR IN THE REPAIR SYSOUT BY USING "F OBDINNUM" AND CHECKING IF THE
-- OBID'S MATCH                                                 
SELECT SUBSTR(TS.DBNAME , 1 , 8)       AS DB_NAME,                
       HEX(IX.OBID)                    AS INDEX_OBID,                
       SUBSTR(STRIP(IX.CREATOR) CONCAT '.' CONCAT                    
              STRIP(IX.NAME) , 1 , 71) AS INDEX                       
       FROM SYSIBM.SYSINDEXES    IX,                                      
       SYSIBM.SYSTABLES     TB,                                      
       SYSIBM.SYSTABLESPACE TS                                       
  WHERE TB.NAME    = IX.TBNAME                                       
    AND TB.CREATOR = IX.TBCREATOR                                    
    AND TB.TSNAME  = TS.NAME                                         
    AND TB.DBNAME  = TS.DBNAME                                       
    AND (TS.PARTITIONS > 254                                         
      OR TS.TYPE       = 'L'                                         
      OR TS.TYPE       = 'K'                                         
      OR TS.DSSIZE     > 0)                                          
    AND TS.PARTITIONS <> 0                                           
    AND (TS.STATUS = 'C'                                             
      OR NOT ((IX.INDEXTYPE = 'P'                                    
            OR IX.INDEXTYPE = 'D')                                   
          AND TB.PARTKEYCOLNUM <> 0))                                
    AND TS.DBNAME <> 'DSNDB06'                                       
  ORDER BY 1 , 2 
;

The output looks like:

---------+---------+---------+---------+----------
 DB_NAME   INDEX_OBID  INDEX 
---------+---------+---------+---------+---------
 A140XO82  000C        USER001.RESB~0
 ANKDB1    000A        ANKDB1.ANKIX11
 ANKDB3    0008        ANKDB3.ANKIX3 . . .

Do “F OBDINNUM” in the REPAIR job output

Now comes the really horrible bit – you must now do “F OBDINNUM” in the REPAIR job output and see if you have any matches for the above indexes in the relevant database. Again, if you have none then congratulations, you have no problem!

If you find data like this (just as in the IBM APAR description):

DSNU916I DSNUEDFT - OBDS DIFFER AT OFFSET X'00000052'
OBDINNUM - INFORMATION NOT IN DB2 CATALOG
DSNU904I DSNUEDFT - EXISTING VALUE  X'0100'
DSNU905I DSNUEDFT - REBUILT  VALUE  X'1000'
DSNU913I DSNUEDDR - OBD X'0056' COMPARISON COMPLETE -- 0 ERRORS

Top of page

 

then you must now do the next step to generate the corrective ALTER flip-flops:

-- CREATE DRIVER TABLE FOR CARTESIAN JOIN PROCESSING     
DECLARE GLOBAL TEMPORARY TABLE DRIVER (NUMBER SMALLINT) ;
INSERT INTO SESSION.DRIVER VALUES 1 ;                    
INSERT INTO SESSION.DRIVER VALUES 2 ;                    
INSERT INTO SESSION.DRIVER VALUES 3 ;                    
INSERT INTO SESSION.DRIVER VALUES 4 ;                    
INSERT INTO SESSION.DRIVER VALUES 5 ;                    
INSERT INTO SESSION.DRIVER VALUES 6 ;                    
-- NOW GENERATE CORRECTIVE CLOSE FLIPS ONE FOR NO -> YES           
-- AND ONE FOR YES -> NO                                           
-- NO -> YES CLOSERULE FLIP CONTROL CARDS                          
WITH T1 ( IXNAME                                                   
        , DBNAME                                                   
        , OBID)                                                    
     AS ( SELECT STRIP(IX.CREATOR) CONCAT '.' CONCAT STRIP(IX.NAME)
               , TS.DBNAME                                         
               , IX.OBID                                           
          FROM SYSIBM.SYSINDEXES IX,                               
               SYSIBM.SYSTABLES TB,                                
               SYSIBM.SYSTABLESPACE TS                             
          WHERE TB.NAME    = IX.TBNAME                             
            AND TB.CREATOR = IX.TBCREATOR                          
            AND TB.TSNAME  = TS.NAME                               
            AND TB.DBNAME  = TS.DBNAME                             
            AND (TS.PARTITIONS > 254                               
              OR TS.TYPE       = 'L'                               
              OR TS.TYPE       = 'K'                               
              OR TS.DSSIZE     > 0)                                
            AND TS.PARTITIONS <> 0                                 
            AND (TS.STATUS = 'C'                                   
              OR NOT ((IX.INDEXTYPE = 'P'                          
                    OR IX.INDEXTYPE = 'D')                         
                  AND TB.PARTKEYCOLNUM <> 0))                      
            AND TS.DBNAME <> 'DSNDB06'                             
            AND IX.CLOSERULE = 'N'                                 
          ORDER BY TS.DBNAME , HEX(IX.OBID)                        
        )                                                          
SELECT CAST(CASE NUMBER              
            WHEN 1 THEN 'ALTER INDEX'
            WHEN 2 THEN IXNAME       
            WHEN 3 THEN 'CLOSE YES $'
            WHEN 4 THEN 'ALTER INDEX'
            WHEN 5 THEN IXNAME       
            WHEN 6 THEN 'CLOSE NO  $'
            END AS CHAR(72))         
FROM T1, SESSION.DRIVER              
ORDER BY DBNAME, OBID, NUMBER        
;                                

Which generates the first set of flip-flops from YES -> NO:

ALTER INDEX                               
D9999TOP.BADBIGIX                         
CLOSE YES $                               
ALTER INDEX                               
D9999TOP.BADBIGIX                         
CLOSE NO  $                               
ALTER INDEX                               
D9999TOP.BADBIGIB                         
CLOSE YES $                               
ALTER INDEX                               
D9999TOP.BADBIGIB                         
CLOSE NO  $      
And then the next flip-flop:
-- CREATE DRIVER TABLE FOR CARTESIAN JOIN PROCESSING     
DECLARE GLOBAL TEMPORARY TABLE DRIVER (NUMBER SMALLINT) ;
INSERT INTO SESSION.DRIVER VALUES 1 ;                    
INSERT INTO SESSION.DRIVER VALUES 2 ;                    
INSERT INTO SESSION.DRIVER VALUES 3 ;                    
INSERT INTO SESSION.DRIVER VALUES 4 ;                    
INSERT INTO SESSION.DRIVER VALUES 5 ;                    
INSERT INTO SESSION.DRIVER VALUES 6 ;                    
-- YES -> NO CLOSERULE FLIP CONTROL CARDS                          
WITH T1 ( IXNAME                                                   
        , DBNAME                                                   
        , OBID)                                                    
     AS ( SELECT STRIP(IX.CREATOR) CONCAT '.' CONCAT STRIP(IX.NAME)
               , TS.DBNAME                                         
               , IX.OBID                                           
          FROM SYSIBM.SYSINDEXES IX,                               
               SYSIBM.SYSTABLES TB,                                
               SYSIBM.SYSTABLESPACE TS                             
          WHERE TB.NAME    = IX.TBNAME                             
            AND TB.CREATOR = IX.TBCREATOR                          
            AND TB.TSNAME  = TS.NAME                               
            AND TB.DBNAME  = TS.DBNAME                             
            AND (TS.PARTITIONS > 254                               
              OR TS.TYPE       = 'L'                               
              OR TS.TYPE       = 'K'                               
              OR TS.DSSIZE     > 0)                                
            AND TS.PARTITIONS <> 0                                 
            AND (TS.STATUS = 'C'                                   
              OR NOT ((IX.INDEXTYPE = 'P'                          
                    OR IX.INDEXTYPE = 'D')                         
                  AND TB.PARTKEYCOLNUM <> 0))                      
            AND TS.DBNAME <> 'DSNDB06'                             
            AND IX.CLOSERULE = 'Y'                                 
          ORDER BY TS.DBNAME , HEX(IX.OBID)                        
        )                                                          
SELECT CAST(CASE NUMBER              
            WHEN 1 THEN 'ALTER INDEX'
            WHEN 2 THEN IXNAME       
            WHEN 3 THEN 'CLOSE NO  $'
            WHEN 4 THEN 'ALTER INDEX'
            WHEN 5 THEN IXNAME       
            WHEN 6 THEN 'CLOSE YES $'
            END AS CHAR(72))         
FROM T1, SESSION.DRIVER              
ORDER BY DBNAME, OBID, NUMBER        
;
Which generates NO -> YES format:
ALTER INDEX                               
USER001.RESB~0                            
CLOSE NO  $                               
ALTER INDEX                               
USER001.RESB~0                            
CLOSE YES $  

 

Top of page

 

Note that the above ALTERs use a $ as terminator, so you must either use a C ALL $ ; style ISPF command or just use a –#SET TERMINATOR $ line in the SPUFI.

Automatic parse of the millions of lines of REPAIR output and *only* outputs OBIDs

At this point, when even I had a HUGE list of candidates, I decided to automate it even further so I wrote a little REXX that parses the millions of lines of REPAIR output and *only* outputs OBIDs for real bad guys. That way you can see what you really have to do very quickly. Here’s how my output looks:

Began checking at: 2014-02-11-12.34.08.752641
Database A140XO82 started.                   
Database A140XO82 ended.                     
Database ANKDB1   started.                   
Database ANKDB1   ended.                     
Database ANKDB3   started.                   
Database ANKDB3   ended.                     
Database D9999TOP started.                   
OBID X'0004' is X'0100' rebuilt X'1000'      
OBID X'0006' is X'0100' rebuilt X'1000'      
OBID X'0008' is X'0100' rebuilt X'1000'      
OBID X'000A' is X'0100' rebuilt X'1000'      
OBID X'000C' is X'0100' rebuilt X'1000'      
OBID X'000E' is X'0100' rebuilt X'1000'      
OBID X'0010' is X'0100' rebuilt X'1000'      
Database D9999TOP ended.                     
Database DATAPBG2 started.                   
Database DATAPBG2 ended.                     
.
.
.

Here you can see that my first actual problem database is the D9999TOP. When I then plug that into my flip flop generator by using cut-and-paste, for both cases, like this:

AND TB.DBNAME  = 'D9999TOP' 
  AND HEX(IX.OBID) IN (       
                  '0004'      
                , '0006'      
                , '0008'      
                , '000A'      
                , '000C'      
                , '000E'      
                , '0010'      
                 )   

I get the following output:

ALTER INDEX      
D9999TOP.BADBIGIA
CLOSE YES $      
ALTER INDEX      
D9999TOP.BADBIGIA 
CLOSE NO  $      
ALTER INDEX      
D9999TOP.BADBIGIB 
CLOSE YES $      
ALTER INDEX      
D9999TOP.BADBIGIB 
CLOSE NO  $      
ALTER INDEX      
D9999TOP.BADBIGIC
CLOSE YES $      
ALTER INDEX      
D9999TOP.BADBIGIC
CLOSE NO  $      
ALTER INDEX      
D9999TOP.BADBIGID
CLOSE YES $      
ALTER INDEX      
D9999TOP.BADBIGID
CLOSE NO  $      
ALTER INDEX      
D9999TOP.BADBIGIE
CLOSE YES $      
ALTER INDEX      
D9999TOP.BADBIGIE
CLOSE NO  $      
ALTER INDEX      
D9999TOP.BADBIGIF
CLOSE YES $      
ALTER INDEX      
D9999TOP.BADBIGIF
CLOSE NO  $      
ALTER INDEX      
D9999TOP.BADBIGIG
CLOSE YES $      
ALTER INDEX      
D9999TOP.BADBIGIG
CLOSE NO  $

Top of page

Reach your real maximum number of pieces!

There was no output from the other query, so once these ALTERs ran my work on this database was done and I could happily move onto the next.

This is much quicker than searching through REPAIR sysout! The REXX should be copied to a PROC library and called NPI. Then using the JCL (which must tailored for your site, of course!) to execute it – and Bob’s your uncle!

Once all of the ALTERs have been executed then, finally, all of your NPIs can actually reach their real maximum number of pieces!
This month I also used (stole!) the neat Cartesian Join idea from Marcus Davage’s blog: http://spufidoo.wordpress.com/2013/02/28/judicious-cartesian-joins/

that was posted on Listserv and everyone loved it instantly!

 

As usual any questions or comments are welcome,
TTFN Roy Boxwell
Senior Software Architect

2014-05: Why SIZE matters for DB2

„Are your space management and monitoring tools up-to-date, or are they still DB2 V7?”

IFCIDs  : Instant monitoring of secondary allocations

DB2 V7 : Space management without „the IBM algorithm“
DB2 V8 : Running out of datasets & Degenerated extents
DB2 9    : UTS spaces introduction & PBG’s problem
DB2 10  : ALTER PBGs
DB2 11  : Remove any empty parts in PBGs

SAX       : Space management processing

Since DB2 V8, the DBAs of the world have all had the ability to forget about PQTY and SQTY in the DDL for Tablespaces and Indexes.

At first, nearly no-one trusted the sliding scale algorithm, and SOFTWARE ENGINEERING’s product Space AssuranceExpert (aka SAX) monitored and reacted instantly to secondary allocations.

However, we now have DB2 11 and I thought it would be interesting to review what was done in DB2 V7 (when our SAX was launched), and the difference nowadays in the DB2 11 world.

IFCIDs: Instant monitoring of secondary allocations

In DB2, every time a secondary allocation is done, DB2 can be made to spit out an IFCID. SAX  runs as a started task, active 24×7, from DB2 start up until just before DB2 shut down.

It catches all of these IFCIDs thrown by DB2 and performs an analysis with five basic questions:

  1. Can this dataset reach its maximum physical size *before* running out of physical extents? (The actual size is dependent on the “geometry” of the object of course!)
  2. Will this object run out of datasets? (The number of datasets an object can have is, once again, dependent on the “geometry” of the object)
  3. Is this partition nearing its maximum size?
  4. Did DB2 ask for one extent but got more back?
  5. Are any of my SMS disk storage pools running out of space?

DB2 V7 : Space management without „the IBM algorithm“

Remember, back in those old days of DB2 V7, we only had 255 extents and 254 partitions, but datasets could still get pretty big pretty fast.

The problem lots of shops had, was that an important dataset would “hit the buffers” of maximum number of extents *way* before it ever ran out of physical space. Thus causing grief, wailing and gnashing of teeth!

SAX stopped all this by giving WTO “heads-up” style messages in two flavors.

– First a warning message,
– and then a critical message.

This gave DBAs and space managers much needed time to plan for the outage and the, inevitably, long running REORG to actually action the required ALTER, or perhaps even any DROP/RECREATE that had to be done.

DB2 V8 : Running out of datasets & Degenerated extents

IBM also noticed this problem and so introduced in DB2 V8 the “sliding scale” of secondary allocations, as long as the ZPARM  OPTIMIZE EXTENT SIZING field (MGEXTSZ) was set to YES (this is the default from DB2 9, by the way).

Of course, to really use this, you then had to ALTER all of the existing spaces PQTY and SQTY to be -1, and then remember to delete all PRIQTY and SECQTY lines in your DDL and also rely on the TSQTY and IXQTY ZPARMs giving a big enough “first default”. (By the way, defaults for these two ZPARMS are 0, which is actually translated to be 720k or one cylinder for normal spaces and 7200k or 10 cylinders for LOB spaces). This all probably explains why the take up of this great feature has not been that spectacular and, in fact, Listserv *still* gets questions about “How good is this feature?”

This also explains why the primary reason for having SAX is still valid at most shops today!

Running out of datasets

Now the problem of running out of datasets is very, very evil indeed… For a non-partitioned space, you can have up to 32 datasets, and DB2 will happily allocate away and you will never know, or even be informed, if, and when, the last possible dataset has just been allocated and, of course, you will not know that the 33rd one cannot be allocated until you get a -904 unavailable resource error!

By definition this is “not good”, as you must do a HUGE REORG with a bunch of managers breathing down your neck and *not* make any mistakes with the new allocations. (Again, this is a very good reason to have SAX doing all the monitoring and triggering early warning “heads-up” style messages!)

A partition running out of space is rare, but when it does happen it is, of course, a disaster! The idea in SAX, is to warn when the end is near for any of the partitions in use and thus, as before, allow time for the ALTER etc.

Top of page

Degenerated extents

Degenerated extents are annoying as well. You have 255 extents, DB2 requests one but gets up to five back! This is “wasting” your precious supply of extents and so SAX can also warn you if this starts happening, and remedial action can again be planned to correct the problem, (normally a volume defrag in this case).

Now in z/OS 1.7 “Extent Constraint Removal” was introduced for the DATACLAS which, if set to “Y”, allows 7,257 extents but still limits you to 123 extents per volume and 59 volumes, so watch out if you are using huge “virtual” disks (E.g. MOD 54 or EAV), as you can end up wasting space as you still cannot exceed 123 extents per volume.

Finally, when an SMS Pool runs out of space, either for sort/work or image copy, it is *not* good! The idea here, is to also give a “heads-up” style alert so that the DBA can trigger the space management people to have a look at the state and size of the SMS storage groups this time alerted by percentage used or GBs of space free.

DB2 V8 introduced a big change – Partitions went up to a maximum of 4,096, and the calculation about how many pieces your NPI can have got “a little bit complex” (see also my previous newsletter: “Are you going to PIECEs” here!).

DB2 9: UTS spaces introduction & PBG’s problem

In DB2 9 the next major advance came with UTS spaces, and the one that caused the most grief was, of course, PBG. Why?

Well, the first problem was that some people went mad and used MAXPARTITIONS 4096 right from the get-go which they then found out could not simply be changed and ended up being a huge problem.

IBM came out with a bunch of fixes for these people, but the recommendation is still true today: “Use the number you expect to use!”

PBGs, however, came with a new set of space management Problems:

  1. By definition every partition is full, and so a TP REORG is “dangerous” -especially if you have VARCHAR, and even more so if compressed.
  2. ALTER at TP level is not supported for PBG.
  3. Getting rid of empty partitions was not supported.
  4. Adding partitions dynamically (by command) was not supported.
  5. What to do if the partition that is “in use” is growing and is
  6. The last allowed partition
  7. MAXPARTITIONS is set to one?

Now these are “non trivial” because the DB2 catalog is so defined and you would not want an alert every time someone created a table or index!
The trick here, is to treat these conditions as if it was a normal space and so, instead of warning that you are using the last part, it waits until you are using e.g. 80% of that part and then e.g. at 90% comes the critical threshold warning.

 

DB2 10: ALTER PBGs

With DB2 10 came the ability to ALTER PBGs to add parts which made using DSN1COPY to clone data around a lot better!

DB2 11: Remove any empty parts in PBGs

Further, in DB2 11, the REORG utility can be used to remove any empty parts in PBGs by the use of the ZPARM REORG_DROP_PBG_PARTS being set to ENABLE (DISABLE is the default).

Top of page

Space Management processing

The SAX tool way of processing all this info is neatly summarised in the help panel of the tool itself:

AUDIT SMS STOGROUPS
Should the Space AssuranceExpert audit SMS stogroups. Y/N
If Y is entered, a pop-up window will allow you to enter up
to 24 SMS storage groups which will be audited.
If WARN IF % ALLOC > or WARN IF GB FREE < is specified and
exceeded, a warning (WTO) will be issued.

AUDIT DEGENERATED XTS
Audit secondary quantity for de-generated extents. If
specified, a warning is issued in case of the last extent
does not reach this percentage of the SECQTY specified
in the DB2 catalog. If this field is left blank, no
auditing is performed.

SUPERVISE LPS
Supervise linear pagesets. If specified, a warning is issued
in case of high allocated reaches this percentage of the
maximum data set size for partitioned objects.

For non-partitioned objects, a warning is issued for every
newly allocated data set as soon as the data set number
reaches this percentage of the maximum number of data sets:

Two different values may be entered for warning and critical
values with different message ids. This may be useful for
automation reasons (see below).

Object type: TABLESPACE      ! Maximum number of data sets
-----------------------------+----------------------------
LOB tablespaces              ! 254
-----------------------------+----------------------------
Non-partitioned tablespaces  ! 32
-----------------------------+----------------------------
Partitioned tablespaces      ! 1 (Percent used check)
-----------------------------+----------------------------
Partitioned By Growth        ! MAXPARTITIONS. LPS check if
tablespaces                  ! more than one. If on last
                             ! partition then percent used.
-----------------------------+----------------------------

Object type: INDEX           ! Maximum number of data sets
-----------------------------+----------------------------
Non-partitioned indexes on   ! MIN ( 4096 , 2 power 32 /
tablespace with LARGE,       ! ( DSSIZE / TS PGSIZE))
DSSIZE, or more than 64      ! Eg: 128 GB DSSIZE with
Partitions                   ! 8 KB Tablespace Page
                             ! gives 256 Pieces (datasets)
                             ! Or 4 GB DSSIZE with
                             ! 4 KB Tablespace Page
                             ! gives 4096 Pieces (datasets)
-----------------------------+----------------------------
Non-partitioned indexes      ! 32
otherwise                    !
-----------------------------+----------------------------
Partitioned indexes          ! 1 (Percent used check)
-----------------------------+----------------------------

To support automation based on WTO ids two different
thresholds may be specified:
Field (1) specifies a warning threshold using WTO ids
O2RTSU04 - 12W (non-partitioned spaces)
O2RTSU04 - 14W (partitioned spaces)
O2RTSU04 - 16W (partition by growth spaces)
Field (2) specifies a critical threshold using WTO ids
O2RTSU04 - 13W (non-partitioned spaces)
O2RTSU04 - 15W (partitioned spaces)
O2RTSU04 - 17W (partition by growth spaces)

So now you know why size matters for DB2!

The big question now is:
“Are your space management and monitoring tools up-to-date, or are they still DB2 V7?”

Top of page

As usual any questions or comments are welcome,
TTFN Roy Boxwell
Senior Software Architect

2012-12: APAR list reprint „Back by popular demand“

A few years ago,
I started keeping a little list of APARs that documented the “interesting” problems in DB2,
so that I did not have to go through hours of searching every month or two.

Over time it evolved into three distinct groups of corrections.

  • RTS:

For any fixes relevant to Real-Time Statistics (As one of our prime products RealTimeDatabaseExpert relies on this data it behooves me to make sure it is as valid and good as can be!)

  • RUNSTATS:

As the age old adage “garbage in = garbage out” is still correct when it comes to the DB2 Optimizer. Over the years, it has become very clever and at the same time very sensitive to dodgy data or correlations or, heaven forbid, manual well-minded but incorrect updates (To help here : we even wrote our own freeware tool Statistics HealthCheck to find the bad data – before it bites you!)

  • SQL Performance group:

Here all of the tweaks and bugs of the Optimizer usually surface, but I have had to expand its remit over the years to now also include SQL Access Path and SQL Stats problems as these groups then cover all possible interesting things!

 

APAR list what’s new?

I have also changed how I store the data as it used to be in a PDS member for DB2 V6 and is now in three Excel spreadsheets for DB2 V8, 9 and 10. Soon I will drop V8 and add Vnext as V8 has now “gone” from the APAR point of view. I have also added useful columns like “last modified” so you can see if the APAR has moved since you last blinked, and the latest addition is a HIPER column. It used to be part of the description but could easily get lost and so now there is a Y for HIPER or Blank for normal.

So every month I trawl through the tech support database of DB2 z/OS and look for changed, updated, new, or closed APARs and then get all the data, pack it into my Excels and then mail it out to anyone who wants it. Now I know you can subscribe to lots of services that send out email alerts but I still find three Excels or comma separated files a *lot* easier to work with than an email and it is DB2 version self-documenting. No need to follow links to the bitter end of the IBM web site!

As a final thought on this topic, I also write some notes about which APARs I think are of real interest including HIPERs etc.

 

Get the three APAR lists:

So here is how the last set of data looked. The columns are always the same:

APAR, CLOSED date, STATUS, V8 PTF, 9 PTF, 10 PTF, HIPER, Description

The Status column can be blank (nothing to report), New, Closed, New & Closed (If IBM is quick!), Modified (If a closed APAR has been changed by IBM) or Fixed in Next (that you do not want to see!)

The PTF columns are either the PTF number, N/A for not applicable for this release or OPEN for, guess what?, open.

 

First is the RTS list:

2012-12_RTS list

Now the RUNSTATS list:

2012-12_RUNSTATS list

 

Here you can see one HIPER – and note that any typos come from IBM not me J(INDFINITELY…).

 

And finally the SQL Performance list:

2012-12_SQL Performance

 

If you are interested in getting these updates simply email me or techsupport@seg.de and ask!

As usual comments and queries are always welcome!

 

TTFN,
Roy Boxwell
Senior Architect