2014-01: Complementing IBMs ACCESS PLAN Stability 1, for Dynamic SQL

 

In collaborazione con Expertise4IT

Con il package di salvataggio globale delle RUNSTATS implementato da SEG: Per quale ragione questa buona idea in qualche caso fallisce?

 

• La storia
• Cos’è accaduto realmente?
• Come risolvere il problema?
• Rescue Steps
• Un esempio (comprensivo di pannelli JCL)
• Il prossimo mese
Questo mese vi voglio raccontare una storia. La storia è vera, ma I nomi sono stati cambiati per proteggere gli innocenti

 

 

La storia

Una grande azienda schedula regolarmente la fasatura dei processi di produzione nella notte di Giovedì. Lo scorso anno, una notte tutto procedeva regolarmente ma … al Venerdì mattina …….

…i telefoni cominciarono a squillare perchè gli utenti si lamentavano di tempi di risposta lunghi o nulli (timeout) in un’applicazione piuttosto critica per il business.

Il problema rapidamente evolve dai normali ‘spegnete e riavviate’ e ‘avete cambiato qualcosa?’ ai senior managers che chiedono per quale motivo le cose non funzionano più.

Allo stato il Gruppo DBA non era stato ancora coinvolto, dal momento che il primo pensiero era che si trattasse di un “bad” package passato in produzione accidentalmente nella notte.

  • Il Gruppo di gestione della produzione, allora, riportava indietro I packages in gioco, ma l’operazione non produceva effetti….
  • I ritardi di consegna diventavano peggiori. Vengono quindi ‘stoppati’ quasi tutti i server WebSphere, almeno per permettere che un pò di lavoro venisse eseguito sul sistema sovraccarico. Essendo ormai giunti al livello di panico, i DBA and il team JAVA vengono coinvolti.
  • E questi trovano rapidamente il colpevole; si trattava di un SQL dinamico molto complesso che si era ‘comportato’ correttamente e con efficienza fino ad un certo istante della notte di Giovedì, e adesso performava in maniera disastrosa.
  •  I DBA riorganizzano le tabelle più consistenti fra quelle in gioco, nella speranza che le cose vadano meglio, ma, purtroppo, non ci sono risultati apprezzabili….

    Finalmente il DBA propone la creazione di un indice con relative RUNSTATS in produzione.

L’access path si modifica sostanzialmente e le performance tornano ad essere buone.
I server WebSphere vengono restartati e, gradualmente, si torna alla normalità.
Tutto questo processo é durato 2 giorni!
L’azienda deve rifasare la sua catena logistica e le consegne Just-In-Time, quindi il blocco ha avuto qualche seria ripercussione, ovviamente….  E questa è tutta la storia

 

Cosa è realmente avvenuto?

Il DBA, investigando sulle possibili cause del blocco, ha scoperto che ciò che realmente era accaduto, era che era andata in macchina una RUNSTATS solo su una piccola tabella che, fatta girare durante la notte di Giovedì in un momento inopportuno, aveva determinato un passaggio dell’access path allo stato di “pear-shaped”(aka Belly Up) per tutta la giornata di Venerdì e per metà del Sabato…

Come risolvere rapidamente e facilmente il problema?

Il DBA pensò ai modi in cui il problema, che avrebbe potuto riproporsi in futuro, potesse essere risolto in maniera facile e rapida. E qui comincia la mia parte della storia…

Questa azienda utilizza software proprio e possiede una licenza d’uso della componente Enterprise Statistics Distribution (ESD) del Bind ImpactExpert, che estrae, e opzionalmente converte, tutte i dati di catalogo di cui ha bisogno l’ottimizzatore per fare il proprio lavoro. Normalmente, I clienti utilizzano questo tool per copiare le statistiche di produzione in stile ‘sand box’, per vedere se un APAR o una migrazione di DB2 può causare problemi imprevisti. Per questo motivo, essi usano la componente Early-PreCheck del nostro tool Bind ImpactExpert per SQL statico e dinamico. Adesso facciamo un altro scenario, chiamato DSC (Dynamic Statement Cache) Protection, che dovrebbe realizzare quello che desidera il cliente, ma fa anche molto altro e, ovviamente, ha costi più elevati!

Ed è così che nacque l’idea di un tool ‘tascabile’ che abbiamo chiamato RUNSTATS Rescue. Ho sentito che chiedevate “Perchè si chiama PocketTool?”; la risposta è: “Perchè costa quanto il denaro che si tiene in tasca!” (aka Pin Money negliUSA o una “allowance” se preferite). Questi tools sono realmente molto economici!

Adesso, prima che smettiate di leggere questa newsletter e cominciate a lamentarvi del fatto che si tratti di una pura attività di marketing, per favore mettetevi in testa che quanto io descrivo in questo documento potrebbe anche essere stato scritto da voi e, quindi, avete solo bisogno di darmi credito sufficiente per condividere l’idea…

 

RUNSTATS Rescue

L’idea è usare l’EXPLAIN in qualunque modalità, shape or form, sia in SPUFI, che direttamente all’interno di qualsiasi monitor, semplicemente per ‘EXPLAINare’ l’SQL critico, e tenere a mente quale PLAN_TABLE owner state usando e, anche, il QUERYNO che avete appena usato.

Usando questi due inputs, RUNSTATS Rescue analizza l’output dell’ EXPLAIN per costruire una lista di control cards di estrazione e update per il nostro tool ESD, per tutte le tabelle usate e per I relative indici –anche se, ovviamente, non in uso! Infine viene anche generato un flusso di RUNSTATS DSC per tutti I tablespaces coinvolti nella query al fine di avere la certezza che la prossima volta che lo statement critico andrà in macchina, userà le corrette statistiche.

Ora, ovviamente, sorge la domanda: “Come faccio a sapere quail sono le statistiche da usare come Rescue statistics?”

La risposta è: “Quelle che c’erano prima prima che venisse eseguita una REORG con inline statistics o una RUNSTATS “. Questo è il punto chiave da tenere a mente: dovete semplicemente eseguire lo ESD extract prima che venga eseguita qualunque operazione di maintenance sul Database.
Molti siti hanno giorni, o weekends, in cui girano le procedure di maintenance, e non è certo un problema estrarre I dati e salvarli, per esempio,in un GENGROUP, al fine di facilitare l’individuazione di data e time in cui le statistiche erano ‘buone’ e poi, con il job di RUNSTATS Rescue, ripristinare molto rapidamente le statistiche necessarie. Questo consente al gruppo DBA di avere più tempo per capire cos’è realmente successo e apportare le giuste correzioni, quantomeno rimanendo a proprio agio.

10 Rescue Steps

  • 1. Selezionare il nuovo scenario di RUNSTATS Rescue
  • 2. Generare JCL
  • 3. Opzionalmente copiarle in GENGROUP dataset
  • 4. Inserire EXPLAIN TABLE-CREATOR e QUERYNO
  • 5. Lancio automatico del nostro browser di catalogo
  • 6. Drill down fino al livello degli indici
  • 7. Chiedere un “new” file name per le “rescue” statistics estratte
  • 8. Eseguire l’estrazione delle RUNSTATS Rescue
  • 9. Resettare le statistiche ed eseguire le RUNSTATS
  • 10. Le “Rescued” Statistics

 

Di seguito un esempio guidato di cosa avviene in realtà:

 

In basso nella schermata, si può osservare il nuovo scenario delle  RUNSTATS Rescue – Selezionandolo si attiva una finestrella con tre passi. Il primo deve essere eseguito solo una volta e va semplicemente inserito in un job produttivo esistente. Si raccomanda di farne il primo job della normale catena di maintenance del DB2 Database Maintenance.

1 – Select the new scenario RUNSTATS Rescue

 

2 – Generate some JCL

La prima opzione genera alcuni JCL come sotto illustrato:

 

3 – Opzionalmente copiare in un GENGROUP dataset

Esempio di step finale di copia su GENGROUP

 

4 – Insert the EXPLAIN TABLE-CREATOR and QUERYNO

Selezionare il secondo passo per preparare le RUNSTATS Rescue

5 – Lancio automatico della scansione di catalogo

Premendo “enter” si lancia il browser sul catalogo per consentire la vision degli oggetti usati dallo SQL…..

 

6 – Drill down to the Index level

…con I relativi indici, ovviamente!

 

7 – Ask “new” file name for the extracted “rescue” statistics

Uscendo con PF3 il tool chiede il dsname delle Production Statistics originali, come estratte dal job del primo passo, e un  “new” file name per le rescue statistics estratte:

 

 

8 – Perform the RUNSTATS Rescue extraction

Il JCL che esegue l’estrazione delle RUNSTATS Rescue, incluso il passo opzionale su saving su GDG, si presenta come sotto:

 

9 – Reset the statistics and executes the RUNSTATS

Infine, viene selezionato il terzo step, che azzera le statistiche ed esegue la RUNSTATS che provvede al flush della DSC

 

10 – The “Rescued” Statistics

Adesso, la prossima volta che lo statement apparirà, userà le “rescued” statistics e riprodurrà il vecchio Access Path.

 

Nel prossimo mese

Nel prossimo mese desidero espandere la problematica trattata con la possibilità di estenderne la soluzione allo SQL statico.

Il mese successive, scenderò nei dettagli della ‘DSC Protection scenario’ che ho citato prima. Non si tratta di un tool tascabile, ovviamente, ma è molto interessante!

Come sempre questions or comments sono benvenuti,

TTFN Roy Boxwell
Senior Software Architect

 

 

2015-03: DB2 z/OS object changes: Quiet Times for maintenance

Do you have an idea when tables are in use?

 

Ahhh! Wouldn’t it be great if we all had just quiet times? Sadly we never have time for anything these days, let alone for peace and quiet!

The quiet before the Storm?

What I mean by Quiet Times is, however, different: it is the time when a given table, or set of tables, is not in use. This is very interesting to find out, especially when you are doing data definition changes (DDL). For example: you are given the task of adding some columns to some tables – naturally these days you have no idea who or what is actually using the tables, and absolutely no idea *when* they are being used.

What do you do?

Well, all you can do is schedule the change for early one morning and then quickly push the ALTERs and the REORGs through – hoping not to collide with any users of the data.

 

Guessing when tables are in use can be dangerous

This is all a bit haphazard and dangerous! Wouldn’t it be better if you could look at a calendar and see that this table is only used Mo – Th from 09:00 – 16:00 thus giving you a really big hint that Friday morning is a better bet?

 

Capture your DB2 SQL Workload & project the results into a Calendar view

Using the new and enhanced IFCIDs in DB2 10 you can now do this! Capture your workload and analyze when table(s) are being used and project the results into a Calendar view:

News from the labs Newsletter 2015-03: Quiet Times

 

Gives this style Output:

News from the labs Newsletter 2015-03: Quiet Times

 

Handy huh?

Video (3 min.)  – Presentation

– You can drag the dates back and forth to validate the assumptions of a period of time, and then you can happily do your ALTERs and REORGs during the day.

– Apart from not having to get up early, the added bonus is that you get to learn more about who uses the tables!

Of course this system is *not* a crystal ball! It is just showing historical usage. Who knows what the future holds?

Would this style of output be useful for you? Could you imagine this helping you in your day-to-day tasks?

 

As usual any queries or criticism gladly accepted!

TTFN,

Roy Boxwell

2015-01 BIFCIDS – Where’s the BIF?

How will you deal with loop-hole usage in production code?

 

 

The IFCIDs 366 and 376

DB2 provides many and varied IFCIDs. But for today, I’m most interested in the 366 and 376. The 366 is available in DB2 10 and the 376 in DB2 11. Now I like to call these “BIFCIDs” because they are triggered whenever a BIF is used that will behave differently than it is currently used when moving to the next release of DB2. (It’s also triggered when changing Application Compatibility settings in DB2 11 and higher).

 

So where’s the BIF?

BIF Usage Video (11min:)       Presentation

Well, a BIF is a Built-In Function such as CHAR, DECIMAL, etc. There are hundreds of them these days. In the last few DB2 releases, IBM has changed a few to make DB2 more compatible with SQL standards. They have actually closed a couple of loop-holes, where “bad” data could be accepted and processed.

 

 

Loop-hole user?

What happens is: someone somewhere found this loop-hole and used it in production code. Now when you upgrade your DB2, this code will either fail or give erroneous results – which is never good. Hence IBM created the IFCID 366. This is output every time an SQL statement is PREPARED, or executed, that contains a candidate BIF. There were so many of these, that IBM introduced a sort of condensed version so it only triggered one for the first execution, or prepare, but sadly that IFCID—376—is only for DB2 11.

Where can, or will, this really hurt?

 

Looking into the documentation for these IFCIDs you will see a long list of when they are written:

***********************************************************************
**  IFCID 0366 is a serviceability trace.                            **
**  It can be used to identify applications that are affected        **
**  by incompatible changes.                                         **
**  The QW0366FN field indicates the type of incompatible Change:    **
**                                                                   **                                                      
**  QW0366FN = 1                                                     **
**  Indicates that the pre Version 10 CHAR built-in function has     **
**  been invoked. There is an incompatible change to the output of   **
**  the CHAR function for some decimal data. The zparm               **
**  BIF_COMPATIBILITY and/or the SYSCOMPAT_V9 schema have been used  **
**  by this application to get the old behavior. Please make the     **
**  appropriate changes and rebind with the SYSCURRENT schema to     **
**  use the Version 10 CHAR(decimal) built-in function.              **
**  (PM29124 V10 only, usermod V8/V9)                                **
**                                                                   **
**  QW0366FN = 2                                                     **
**  Indicates that the pre Version 10 VARCHAR built-in function or   **
**  CAST(decimal AS CHAR or VARCHAR) has been invoked.               **
**                                                                   **
**  QW0366FN = 3                                                     ** 
**  Indicates that an unsupported character representation of a      ** 
**  timestamp string was used. PM48741 V10 only.                     **
**                                                                   ** 
**  QW0366FN = 4                                                     ** 
**  A QW0366FN 4 record indicates that the statement uses the        **     
**  word ARRAY_EXISTS as an unqualified user-defined function Name   **   
**  in a context that may be incompatible with Version 11.           **     
**                                                                   **
**  QW0366FN = 5                                                     ** 
**  A QW0366FN 5 record indicates that the statement uses the        **
**  word CUBE as an unqualified user-defined function Name           **
**  in a context that may be incompatible with Version 11.           **
**                                                                   **
**  QW0366FN = 6                                                     **
**  A QW0366FN 6 record indicates that the statement uses the        **
**  word ROLLUP as an unqualified user-defined function Name         **
**  in a context that may be incompatible with Version 11.           **
**                                                                   ** 
**  QW0366FN = 7                                                     **
**  A QW0366FN 7 record indicates that DB2 for z/OS server issued    **
**  a SQLCODE -301 for incompatible data type conversion from        **
**  string data type (e.g. CHAR, VARCHAR, GRAPHIC, VARGRAPHIC        **
**  etc.) to numeric data type in V10 CM mode when implicit          **
**  cast is not supported or V10 NFM mode when DDF_COMPATIBILITY     **
**  zparm is set to DISABLE_IMPCAST_NJV or SP_PARMS_NJV to           **
**  disable implicit cast, and the client is CLI Driver              **
**  or v11 NFM mode & APPLCOMPAT = V10R1 when DDF_COMPATIBILITY      **
**  is set to SP_PARMS_NJV or DISABLE_IMPCAST_NJV to disable         **
**  implicit cast either from string data type to numeric or         ** 
**  from numeric data type to string data type.                      **
**                                                                   **
**  QW0366FN = 8                                                     **
**  A QW0366FN 8 record indicates that DB2 for z/OS server           **
**  returned output data match the data types of the                 **
**  corresponding CALL statement arguments when DDF_COMPATIBILITY    **
**  zparm is set to SP_PARMS_NJV.                                    **
**                                                                   **
**  QW0366FN = 9                                                     **
**  A QW0366FN 9 record indicates a data type conversion from        **
**  a TIMESTAMP WITH TIME ZONE input to a TIMESTAMP data             **
**  during input host variable bind-in process on server when        **
**  DDF_COMPATIBILITY zparm is set to IGNORE_TZ to ignore the        **
**  time zone information sent by Java IBM Data Server Driver.       **
**                                                                   **
**  QW0366FN = 10                                                    ** 
**  RTRIM, LTRIM or STRIP version 9 being used with mixed data       **
**                                                                   **
**  QW0366FN = 1101                                                  ** 
**  Indicates that the INSERT statement that inserts into an XML     **
**  column without XMLDOCUMENT function has been processed (which    **
**  should result in SQLCODE -20345 when run on DB2 release prior    **
**  to V11). Starting with V11, SQL error will no longer be issued.  **
**  Application will no longer recieve SQLCODE for this Statement.   **
**                                                                   ** 
**  QW0366FN = 1102                                                  **
**  Indicates that V10 XPath evaluation behavior was in effect which **
**  resulted in an error. For instance, a data type conversion error **
**  could have occured for a predicate that would otherwise be       **
**  evaluated to false. Starting from V11, such "irrelevant" Errors  **
**  might be suppressed so an application might no longer recieve    **
**  the SQLCODE for this Statement.                                  **
**                                                                   **
**  QW0366FN = 1103                                                  **
**  Indicates that a dynamic SQL uses the ASUTime limit that has     **
**  been set for the entire thread for RLF reactive governing.       **
**  For instance, when a dynamic SQL is processed from package A,    **
** if the ASUTime limit is already set during other dynamic SQL      ** 
** processing from package B in the same thread, the SQL from        **
** package A will use the ASUTime limit set during the SQL           **
** processing from package B. Stating with v11, dynamic SQLs from    **
** multiple packages will use the ASUTime limit that is set          **
** considering its own package information.                          **
**                                                                   **
** QW0366FN = 1104, 1105, 1106, 1107                                 **
** Indicates that CLIENT special register (CLIENT_USERID,            **
** CLIENT_WRKSTNNAME, CLIENT_APPLNAME, CLIENT_ACCTNG) has been set   **
** to a value that is longer than what is supported prior to V11.    **
** A shorter value has been used instead.                            **
**                                                                   **
** QW0366FN = 1108                                                   **
** Indicates that CLIENT special register (CLIENT_USERID,            **
** CLIENT_WRKSTNNAME, CLIENT_APPLNAME, CLIENT_ACCTNG) has been set   **
** to a value that is longer than what is supported prior to V11.    **
** Truncated values upto the supported lengths prior to v11 have     **
** been used for RLF table search instead.                           **
**                                                                   **
** QW0366FN = 1109                                                   **
** Indicates that CAST(string AS TIMESTAMP) was processed for the    **
** input string of length 8 and input was treated as a store clock   **
** value (or input string was of length 13 and was treated as a      **
** GENERATE_UNIQUE value). This behavior is incorrect for a CAST     **
** and is valid for TIMESTAMP built-in function only. This behavior  **
** is being corrected in DB2 11 so that input to CAST is not         **
** treated as a store clock value nor GENERATE_UNIQUE.               **
**                                                                   **
** QW0366FN = 1110                                                   **
** Indicates the integer argument of SPACE function is greater       **
** than 32764.                                                       **
**                                                                   **
** QW0366FN = 1111                                                   **
** Indicates the optional integer argument of VARCHAR function       **
** has a value greater than 32764. *                                 **
***********************************************************************

 

Useful stuff indeed!

Phew! Not a bad list, huh? Now you see why these IFCIDs are so useful. It could well be, that you have none of these “alive” in your system today. Or, of course, it could be that you get millions of the things! Somehow you will have to work out a way to save the data, analyse it to get to the root cause, and then, finally, fix the problem(s).

 

Saved by APPLCOMPAT?

You could argue that the new DB2 11 parameter Application Compatibility will save you, but this is really a false economy. All it enables is the guarantee that the code will still “run”. However, in two more DB2 releases the code will fail and, in two more releases – so about six years – who will even know *how* to change which piece of source code and, perhaps even, where is that source code?

 

Saved by BIFCIDs

Personally, what I would do, is : to run our SQL WorkloadExpert tool to trap all the required [B]IFCIDs for a few hours (at first!).Then I would analyse the results, fix the code where it needs fixing – and repeat! I would keep doing this until no IFCID records are coming out and I would be set!

 

What is even better, is that our SQL WorkloadExpert will work correctly even when any new QW0366FN values appear – so when IBM decides to add another code (Like the new values 9 and 10 above for example) this BIF Usage still works correctly.

 

Of course, you may have another tool that you use at your site.

Can it see “Where’s the BIF?”

How will you deal with loop-hole usage in production code?

 

As usual, any question or comments gladly welcome!

 

TTFN

Roy Boxwell

 

 

2015-02: DB2 z/OS AUDIT – Boring Boring Boring

 

New IFCIDs 316 – 400/401 for DB2 z/OS Audit “on the fly”

I use the Monty Python title as it reminded me of an old telephone book joke in England: look up „boring“ in the Yellow pages – there you would find a little bit of text that simply said „See Civil Engineering.“ It still makes me laugh these days.  Anyway, back to the newsletter…

Auditing is often looked down upon as being boring, tedious and of no worth. This is rubbish of course! Without auditing we would not be allowed to do anything these days. I hardly know any DBAs who still have SYSADM.

It is just too darn powerful!

The statistics also tell us that nearly all “hack” attacks are “inside jobs” from the very people we know and trust. The statistics are also pretty brutal on the Mainframe/Server divide – Mainframes are very rarely successfully attacked, but Servers (see Sony et al.) are all the time. I think there should be a lot more auditing on the little iron really!

 

So how do you audit on the host side of the street?

Do you actively check what is happening? Or do you just wait for the thought police to arrive?
 


 

Enabling Auditing from DB2 10

When the enhanced IFCID 316 and the new IFCIDs 400/401 were introduced, it closed a gap in the ability to actually Audit your system “on the fly”. Using these IFCIDs  you could actually trap/monitor/audit all of the SQL running in your Plex. These IFCIDs are also nearly free as the overhead is “background noise” levels of CPU.

Now, let us imagine that you are capturing all of this data. That you are regularly snapping both the DSC and the SSC, (that’s what I call the EDMPOOL cache for Static SQL statements), that this data is all being rolled up and saved into a DB2 Data warehouse, and that you are triggering Batch jobs to analyse for Audit – reporting using various queries just to see if anything “untoward” is starting/or is happening!

 

What queries would you want to run?

– I have a few straight off the bat here: Who is reading from the Payroll table?
– Who is updating the Payroll or Employee tables?
– Who is accessing *any* table from the internet?
– Is anybody being really clever and using ODBC to select from my production tables?
– How many userids are out there using my data?
– Has any SYSADM enabled userid done any work on my system today?

All good Audit questions that you could put into operation very simply indeed!

 

Oh Lucky Man!

Now as luck would have it, we have a software product called SQL WorkloadExpert that actually does all this for you! What’s more you can expand it as much as you like! Cool huh?

– Who is accessing *any* table from the Internet?…
– Has any SYSADM enabled userid done any work on my system today?…
– How many userids are out there using my data?…

Audit Video (5 min.)   –  Presentation

Looking at this screen shot you can get an idea of the possibilities – Look at the Workstation name column for instance. “192.xxx” is the intranet. If any other tcp/ip address showed up here, it would be, shall we say, “worrying”. You can also see great stuff like “EXCEL.EXE” in the Transaction name column, and that a certain Mr. Boxwell has been running a few things from lots of data sources, I wonder what he’s up to?

News from the labs Newsletter 2015-02: DB2 AUDIT-Boring, Boring, Boring-Screenshot1

 

Here you can see a nice list of “Intents” against a given table (in this case SYSIBM.SYSTABLES)
News from the labs Newsletter 2015-02: DB2 AUDIT-Boring, Boring, Boring-Screenshot1

 

Now you can see who did what type of insert against a given object.
News from the labs Newsletter 2015-02: DB2 AUDIT-Boring, Boring, Boring-Screenshot1

 

This is a list of *all* Primary Authorization IDs or Collections and Packages that have run
– Any intruders?
News from the labs Newsletter 2015-02: DB2 AUDIT-Boring, Boring, Boring-Screenshot1

 

This is a list of all User Data Updates done by users with SYSADM authority in the last workload.
News from the labs Newsletter 2015-02: DB2 AUDIT-Boring, Boring, Boring-Screenshot1

 

Triggering the Auditor

All of this with drill down to the actual SQL that was executed. Cool stuff and very handy indeed! But this is “past the point” and what you really need is a Batch Style interface that runs the SQLs and triggers alarms on the host before someone looks at the pretty GUI!

I would do this with a nice little set of batch Spufi’s that get post processed and either e-mailed directly to the Auditors, or WTO’d  as an alarm action that then triggers a batch job to do something else.

 

What would you like to Audit? Or what would your Auditors like to see? How do you currently accomplish this?

I would be fascinated to hear from you!

TTFN,

Roy Boxwell

 

2015-02: DB2 AUDIT – Boring Boring Boring

New IFCIDs 316 – 400/401 for DB2 z/OS Audit “on the fly”

I use the Monty Python title as it reminded me of an old telephone book joke in England: look up „boring“ in the Yellow pages – there you would find a little bit of text that simply said „See Civil Engineering.“ It still makes me laugh these days.  Anyway, back to the newsletter…

Auditing is often looked down upon as being boring, tedious and of no worth. This is rubbish of course! Without auditing we would not be allowed to do anything these days. I hardly know any DBAs who still have SYSADM. It is just too darn powerful! The statistics also tell us that nearly all “hack” attacks are “inside jobs” from the very people we know and trust. The statistics are also pretty brutal on the Mainframe/Server divide – Mainframes are very rarely successfully attacked, but Servers (see Sony et al.) are all the time. I think there should be a lot more auditing on the little iron really!

 

So how do you audit on the host side of the street?

Do you actively check what is happening? Or do you just wait for the thought police to arrive?

 

Enabling Auditing from DB2 10

When the enhanced IFCID 316 and the new IFCIDs 400/401 were introduced, it closed a gap in the ability to actually Audit your system “on the fly”. Using these IFCIDs  you could actually trap/monitor/audit all of the SQL running in your Plex. These IFCIDs are also nearly free as the overhead is “background noise” levels of CPU.

Now, let us imagine that you are capturing all of this data. That you are regularly snapping both the DSC and the SSC, (that’s what I call the EDMPOOL cache for Static SQL statements), that this data is all being rolled up and saved into a DB2 Data warehouse, and that you are triggering Batch jobs to analyse for Audit – reporting using various queries just to see if anything “untoward” is starting/or is happening!

 

What queries would you want to run?

– I have a few straight off the bat here: Who is reading from the Payroll table?
– Who is updating the Payroll or Employee tables?
– Who is accessing *any* table from the internet?
– Is anybody being really clever and using ODBC to select from my production tables?
– How many userids are out there using my data?
– Has any SYSADM enabled userid done any work on my system today?

All good Audit questions that you could put into operation very simply indeed!

 

Oh Lucky Man!

Now as luck would have it, we have a software product called SQL WorkloadExpert that actually does all this for you! What’s more you can expand it as much as you like! Cool huh?
Audit Video (5 min.)     Presentation

Who is accessing *any* table from the Internet?…
Has any SYSADM enabled userid done any work on my system today?…
How many userids are out there using my data?…

 

 

Looking at this screen shot you can get an idea of the possibilities – Look at the Workstation name column for instance. “192.xxx” is the intranet. If any other tcp/ip address showed up here, it would be, shall we say, “worrying”. You can also see great stuff like “EXCEL.EXE” in the Transaction name column, and that a certain Mr. Boxwell has been running a few things from lots of data sources, I wonder what he’s up to?

News from the labs Newsletter 2015-02: DB2 AUDIT-Boring, Boring, Boring-Screenshot1

 

Here you can see a nice list of “Intents” against a given table (in this case SYSIBM.SYSTABLES)
News from the labs Newsletter 2015-02: DB2 AUDIT-Boring, Boring, Boring-Screenshot1

 

Now you can see who did what type of insert against a given object.
News from the labs Newsletter 2015-02: DB2 AUDIT-Boring, Boring, Boring-Screenshot1

 

This is a list of *all* Primary Authorization IDs or Collections and Packages that have run
– Any intruders?
News from the labs Newsletter 2015-02: DB2 AUDIT-Boring, Boring, Boring-Screenshot1

 

This is a list of all User Data Updates done by users with SYSADM authority in the last workload.
News from the labs Newsletter 2015-02: DB2 AUDIT-Boring, Boring, Boring-Screenshot1

 

Triggering the Auditor

All of this with drill down to the actual SQL that was executed. Cool stuff and very handy indeed! But this is “past the point” and what you really need is a Batch Style interface that runs the SQLs and triggers alarms on the host before someone looks at the pretty GUI!

I would do this with a nice little set of batch Spufi’s that get post processed and either e-mailed directly to the Auditors, or WTO’d  as an alarm action that then triggers a batch job to do something else.

 

What would you like to Audit? Or what would your Auditors like to see? How do you currently accomplish this?

Audit Video (5 min.)     Presentation

 

I would be fascinated to hear from you!

TTFN,

Roy Boxwell

 

2015-01 BIFCIDS – Where’s the BIF?

How will you deal with loop-hole usage in production code?

 

The IFCIDs 366 and 376

DB2 provides many and varied IFCIDs. But for today, I’m most interested in the 366 and 376. The 366 is available in DB2 10 and the 376 in DB2 11. Now I like to call these “BIFCIDs” because they are triggered whenever a BIF is used that will behave differently than it is currently used when moving to the next release of DB2. (It’s also triggered when changing Application Compatibility settings in DB2 11 and higher).

 

So where’s the BIF?

BIF Usage Video (11min:)       BIF Presentation

Well, a BIF is a Built-In Function such as CHAR, DECIMAL, etc. There are hundreds of them these days. In the last few DB2 releases, IBM has changed a few to make DB2 more compatible with SQL standards. They have actually closed a couple of loop-holes, where “bad” data could be accepted and processed.

BIF usage

Loop-hole user?

What happens is: someone somewhere found this loop-hole and used it in production code. Now when you upgrade your DB2, this code will either fail or give erroneous results – which is never good. Hence IBM created the IFCID 366. This is output every time an SQL statement is PREPARED, or executed, that contains a candidate BIF. There were so many of these, that IBM introduced a sort of condensed version so it only triggered one for the first execution, or prepare, but sadly that IFCID—376—is only for DB2 11.

Where can, or will, this really hurt?

 

Looking into the documentation for these IFCIDs you will see a long list of when they are written:

***********************************************************************
**  IFCID 0366 is a serviceability trace.                            **
**  It can be used to identify applications that are affected        **
**  by incompatible changes.                                         **
**  The QW0366FN field indicates the type of incompatible Change:    **
**                                                                   **                                                      
**  QW0366FN = 1                                                     **
**  Indicates that the pre Version 10 CHAR built-in function has     **
**  been invoked. There is an incompatible change to the output of   **
**  the CHAR function for some decimal data. The zparm               **
**  BIF_COMPATIBILITY and/or the SYSCOMPAT_V9 schema have been used  **
**  by this application to get the old behavior. Please make the     **
**  appropriate changes and rebind with the SYSCURRENT schema to     **
**  use the Version 10 CHAR(decimal) built-in function.              **
**  (PM29124 V10 only, usermod V8/V9)                                **
**                                                                   **
**  QW0366FN = 2                                                     **
**  Indicates that the pre Version 10 VARCHAR built-in function or   **
**  CAST(decimal AS CHAR or VARCHAR) has been invoked.               **
**                                                                   **
**  QW0366FN = 3                                                     ** 
**  Indicates that an unsupported character representation of a      ** 
**  timestamp string was used. PM48741 V10 only.                     **
**                                                                   ** 
**  QW0366FN = 4                                                     ** 
**  A QW0366FN 4 record indicates that the statement uses the        **     
**  word ARRAY_EXISTS as an unqualified user-defined function Name   **   
**  in a context that may be incompatible with Version 11.           **     
**                                                                   **
**  QW0366FN = 5                                                     ** 
**  A QW0366FN 5 record indicates that the statement uses the        **
**  word CUBE as an unqualified user-defined function Name           **
**  in a context that may be incompatible with Version 11.           **
**                                                                   **
**  QW0366FN = 6                                                     **
**  A QW0366FN 6 record indicates that the statement uses the        **
**  word ROLLUP as an unqualified user-defined function Name         **
**  in a context that may be incompatible with Version 11.           **
**                                                                   ** 
**  QW0366FN = 7                                                     **
**  A QW0366FN 7 record indicates that DB2 for z/OS server issued    **
**  a SQLCODE -301 for incompatible data type conversion from        **
**  string data type (e.g. CHAR, VARCHAR, GRAPHIC, VARGRAPHIC        **
**  etc.) to numeric data type in V10 CM mode when implicit          **
**  cast is not supported or V10 NFM mode when DDF_COMPATIBILITY     **
**  zparm is set to DISABLE_IMPCAST_NJV or SP_PARMS_NJV to           **
**  disable implicit cast, and the client is CLI Driver              **
**  or v11 NFM mode & APPLCOMPAT = V10R1 when DDF_COMPATIBILITY      **
**  is set to SP_PARMS_NJV or DISABLE_IMPCAST_NJV to disable         **
**  implicit cast either from string data type to numeric or         ** 
**  from numeric data type to string data type.                      **
**                                                                   **
**  QW0366FN = 8                                                     **
**  A QW0366FN 8 record indicates that DB2 for z/OS server           **
**  returned output data match the data types of the                 **
**  corresponding CALL statement arguments when DDF_COMPATIBILITY    **
**  zparm is set to SP_PARMS_NJV.                                    **
**                                                                   **
**  QW0366FN = 9                                                     **
**  A QW0366FN 9 record indicates a data type conversion from        **
**  a TIMESTAMP WITH TIME ZONE input to a TIMESTAMP data             **
**  during input host variable bind-in process on server when        **
**  DDF_COMPATIBILITY zparm is set to IGNORE_TZ to ignore the        **
**  time zone information sent by Java IBM Data Server Driver.       **
**                                                                   **
**  QW0366FN = 10                                                    ** 
**  RTRIM, LTRIM or STRIP version 9 being used with mixed data       **
**                                                                   **
**  QW0366FN = 1101                                                  ** 
**  Indicates that the INSERT statement that inserts into an XML     **
**  column without XMLDOCUMENT function has been processed (which    **
**  should result in SQLCODE -20345 when run on DB2 release prior    **
**  to V11). Starting with V11, SQL error will no longer be issued.  **
**  Application will no longer recieve SQLCODE for this Statement.   **
**                                                                   ** 
**  QW0366FN = 1102                                                  **
**  Indicates that V10 XPath evaluation behavior was in effect which **
**  resulted in an error. For instance, a data type conversion error **
**  could have occured for a predicate that would otherwise be       **
**  evaluated to false. Starting from V11, such "irrelevant" Errors  **
**  might be suppressed so an application might no longer recieve    **
**  the SQLCODE for this Statement.                                  **
**                                                                   **
**  QW0366FN = 1103                                                  **
**  Indicates that a dynamic SQL uses the ASUTime limit that has     **
**  been set for the entire thread for RLF reactive governing.       **
**  For instance, when a dynamic SQL is processed from package A,    **
** if the ASUTime limit is already set during other dynamic SQL      ** 
** processing from package B in the same thread, the SQL from        **
** package A will use the ASUTime limit set during the SQL           **
** processing from package B. Stating with v11, dynamic SQLs from    **
** multiple packages will use the ASUTime limit that is set          **
** considering its own package information.                          **
**                                                                   **
** QW0366FN = 1104, 1105, 1106, 1107                                 **
** Indicates that CLIENT special register (CLIENT_USERID,            **
** CLIENT_WRKSTNNAME, CLIENT_APPLNAME, CLIENT_ACCTNG) has been set   **
** to a value that is longer than what is supported prior to V11.    **
** A shorter value has been used instead.                            **
**                                                                   **
** QW0366FN = 1108                                                   **
** Indicates that CLIENT special register (CLIENT_USERID,            **
** CLIENT_WRKSTNNAME, CLIENT_APPLNAME, CLIENT_ACCTNG) has been set   **
** to a value that is longer than what is supported prior to V11.    **
** Truncated values upto the supported lengths prior to v11 have     **
** been used for RLF table search instead.                           **
**                                                                   **
** QW0366FN = 1109                                                   **
** Indicates that CAST(string AS TIMESTAMP) was processed for the    **
** input string of length 8 and input was treated as a store clock   **
** value (or input string was of length 13 and was treated as a      **
** GENERATE_UNIQUE value). This behavior is incorrect for a CAST     **
** and is valid for TIMESTAMP built-in function only. This behavior  **
** is being corrected in DB2 11 so that input to CAST is not         **
** treated as a store clock value nor GENERATE_UNIQUE.               **
**                                                                   **
** QW0366FN = 1110                                                   **
** Indicates the integer argument of SPACE function is greater       **
** than 32764.                                                       **
**                                                                   **
** QW0366FN = 1111                                                   **
** Indicates the optional integer argument of VARCHAR function       **
** has a value greater than 32764. *                                 **
***********************************************************************

 

Useful stuff indeed!

Phew! Not a bad list, huh? Now you see why these IFCIDs are so useful. It could well be, that you have none of these “alive” in your system today. Or, of course, it could be that you get millions of the things! Somehow you will have to work out a way to save the data, analyse it to get to the root cause, and then, finally, fix the problem(s).

 

Saved by APPLCOMPAT?

You could argue that the new DB2 11 parameter Application Compatibility will save you, but this is really a false economy. All it enables is the guarantee that the code will still “run”. However, in two more DB2 releases the code will fail and, in two more releases – so about six years – who will even know *how* to change which piece of source code and, perhaps even, where is that source code?

 

Saved by BIFCIDs

Personally, what I would do, is :  to run our SQL WorkloadExpert tool to trap all the required [B]IFCIDs for a few hours (at first!). Then I would analyse the results, fix the code where it needs fixing – and repeat! – I would keep doing this until no IFCID records are coming out and I would be set!

BIF Usage Video (11min:)       BIF Presentation

What is even better, is that our SQL WorkloadExpert will work correctly even when any new QW0366FN values appear – so when IBM decides to add another code (Like the new values 9 and 10 above for example) this BIF Usage still works correctly.

 

Of course, you may have another tool that you use at your site.

Can it see “Where’s the BIF?”

How will you deal with loop-hole usage in production code?

 

As usual, any question or comments gladly welcome!

 

TTFN

Roy Boxwell

 

 

2014-10 QUERYNO – Query what?

 

 

Can you Track your SQL Back in time?

Do you have standards to catch QUERYNO abuse?

QUERYNO Newsletter SOFTWARE ENGINEERING & SEGUS www.seg.de/en

 

This month I’d like to confront you all with a really nasty little bit of truth…

We have had the ability to specify a number to “mark” an SQL in DB2 for years and years now, using the QUERYNO keyword.

This number is used for the QUERYNO columns of the plan tables for the rows that contain information about this SQL statement. But how many of you out there are actually using this? Even if you do use it, how many of you have got standards and practices in place to guarantee that there is no “misuse” of this feature? Yep, this month we are going to take a long, hard look in the mirror…

 

Definition of QUERYNO

The late great Richard A. Yevich summed it up neatly in 2001 as:

QUERYNO was introduced in V6 as a way to specify a particular QUERYNO for a SQL statement and match it to a HINT, which reference[s] the QUERYNO column in the plan table. It is used at BIND time by DB2, and nothing else.

Regards,
Richard

Of course these days, in fact as of DB2 V7, the QUERYNO has its own column in the SYSIBM.SYSPACKSTMT and can be used for “other things”. You can happily append QUERYNO to static (in fact also Dynamic SQL – but more about that later!) SQL like this:

 

How it Looks

EXEC SQL
    SELECT MAX(IBMREQD)
    INTO :O2TEST5-COL1:O2TEST5-NULL
    FROM SYSIBM.SYSDUMMY1
    WITH UR
    QUERYNO 777777
END-EXEC

All it does is “tie” the SQL to a user-defined arbitrary number, normally for HINT usage. Now the whole point of this was so that you, the DBA, could see that last week QUERYNO 77777 was working a treat, but this week it has suddenly gone all tablespace scanny on you. Your job, as always, is to fix the problem – so that DB2 can keep on chugging along!

 

Working without QUERYNO

Let us imagine that you do *not* use QUERYNO, so the SQL looks like:

EXEC SQL
    SELECT MAX(IBMREQD)
    INTO :O2TEST5-COL1:O2TEST5-NULL
    FROM SYSIBM.SYSDUMMY1
    WITH UR
END-EXEC

After long and hard analysis(!) you decide to add a predicate to speed it all up:

EXEC SQL
    SELECT MAX(A.IBMREQD)
    INTO :O2TEST5-COL1:O2TEST5-NULL
    FROM SYSIBM.SYSDUMMY1 A
    WHERE A.COL1 =
          (SELECT B.COL2 FROM ROY.TABLE B WHERE B.COL3 = A.COL2)
    WITH UR
END-EXEC

 

Can you track this SQL back in time?

Now, however, you trace this SQL you will find it quite hard to extremely impossible, to match this version to the “old” version in any sort of DB2 SQL Data Warehouse. Even our SQLWorkloadExpert will have trouble trying to do so because the SQL has completely changed.  Introducing new tables, changed SQL text etc. however the core competence of the SQL has not changed. All that has changed is the way we get the correct answer.

 

Now why you should all be using QUERYNO is obvious! The original query:

EXEC SQL
    SELECT MAX(IBMREQD)
    INTO :O2TEST5-COL1:O2TEST5-NULL
    FROM SYSIBM.SYSDUMMY1
    WITH UR
    QUERYNO 777777
END-EXEC

 

Brave new coding standards

Is now changed to be:

EXEC SQL
    SELECT MAX(A.IBMREQD)
    INTO :O2TEST5-COL1:O2TEST5-NULL
    FROM SYSIBM.SYSDUMMY1 A
    WHERE A.COL1 =
          (SELECT B.COL2 FROM ROY.TABLE B WHERE B.COL3 = A.COL2)
    WITH UR
    QUERYNO 777777
END-EXEC

 

And now you *can* match this SQL through time with no problem? Neat huh?

Problems problems problems…

Problems however are plentiful… first up is “cut-and-pasteitis”. This is a very nasty, and highly contagious, programmer disease, not just limited to green screen usage but also to GUI developers. It is *very* tempting to just grab a “neat” piece of SQL and drop it into the code and you are done! But if this contains an existing QUERYNO, or even no QUERYNO, then you are destroying your data warehouse.

You must enable QA code checking in order to stop any possible misuse of this Feature.

This entails checking all source code for the existence of a QUERYNO where relevant and needed (Remember that QUERYNO can only be used on DECLARE, SELECT, DELETE, INSERT, MERGE, REFRESH and UPDATE statements). The checking of the numbers used in order to make sure you do not have the number 1234546 55 different times. Also check for the allowance of “gaps”, for the really deranged among us who always want the numbers to go up sequentially in the code (I usually start at 1000 and go up in 1000 increments to start with, as I am indeed one of those poor deranged souls!).

 

Playing with time travel

Once all this is in place, and you are trapping and mapping your data,you can then do wonderful SQL time line Analysis  to really see what an SQL, or even a group of SQLs, did over time! QUERYNO Newsletter SOFTWARE ENGINEERING & SEGUS www.seg.de/en

– Did the change I do really help?

– By how much?

– Can you match the SQLs next to each other etc.

Not only that, but when you are trapping everything, why not use it to see if QUERYNO is actually being used correctly?

(We are currently building a “Use Case” for our SQL WorkloadExpert to do just that.)

 

No way for Dynamic?

I mentioned earlier that the Dynamic world is a bit different. Of course each Dynamic SQL gets its own Statement Id when it comes into the cache, (similar to the Statement Id static SQL gets in the BIND and also in the EDMPOOL cache), but you have no “fixed” point to compare to. Unless, that is, you have a nice SQL Data Warehouse where you store all your Dynamic SQL to enable time travel queries here too!

 

Please note one very important bit of information here

– None of the IFCIDs (316, 317, 318, 400 or 401) actually contains the QUERYNO!- For Static SQL you must fetch it yourself.- For Dynamic SQL its use is purely for documentation in the SQL Text itself.

 

Our question to you for Research purposes

How do you use or *not* use QUERYNO? Do you have standards to catch abuse?

For research purposes please send me an answer with Yes or No

I would dearly love to know!

 

 

Looking forward

As simple as it sounds the big problem has always been time…

Who has enough time to add QUERYNO to all their old legacy code? No-one of course!

But then SQL WorkloadExpert can help there too using our “Multi-Row Fetch” Use Case to show you the legacy code that gives the “biggest bang for the buck” and enables you to kill two birds with one stone!

As you change the heavy hitters for MRF, simply use an edit macro to add in QUERYNO to all of the embedded SQL

– Simple as that!

 

As usual any queries or criticism gladly accepted!

TTFN

Roy Boxwell

2014-09 Detecting invisible SQL since DB2 10

 

What you can actually see

is not everything that is really there…

   

 

The never-seen-before Static SQL Statements

I’ve been involved in reviewing some data provided by our new SQL WorkloadExpert tool (aka WLX). Over the past weeks, the one thing that I’ve seen time and time again, is the sheer number of Static SQLs that are running – and I mean running badly!

In the past, unless you always monitored 24×7, you never really had a chance to see all the static SQL that was running on your Plex. But now WLX makes for a real game changer!

It’s amazing what you can see – without the cost of a 24×7 Monitor.

I liken it to the phrase “Eyes Wide Shut”.

 

What you can do in WLX is a different way to find bad guy SQLs and that is by using an intensive view to find the SQLs that use a large amount of CPU per hour. I call these guys my “key-players” now some of these are obviously “old friends” but you will be surprised at how many “new contacts” you suddenly have!

Here are a couple of beautiful little examples that I found by using WLX to show me the most CPU intensive SQLs running on a Plex over a couple of days.

 

Viewing the static SQL

First query found now, and not seen before, looks like:

SELECT COL1
FROM TABLE1 T1
WHERE T1.IDENT = ?
AND T1.STATUS NOT IN ( 4, 6 )
WITH UR

Our systematic history viewer displays the following. The left hand scale is seconds and the right hand scale is logarithmic absolute numbers:

 

 

Shown here above are the CPU and the Elapsed Time, both per hour and in seconds. Then Getpages and Executions, both per hour and logarithmically graphed.

What you can see on the right side of the graph, is a dramatic drop in CPU and Elapsed and Getpages while the execution rate is constant. This was achieved by simply adding an appropriate index for that query.

 

 

Viewing the SQL from a different angle

Viewing the same data in a different way, (now on the per execution level instead of the per hour level) the results shown below look even better! (The left hand scale is seconds and the right hand scale is absolute numbers):

 

The Execution rate went *up*, but the resource usage dived down after the index was created – Great stuff indeed! This statement always flew under the radar and never raised a red flag before, but now? Slaps on the back all round!

Of course you could ask “Why was this not seen before?”

the answer is “It was never seen before!” Eyes wide shut – remember?

 

The same in tabular form

For those of you who may find the graphs hard to read, here’s the above data in a tabular form:

 

 

Never-seen-before DELETE Statement

Next up, is a nice Little never-seen-before DELETE Statement:

DELETE FROM TABLE1
WHERE COL1 = ?
AND   COL2_DATE = ?
AND   COL3 = ?
AND   COL4 = ?
AND   COL5 = ?
AND   COL6 = ?
AND   COL7 = ?

In the graph below, the left hand scale is seconds and the right hand scale is logarithmic absolute numbers:

4(9)

 

Again, looking at the per hour data, you can see a nice “dive” effect right after an index on all columns was created (Last three data points in this case). Good catch, eh?

 

Below is a new view of additional data for locks and waits. Note that the Global Lock wait count magically “disappears” due to it hitting zero. Again you can see the graph very nicely diving down at the end. Wonderful!

 

 

Again: here’s the above data in a tabular form:

 

 

 

 

 

What you can actually see is not everything that is really there…

These examples quickly show how Static SQL that was always thought to be well-tuned and running “OK” can, in fact, be hogging your machine without you even knowing it!

Remember that what you can actually see is not everything that is really there…

I wonder how many of these invisible hogs you have at your site? And have you*not* yet seen them?

 

As usual any queries or criticism gladly accepted!

TTFN

Roy Boxwell

2014-08: A Million ways to kill your DSC (Dynamic Statement Cache)

That’s No Way to Treat Your Best Friend!

 

The Dynamic Statement Sache (DSC) can, and should, be your best friend for helping SQL run nice and fast on your machine. If you are good, you have a latency (how long the statements stay in the DSC) of about two days, and everything looks hunky-dory at first glance.

However, it could be that your best friend is not all he’s cracked up to be!

The first inkling that something somewhere is not right, is when you see your DSC flush rate soaring upwards (i.e. how many statements are being flushed per hour). Statements are flushed all the time through RUNSTATS, or security changes, etc. But when you see thousands of statements grabbing their coats and heading for the door you know something is wrong – and it can’t just be the music!

 

1,000’s of INSERTS statements are taking a “slot” from the DSC for no purpose!

What I have seen is that lots of people concentrate on Dynamic SQLs that are SELECT or UPDATE or predicated DELETEs and MERGEs, but no-one bothers about simple INSERT statements… and guess what I had seen? Yep, 1,000’s are INSERTS using literals, each of which takes a “slot” from the DSC for no purpose whatsoever!!! These are really, really nasty indeed…

 

How to list and fix the “bad guys” (the INSERTS Statements)

To find them,

– I use a snap of the DSC.

– Then I select only the INSERT Statements,

– and then simply exclude any that have parameter markers.

What you are left with, is the list of “bad guys” that, if excessive in number, must be fixed as soon as possible!

 

Here’s how I do this using our SQL PerformanceExpert (SPX):

1 First I snap the DSC and show only the INSERTS:

News from the labs Newsletter 2014-08 - A Million ways to kill your DSC - Screenshot1

Here you can see in our DB2 10 NF test system that there are 2,673 statements.

 

2 Identifying the “good guys”

After filtering on “INSERT”, we get the next Panel:

News from the labs Newsletter 2014-08 - A Million ways to kill your DSC - Screenshot2

Here I can see only 200 INSERTS are in the cache. Note that some have multiple executions.

These are the “good guys” – Multiply executed but singly prepared.

 

3 Now I use the primary command PR to just dump these statements in a file for ISPF usage:

"News from the labs" 2014-08 - A Million ways to kill your DSC - Screenshot3 - SQL Statements in a file for ISPF usage

Here all of the SQL text is shown, (it carries on over to the right hand side, of course!)

Now I can see some INSERTS with Parameter Markers, but I want to see the ones without.

 

4  INSERTS without Parameter Markers

Simply doing these ISPF commands enables me to see that view too:

X ALL

F ALL ?

F ALL INSERT 1

 

Then I simply page down looking for INSERTS with no matching question marks.

Of course you can also do this with a small REXX, etc

News from the labs Newsletter 2014-08 - A Million ways to kill your DSC - Screenshot4

Now I can quickly see that there appears to be a long set of INSERTS – all with literals.

All of these should be changed to be Parameter Markers to help boost overall system Performance!

 

Naturally I use the above SQL data for much more than just Parameter Marker usage. You can imagine the fun to be had just conducting text searches to see which SQL uses which Built-In-Function, for example. CHAR9 usage anyone?

The possibilities are endless!

Now I get to spend lots of time with my best friend.

 

Happy Tuning!

As usual any questions or comments are welcome,
TTFN Roy Boxwell
Senior Software Architect

2014-07: APREUSE(WARN) – like getting married?


DB2 11 REBIND, something old, something new, something borrowed, something blue

Something olde,
Something new,
Something borrowed,
Something blue,
oh my goodness,
what’s going on,
within DB2?


Olde, New, Borrowed and Blue… in DB2 11 REBIND processing

Now once you stop laughing, the idea is that:

olde” is REBIND processing which changes old access paths,
new” is the new access path that comes from REBIND processing,
borrowed” is the use, and abuse, of OPT_HINT processing, and finally
blue” is, of course, IBM itself!!!

 

New in DB2 11… REBINDs with APREUSE(WARN)

Now in DB2 11 CM and above, you can issue REBINDs with the APREUSE(WARN) feature. This allows you to be nearly 100% sure that your access paths will *not* change, *but* it will regenerate the run time structures which allows fast xPROC processing. How does it do this magic? Quite simply: DB2 uses “internal” Opt Hints. They are not externalized in the PLAN_TABLE and they ignore the OPT_HINT ZPARM setting completely. In fact, they are generated from the “new”, in DB2 9 NF and above, directory element Explain Plan section in the Package, (this contains a sort of compressed version of the PLAN_TABLE access path data but nowhere near all of the access path data is saved away).

 

So what happens is, DB2 gets told to REBIND a package

– first it extracts the current access plan and stores it away,
– then it calls the optimizer for the new access path using the stored version as an OPT_HINT.
– If the hint is used, the access path is honoured and, if running with EXPLAIN(YES), you get PLAN_TABLE data with “APREUSE” in the column HINT_USED while OPTHINT stays empty.

 

Unknown access path “deviations” if the hint failed

Now, if a hint failed, a new access path is created but you then get spaces in the HINT_USED column. I bet you see what I am aiming for??? Yep, all this does is create unknown access path “deviations” while also retaining possibly useless and obsolete access paths! The whole system makes nearly no sense to me whatsoever!

 

APREUSE(ERROR) to migrate to DB2 11

Now with APREUSE(ERROR) at least you either got new code for a complete package or you didn’t get *any* runnable code. It was still not taking advantage of any new optimizer functionality, but on the other hand was 100% stable.

The way I see it today, the only way to actually use this feature is purely in version migration and with some sort of explain tool to actually check out stuff before & after the event.

 

My way to migrate to DB2 11 is

1) Make sure RUNSTATS is 100% up to date and correct

2) Make sure plan management is active and possibly in EXTENDED mode

3) Make sure all packages are bound with EXPLAIN(YES)

4) Migrate to DB2 11 CM

5) Use a tool to analyse all current access paths and see which ones go *bad* with and without using APREUSE(WARN). Also, check out any CHANGED access paths (table order, different Index etc.)

6) Run with APREUSE(WARN) only those packages where no changes are found and also use APRETAINDUP(NO)

7) Run without APREUSE(WARN) on those packages with improvements to get the full functionality that you have paid good money for

8) Work first on any statements that go bad and then the changed ones

9) From this point on analyse every REBIND package and only do the REBIND if improvements in access path are visible

10) Get ready for DB2 vnext and repeat the above!

 

APREUSE(WARN)’s drawback

The real dangers with APREUSE(WARN) are that you will end up locking your code into access paths that are completely obsolete and could be dramatically improved with the creation of a new index etc. This new data is not and cannot be used due to the hint nature of this parameter. Worse, in my opinion, is the fact that there are a ton of reasons when the hint is not used and new access paths come charging over the hill like some irate mother-in-law!!!

 

Use APREUSE(WARN) *just* for release migration and *only* for the first rebind

Last word: As always with new parameters and options there are times to use them and times to *not* use them!  In this case *just* for release migration and *only* for the first rebind. Then — just like a wedding dress — never ever use again until the next one.

 

As usual, any comments or criticisms are greatly welcome!

How do you plan to use this parameter?

 

TTFN Roy Boxwell
Senior Software Architect