2020-03 Db2 REORG SYSCOPY: DRAIN Delays are Despicable

I heard about an interesting problem the other day. Please remember that “interesting” to me is just that: “interesting”. “Interesting” for the DBAs and employees of the firm where it happened is, naturally, a bit different.

A normal start

  • Monday morning and all is well until around 07:00, when delays start appearing in online transactions.
  • Soon the delays are gaining the advantage and customers are starting to complain.
  • At around 07:20, nearly the whole machine just sat there…
  • About 15 minutes later everything started running normally.

Hmmm, interesting.

The stage is set

So, the lead DBAs are off and running, looking for bad SQL that could possibly have caused the disturbance in the force.

They were checking whether the coupling facility was under stress, they were checking SYSIBM.SYSCOPY for a parallel-running REORG, MODIFY, or QUIESCE, and they were using our WorkLoadExpert (WLX) to see what was happening in the time window involved.

Tuesday arrives

And so do I! As luck would have it, I am at this site to give a presentation all about BindImpactExpert, which saves you from bad access paths, and RunstatsRescue, which rescues you from badly timed RUNSTATS. Now, this site already has these products, but I must present them to a new intake of DBA and developer employees.

Check everything

After my presentation we checked everything and found a few timeouts and deadlocks, but nothing serious. Then I put my deerstalker hat on (now there’s an image!) and decided to see where the delays were coming in. One of the developers had already done a quick WLX check and had seen very high drain lock wait values.

WLX outputs a summary of what workload has been processed which, here in the labs on my little test system, looks like this:

Wait times in microseconds because of …                                
 latch requests               :                 594                    0
 page latch                   :                   0                    0
 drain locks                  :                   0                    0
 drain lock claims            :                   0                    0
 log writer                   :               32230                    0
 synchronous I/O              :             6840389                 9623
 locks                        :                   0                    0
 synchronous execute          :                   0                    0
 global locks                 :                   0                    0
 other threads read  activity :            28429563                    0
 other threads write activity :               13166                    0 

At the actual customer site I could see a 1000-fold increase in drain lock wait times between two runs!

Utility versus SQL

Now, as I am sure you are all aware, a drain is *only* used by a command or a utility, so I started thinking:

“There must be a parallel running something somewhere!”

So I used WLX to show me the SQLs that had the highest drain lock wait times. I took the top two (over 30,000 seconds of delay!) and got their tablespace names from the Db2 Catalog using the first referenced table column.
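
If you want to do the same lookup, here is a minimal SPUFI-style sketch (the creator and table names are placeholders) that maps a table to its database and tablespace:

 SELECT DBNAME, TSNAME
 FROM SYSIBM.SYSTABLES
 WHERE CREATOR = 'SCOTT'    -- placeholder qualifier
   AND NAME    = 'TAB1'     -- first referenced table from WLX
   AND TYPE    = 'T'
 ;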

Horrible Job to do

It is not a pleasant task to search master address space sysouts, but in this case it was the only way. Using the tablespace names from the Db2 Catalog, I just navigated to the date and time in question and did F (find) commands in SDSF on the tablespace names.

BINGO!

After a few minutes I found a strange message (Correlation Id=010.TLPLKNC3) about a drain not being possible for an internal Db2 system task.

This correlation id happens to be used by REORG, and it gave me the info about where the drain came from. I looked at that system’s log output in the time range and, sure enough, there was a REORG of that very table which kept failing due to not getting the drain!

A retry too far?

At this site they use a third-party software tool to generate REORG, RUNSTATS and COPY jobs, and it had a default of RETRY 30. It kept trying 30 times before eventually failing.

This explains the missing SYSCOPY entry as the REORG had failed!

The other one?

So that was one bad boy found – what about the other? That tablespace did not appear in any of the sysouts. So I drilled down to get the full SQL text (over 8,000 bytes long!) and scrolled on down to the FROM lines – and there was the *first* table name! After the dust had settled, I went back and saw that:

this one table was actually in every single SQL in the top 200 delay candidates! A pretty central table if you ask me!

Who? and Why?

The management level now wanted to know who did it, and why. I left that part up to the customer, of course, as I do not want to get involved in a finger-pointing exercise! My feeling is: like most disasters, it was probably a chain of events, something like:

  1. REORG generated on Sunday.
  2. Due to some unforeseen problem the JCL was shunted to Monday.
  3. On Monday at 07:00 it started and killed the machine.

Never again?

The best way is to generate jobs straight to the Job Scheduler for instantaneous execution (no waiting or shunting allowed) and, guess what, we have the RealTimeDBAExpert (RTDX) that allows you to do just that! You can easily exclude objects from utilities based on days of the week, hours of the day, etc. If you have a bought-in or home-grown system, ask yourself: would it also have caused this disaster?

Console Messages

If the customer had had our WLX Console Message Use Case licensed, it would have also made the detective work much easier, as then you have a central place to go where *all* console messages from *all* members are written and searchable! This would have saved a lot of time and trouble.

Bottom Line

(Removing my Deer Stalker hat and replacing it with a mortar board.)

Look everywhere, trust no-one, and remember that a DRAIN almost certainly has nothing to do with SQL or a badly timed RUNSTATS.

As always, any questions or comments would be most welcome!

TTFN,

Roy Boxwell

2020-02 Db2 UPDATE column: UPDATEs for nothing and CPU ain’t free!

A bad misquote of a great Dire Straits song, but it describes one great thing I saw last year!

What is an Update?

We all know what an update is, right? You have a column containing some value and you wish to update it to a new – different – value. You code an UPDATE with SET and all is done.

But, what happens “under the covers” when the column value you are updating is exactly the same as the new value?

Suspected real-time abuse

The problem surfaced gradually… as all good problems do… The DBAs were wondering why an SQL that was executed half a million times per day was waiting for other threads for so long. (This was discovered using our WorkLoadExpert (WLX)). The wait times were frighteningly high, and so it was decided that this SQL should be the target of some sort of tuning effort.

It then came out, while gathering basic tablespace statistical data, that :

the related tablespace had not been REORGed for over six months when, according to SEG’s WorkLoadExpert, there were half a million updates every day against it!

RTS or SEG Bug?

Naturally, the first idea is that it must be a bug.

  • Either in the Real-time Statistics (not incrementing the REORGUPDATES counter – it could even be NULL, for example) or,
  • heaven forbid, a bug in our WorkLoadExpert.

I took a closer look at the SQL:

UPDATE aaa.bbb
 SET COL1 = ?
 WHERE COL1 = ?
 ;

and just sort of wondered out loud,

“They are not using the same value in both parameter markers are they?”

The “they” in this case was the developer of course…

Oh My Word!

After a quick e-mail discussion, it then came out that that was indeed the case! Db2 is clever, but sometimes not that brilliant! The developer had had the idea of executing this SQL to “see” if the value existed or not… He did not think about what Db2 then actually does…

Under the Covers

Db2 does not “know” what the current value of COL1 is. It used, in this case, index access to obtain an X lock on the target page (this was the reason for the very large wait times on the other threads!). Once the lock was held, it could then discover that there was *nothing* to do, and so it did nothing! Then it happily released the lock(s) after doing nothing and returned SQLCODE 0.

No Log data was written as nothing was done, and REORGUPDATES was not incremented as nothing was done, but the CPU/Elapsed overhead was enormous!

The right way

The head DBA has said the SQL should look like:

SELECT 'A' FROM aaa.bbb
WHERE COL1 = ?
FETCH FIRST 1 ROW ONLY
WITH UR
;

This is now on its way through change management! Naturally, it is the way the developer should have coded it from the get-go!

What can you do?

Now this, of course, caused alarm bells to ring as “cut-and-paste” is your friend. If there is bad code in one place it is probably being copied further, even as you read this! Using SEG’s WorkLoadExpert and the Real-time Statistics, you can easily pull out and analyze any “bad guys”.


Put simply, use the UPDATE count from WLX and correlate it to the REORGUPDATES counter. If they are wildly different, taking into account REORGLASTTIME and WLX_TIMESTAMP, then you have a candidate to track down further!
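
Here is a minimal sketch of the Real-time Statistics side of that check (the database name is a placeholder, and the WLX table and column names depend on your WLX release, so I leave that join to you):

 SELECT DBNAME, NAME, PARTITION
      , REORGLASTTIME
      , REORGUPDATES
 FROM SYSIBM.SYSTABLESPACESTATS
 WHERE DBNAME = 'MYDB'        -- placeholder database
 ORDER BY REORGUPDATES DESC
 ;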


Now, where are those refrigerators we have to move?

As always, I would be pleased to hear from you and any war stories you have!

TTFN,
Roy Boxwell
Senior Architect


More about WLX

2020-01 How RUNSTATS causes an error SQLCODE

Impossible! I hear you all say… How can a RUNSTATS *cause* an SQL error code? Well, my gentle readers, read on.


Two examples in SPUFI at the end of this newsletter

What we know

We are all, I hope, aware that a badly timed RUNSTATS can cripple your SQL Performance. Just think of a see-sawing, or volatile table, and it gets a RUNSTATS at the zero point… Tablespace scan is then a good access choice… After a couple of million inserts that is probably not the best!

Awful, Implicit Casting

Implicit casting came along a few Db2 releases ago and either made you very happy: “I never have to worry about using apostrophes again!” or very angry: “Developers must know what they are coding!” The thing is, we got it and you cannot *not* do it!

It goes both ways…

The idea behind Implicit casting, in case you don’t know, is that :

The predicates COL1 = 1 and COL1 = '1' are the same to Db2.

It will take that '1' and “cast” it to a data type that can then be compared to whatever type COL1 is.

So,

  • if COL1 contains a numeric representation of data, everything is OK.
  • But if COL1 is CHAR(1) and contains a 'Y', you then get an SQLCODE -420.

Nasty, nasty business…

Access path is also sub-optimal

When Db2 does implicit casting, it casts to DECFLOAT to then do the comparison etc. (see my DECFLOAT newsletter for what I think about that data type!). Anyway, it is *not* good for performance. In fact, it got so annoying that a few users actually asked for a ZPARM to switch off implicit casting! They actually wanted an error whenever they compared mismatched data types. This request was, of course, turned down.

So where’s the beef?

So, what happened in production was quite simple really:

A query had been running for three years with never a problem. Then one day, after a RUNSTATS, it started returning SQLCODE -420. This was due to the fact that the table processing order had switched, due to the RUNSTATS running at a “bad” time.

This in turn exposed the buggy SQL WHERE predicate that previously had never seen the bad data, as it was removed in an earlier branch! This could also happen when the column in question actually contains non-numeric data, perhaps due to a code bug.

See the example SQL at the end  

RUNSTATS Rescue for Db2 z/OS

This user site has our software RUNSTATS Rescue so they quickly got the query up and running, without doing a code change, in a matter of seconds.

Of course, this bad code was discovered in the middle of the year end production freeze so they could not simply change the application code! This code change has been scheduled and will be done in the new year.

Bottom Line

RUNSTATS can cause negative SQLCODEs to be returned and RUNSTATS Rescue buys you the needed time to continue running – even with buggy SQL code!

As always, I would be pleased to hear from you and any war stories you have!

TTFN,
Roy Boxwell
Senior Architect


PS: Just for the record, here are two examples in SPUFI:


 SELECT *
 FROM SYSIBM.SYSDUMMY1                                                  
 WHERE IBMREQD = '1'                                                   
 ;                                                                      
 ---------+---------+---------+---------+---------+---------+---------+-
 IBMREQD                                                                
 ---------+---------+---------+---------+---------+---------+---------+-
 DSNE610I NUMBER OF ROWS DISPLAYED IS 0                                 
 DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100            
 ---------+---------+---------+---------+---------+---------+---------+-
 SELECT *                                                               
 FROM SYSIBM.SYSDUMMY1                                                  
 WHERE IBMREQD = 1                                                      
 ;                                                                      
 ---------+---------+---------+---------+---------+---------+---------+-
 IBMREQD                                                                
 ---------+---------+---------+---------+---------+---------+---------+-
 DSNE610I NUMBER OF ROWS DISPLAYED IS 0                                 
 DSNT408I SQLCODE = -420, ERROR:  THE VALUE OF A STRING ARGUMENT WAS NOT
          ACCEPTABLE TO THE DECFLOAT FUNCTION     
 DSNT418I SQLSTATE   = 22018 SQLSTATE RETURN CODE                        
 DSNT415I SQLERRP    = DSNXRNUM SQL PROCEDURE DETECTING ERROR            
 DSNT416I SQLERRD    = -245 0  0  -1  0  0 SQL DIAGNOSTIC INFORMATION    
 DSNT416I SQLERRD    = X'FFFFFF0B'  X'00000000'  X'00000000'  X'FFFFFFFF'
          X'00000000'  X'00000000' SQL DIAGNOSTIC INFORMATION            

And the result when the column actually contains a valid numeric value and not “Y”:

CREATE TABLE BOXWELL.SYSDUMMY1 LIKE SYSIBM.SYSDUMMY1       
 ;                                                          
 ---------+---------+---------+---------+---------+---------
 DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 0  
 ---------+---------+---------+---------+---------+---------
 INSERT INTO  BOXWELL.SYSDUMMY1 VALUES ('1')                
 ;                                                          
 ---------+---------+---------+---------+---------+---------
 DSNE615I NUMBER OF ROWS AFFECTED IS 1                      
 DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 0  
 ---------+---------+---------+---------+---------+---------
 SELECT *                                                   
 FROM BOXWELL.SYSDUMMY1                                     
 WHERE IBMREQD = '1'                                        
 ;                                                          
 ---------+---------+---------+---------+---------+---------
 IBMREQD                                                    
 ---------+---------+---------+---------+---------+---------
 1                                                          
 DSNE610I NUMBER OF ROWS DISPLAYED IS 1                     
 DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100
 ---------+---------+---------+---------+---------+---------
 SELECT *                                                   
 FROM BOXWELL.SYSDUMMY1                                     
 WHERE IBMREQD = 1                                          
 ;                                                          
 ---------+---------+---------+---------+---------+---------
 IBMREQD                                                    
 ---------+---------+---------+---------+---------+---------
 1                                                          
 DSNE610I NUMBER OF ROWS DISPLAYED IS 1
 DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100

Cool and dangerous! Just like me! LoL ! ! !

2019-12 Fun with Db2 12 PBR RPN

I have recently enhanced our SpaceAssuranceExpert (SAX) product to automatically help out if partitions, or partitioned indexes, are getting too big for their boots in a productive system.

SAX – What is it?

The idea behind SAX is to stop any and all chances of getting a dreaded SQLCODE -904 in production, especially “out of the blue”.

Our idea was, in Db2 12 with the new PBR RPN, to do an on-the-fly ALTER of the DSSIZE, just like SAX does now with the SECQTY to avoid running out of extents.

RTFM Time

A quick look in the manual tells you that :


it is an immediate ALTER (as long as you make the new DSSIZE larger than the old one!) and there are no package invalidations or REORGS required.


This is fantastic!
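
As a quick sketch, the immediate ALTER itself looks something like this (database, tablespace and size are just examples):

 ALTER TABLESPACE MYDB.MYTS
   ALTER PARTITION 1 DSSIZE 513 G
 ;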

So I created a nice little PBR and then ran a horrible Cartesian join SPUFI to flood the first partition with data. This join had a TIME card of (,1) to limit it to one second of CPU before getting an Abend S322.

SAX Monitor

The SAX monitor reacted perfectly and did the TP alter to 513 GB DSSIZE (I had an increment size of 512 as a test), and got an SQLCODE -666 (I just love that SQLCODE…), as the INSERT was still running.

This ALTER was then internally queued to be attempted later, like in any other failure case.


All well and good.

Wham! Nasty errors !

Then I did the same for a DPSI on my PBR RPN… Oh dear! I got a nasty I/O error and then an even nastier ROLLBACK loop, meaning I had to cancel the IRLM to stop Db2… (There is an APAR for this problem: PH18977.)

Rollback loop?

The ROLLBACK loop was caused by me choosing to use NOT LOGGED as a tablespace attribute. We have a finite amount of log space, and when the transaction was S322’d after one second of CPU,

the rollback could not find one of the archive logs and then we had to cold start Db2

– Not pretty!

Proper test!

I then wrote a couple of little test programs that actually COMMITted after 5000 inserts and then the ALTERs all worked as designed.


Do I worry too much about extended format and extended addressability?

Next, I worried about the ominous “extended format and extended addressability” attributes in the DATACLASS for a PBR RPN, and wondered what would happen if a customer has SAX running and it happily ALTERs a TP to, say, 6 GB when they can only address 4 GB…


IBM to the rescue!

Luckily for us, Db2 development had thought about this!

  • If you attempt to create a PBR RPN (even with a very small size) and your DATACLASS does not have the two attributes set, you get an error with reason code 00D70008 telling you this detail.

  • If you ALTER an existing tablespace to be a PBR RPN and your DATACLASS does not have the two attributes set, then the ALTER works fine. But remember, this is a pending ALTER and you *must* do a REORG at the TS level with inline TP-level copies. This REORG then fails – also with 00D70008.

So, in other words, SAX cannot hurt you here!


What about PBGs?

After all this, we also considered PBGs. They have a limit as well – MAXPARTITIONS in their case. So we added the ability to ALTER MAXPARTITIONS too, as the sketch below shows. Here you must be more careful though, as these ALTERs are still immediate *but* they invalidate any referring packages!
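
A minimal sketch (names and numbers invented), going from 32 up to 36:

 ALTER TABLESPACE MYDB.MYPBG
   MAXPARTITIONS 36
 ;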

ABIND YES or NO?

If you work with ABIND set to YES, all is good: the ALTER comes in, Db2 invalidates your packages, and the auto rebind happens so fast that you do not even notice it. However, if you work with ABIND NO, then any packages – even the package actually running and doing the inserts – will fail! If you are just using dynamic SQL then it is 100% OK, otherwise: buyer beware!

Aha!

I have opened an Analytics Idea (DB24ZOS-I-1057) to try and get this loophole closed, as I cannot see what access path change could be caused by going from 32 to, say, 36 MAXPARTITIONS.

Db2 keeps adding the parts dynamically and nothing happens then… Go figure… Anyway, if you would like it – Please vote for it!


There are already some nice comments attached to it:

  • DP commented

    this is just a limit in the catalog. So how could the access path be affected? The actual growth of a partition doesn’t invalidate the package, so how does just changing the limit do so?
  • BW commented

    I opened a Case on this asking why packages are being invalidated in this situation and it is still open waiting for a reply.
  • BD commented

    With only impact to catalog, not sure how Access Path would be impacted.  Seems wasteful and counter productive to invalidate packages.

Remember – You never stop learning!

As always I would be pleased to hear from you!

TTFN,
Roy Boxwell
Senior Architect

2019-11 Db2 Existence checks: SELECT for DUMMIES

Every now and again, the DBAs of the world meet with the developers of the world and try and bang some heads together… I already wrote a newsletter all about existence checks, which has changed and evolved over the years, and now I wish to investigate another age-old problem: “Is Db2 there?”


Newsletter 2012-10: Existence check SQL – Through the ages
Reader test results: see the end of this newsletter for SELECT queries tested by one of our readers on Db2 10 and Db2 11.

Why?

First up, is “Why?” – Why would you want to know that?

Well think about our wonderful agile world these days. The JAVA developers do not even know what Db2 *is* and so when they write/script/generate/get generated for them SQL (Delete whatever is not appropriate), they have no idea *where* the code will run. To be fair, they should not really need to…the idea these days is that your back end can be anywhere and on any hardware platform.

So, wouldn’t it be cool to be able to tell the application that the required database is “not currently available” – This is when the “Is Db2 there?” SQL question, and all of its problems, was born.

Problems


  1. How to do this?

  2. How often to do this?

  3. Do you need to do this?

  4. Haven’t you got better things to do than this?


1 – How to do this?

The first problem was solved when a developer wrote this SQL:

SELECT * FROM SYSIBM.SYSDUMMY1

PREPAREd, OPENed, FETCHed it and CLOSEd it. Now remember, with dynamic SQL, everything must be in cursors so you really did have to define a cursor, open the cursor, fetch the cursor and, if you are being a good coder, close the cursor. If all of this was successful then you “knew” that Db2 on z/OS was up and running and available for all your further SQL requests.
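
In pseudo-embedded-SQL, the whole dance looks something like this (the statement and host variable names are invented for illustration):

 DECLARE C1 CURSOR FOR S1 ;     -- S1 is the statement name
 PREPARE S1 FROM :STMT-TXT ;    -- :STMT-TXT holds the SELECT text
 OPEN C1 ;
 FETCH C1 INTO :HV-IBMREQD ;    -- SQLCODE 0 here means "Db2 is there"
 CLOSE C1 ;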

This was OK when there was not so much dynamic SQL going on (before Db2 V8), but naturally it started getting a bit out of hand very quickly.

The SQL then changed a little bit to be:

SELECT COUNT(*) FROM SYSIBM.SYSDUMMY1

The logic here was that it could only return one row/value and so reduced the network traffic. Naturally, Db2 went off and counted the single row in that table using a tablespace scan *every* time…

This slowly became a small, but very noticeable, overhead with millions of getpages against the smallest Db2 Catalog table ever invented…

Now Terry Purcell and the optimizer crew came up with the PRUNED access plan in Db2 9:

PRUNED

Db2 does not generate an access path for the query because the query is guaranteed to qualify zero rows, such as the case of an always-false WHERE clause.

For example: 

WHERE 0 = 1

This happens when a “leg” of an SQL query cannot ever be true. In this case, the complete leg of the SQL query is thrown away. With this advance in optimizer logic the predicate:

WHERE 1 = 0

actually became really useful, as it meant that the *entire* SQL was only run through the optimizer: the SQLCODE was set to +100 if doing a FETCH with the * style SQL, or the value of your host variable was set to zero if using the COUNT(*) style. This meant that the getpages for SYSIBM.SYSDUMMY1 dropped down to zero – which is a very, very good thing!

IBM have been improving what PRUNED can do ever since, and Terry Purcell wrote :

“Another benefit of column pruning is that it can provide extra table pruning opportunities.
In Db2 10, an enhancement was delivered to prune outer join tables if no columns were required from that table and it was guaranteed that the table would not introduce duplicates. In Db2 12, this pruning enhancement is extended to outer joins to views/table expressions that are guaranteed not to return duplicates and no columns are required from the view/table expression.”

2 – How often to do this?

How long is a piece of string?

The problem is that when you do the “Is Db2 there?” test at the start of a long conversational transaction, it could well be that Db2 has disappeared by the end…

3 – Do you need to do this?

Again, an age-old question – Most JAVA frameworks do this (Like they do embedded COMMIT and ROLLBACK!) and so the simple answer is yes!

4 – Haven’t you got better things to do than this?

Of course you do! We all do, but these little changes can actually have a major impact on the overall machine throughput!


Measured Response


Here are some real-world results of tests with dynamic SQL, with and without the 1 = 0, each running a million times. I have tried a variety of flavors, like Heinz…

First up are the SELECT * style selects:

SEL-867 SELECT * FROM SYSIBM.SYSDUMMY1 
SEL-868 SELECT * FROM SYSIBM.SYSDUMMY1
     WHERE 1 = 0     
SEL-869 SELECT * FROM SYSIBM.SYSDUMMY1
     WHERE 1 = 0
     WITH UR 
SEL-870 SELECT * FROM SYSIBM.SYSDUMMY1 
     WHERE 1 = 0 
     WITH UR 
FOR FETCH ONLY 
SEL-871 SELECT * FROM SYSIBM.SYSDUMMY1 
    WHERE 1 = 0                 
    WITH CS                      
SEL-872 SELECT * FROM SYSIBM.SYSDUMMY1 
    WHERE 1 = 0                 
    WITH CS                      
FOR FETCH ONLY               
SEL-873 SELECT * FROM SYSIBM.SYSDUMMY1 
    WHERE 1 = 0                  
    WITH RR                      
SEL-874 SELECT * FROM SYSIBM.SYSDUMMY1 
    WHERE 1 = 0                  
    WITH RR                      
FOR FETCH ONLY               

Now the SELECT COUNT(*) style selects:

SEL-875 SELECT COUNT(*) FROM SYSIBM.SYSDUMMY1
SEL-876 SELECT COUNT(*) FROM SYSIBM.SYSDUMMY1
     WHERE 1 = 0     
SEL-877 SELECT COUNT(*) FROM SYSIBM.SYSDUMMY1
     WHERE 1 = 0
     WITH UR 
SEL-878 SELECT COUNT(*) FROM SYSIBM.SYSDUMMY1 
     WHERE 1 = 0 
     WITH UR 
FOR FETCH ONLY 
SEL-879 SELECT COUNT(*) FROM SYSIBM.SYSDUMMY1 
    WHERE 1 = 0                 
    WITH CS                      
SEL-880 SELECT COUNT(*) FROM SYSIBM.SYSDUMMY1 
    WHERE 1 = 0                 
    WITH CS                      
FOR FETCH ONLY               
SEL-881 SELECT COUNT(*) FROM SYSIBM.SYSDUMMY1 
    WHERE 1 = 0                  
    WITH RR                      
SEL-882 SELECT COUNT(*) FROM SYSIBM.SYSDUMMY1 
    WHERE 1 = 0                  
    WITH RR                      
FOR FETCH ONLY 

The results were:

                            Rows   Rows  Index  Tblsp   
    StmtID  Execs  Getpg    exam   proc  Scans  Scans   
----------  -----  -----   -----  -----  -----  -----   
       867  1000K     2M   1000K  1000K      0  1000K   
       868  1000K      0       0      0      0      0   
       869  1000K      0       0      0      0      0   
       870  1000K      0       0      0      0      0   
       871  1000K      0       0      0      0      0   
       872  1000K      0       0      0      0      0   
       873  1000K      0       0      0      0      0   
       874  1000K      0       0      0      0      0 

       875  1000K     2M   1000K  1000K      0  1000K   
       876  1000K      0       0  1000K      0      0   
       877  1000K      0       0  1000K      0      0   
       878  1000K      0       0  1000K      0      0   
       879  1000K      0       0  1000K      0      0   
       880  1000K      0       0  1000K      0      0   
       881  1000K      0       0  1000K      0      0   
       882  1000K      0       0  1000K      0      0     

   StmtID   Tot. CPU  Avg. CPU   Tot. Elap  Avg. Elap  
----------  SS.mmmmmm  SS.mmmmmm  SS.mmmmmm  SS.mmmmmm  
       867   2.481068   0.000002  16.869174   0.000017  
       868   1.291817   0.000001   8.119036   0.000008  
       869   1.331707   0.000001   6.220308   0.000006  
       870   1.330709   0.000001   9.460538   0.000009  
       871   1.306633   0.000001   8.984930   0.000009  
       872   1.326517   0.000001   9.181043   0.000009  
       873   1.324213   0.000001   7.392330   0.000007  
       874   1.322115   0.000001   7.694403   0.000008
 
       875   3.066573   0.000003  18.824193   0.000019  
       876   1.467454   0.000001   6.168161   0.000006  
       877   1.478887   0.000001   7.714925   0.000008  
       878   1.480241   0.000001   9.903416   0.000010  
       879   1.477947   0.000001   9.965071   0.000010  
       880   1.469971   0.000001   9.195251   0.000009  
       881   1.467287   0.000001   8.687336   0.000009  
       882   1.487021   0.000001  11.742945   0.000012 

The SELECT * with WHERE 1 = 0 WITH UR is actually the winner!
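
So, if you really must keep asking Db2 whether it is alive, the winning flavor (statement 869 above) written out in full is:

 SELECT *
 FROM SYSIBM.SYSDUMMY1
 WHERE 1 = 0
 WITH UR
 ;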

Remember – You never stop learning!

As always, I would be pleased to hear from you!

TTFN,
Roy Boxwell
Senior Architect

Results


Reader test results from Isaac Yassin:


Measured for dynamic SQL (SPUFI in UR mode – DSNESPUR), using Omegamon application trace. Same machine, same LPAR, same subsystem, measuring the 3rd consecutive execution of each SQL. Both do a “prepare/open/fetch/close” according to trace.

select * from sysibm.sysdummy1 vs. select * from sysibm.sysdummy1 where 0=1

The differences are:

For Db2 10

Without “where 0=1”

 PREPARE – InDB2 Time = 0.00048 InDB2 CPU = 0.00047
FETCH - 2 fetches, InDB2 time = 0.00131 InDB2 CPU= 0.00116 (avg. 0.00058)
LOCKs - Type = MDEL , Level = S , DB=DSNDB06 , PS=SYSEBCDC , Count = 1
Sequential scan of data page = 2 pages scanned, 1 row qualified

With “where 0=1”

PREPARE - InDB2 Time = 0.00077 InDB2 CPU = 0.00049
FETCH - 1 fetch, InDB2 time = 0.00008 , InDB2 CPU = 0.00008
Locks - none
NO sequential scan

For Db2 11

Without “where 0=1”

PREPARE - InDB2 Time = 0.00045 InDB2 CPU = 0.00045
FETCH - 2 fetches, InDB2 time = 0.00121 InDB2 CPU= 0.00120 (avg. 0.00060)
LOCKs - Type = MDEL , Level = S , DB=DSNDB06 , PS=SYSEBCDC , Count = 1
Sequential scan of data page = 2 pages scanned, 1 row qualified

With “where 0=1”

PREPARE - InDB2 Time = 0.00044 InDB2 CPU = 0.00044
FETCH - 1 fetch, InDB2 time = 0.00007 , InDB2 CPU = 0.00007
Locks - none
NO sequential scan

select 1 from sysibm.sysdummy1 vs. select 1 from sysibm.sysdummy1 where 0=1

The differences are:

For Db2 10

Without “where 0=1”

PREPARE - InDB2 Time = 0.00057 InDB2 CPU = 0.00054
FETCH - 2 fetches, InDB2 time = 0.00015 InDB2 CPU= 0.00015 (avg. 0.00007)
LOCKs - None
No Sequential scan of data page

With “where 0=1”

PREPARE - InDB2 Time = 0.00060 InDB2 CPU = 0.00034
FETCH - 1 fetch, InDB2 time = 0.00006 , InDB2 CPU = 0.00006
Locks - none
NO sequential scan

For Db2 11

Without “where 0=1”

PREPARE - InDB2 Time = 0.00047 InDB2 CPU = 0.00047
FETCH - 2 fetches, InDB2 time = 0.00087 InDB2 CPU= 0.00082 (avg. 0.00041)
LOCKs - none
NO Sequential scan of data pages

With “where 0=1”

PREPARE - InDB2 Time = 0.00044 InDB2 CPU = 0.00044
FETCH - 1 fetch, InDB2 time = 0.00007 , InDB2 CPU = 0.00007
Locks - none
NO sequential scan of data pages

Bottom line


“select 1” works better than “select *”, as you don’t really scan the data and don’t take any locks; using “where 0=1” eliminates one fetch, and the locks & scan as well.


Isaac Yassin

2019-10 DECFLOAT examples: Fun with Numbers

This time I have a nice little real world story for you. Pull up your good old copy of the Db2 9 SQL Guide and I will begin…

What is a number?

Numbers used to be pretty straightforward and easy to spot.

If it was a numeric between -32768 and +32767 it was a SMALLINT or Small Integer, or even a half word for the greying people reading this.

If it was a numeric between -2147483648 and +2147483647 it was an INTEGER or full word. If it was a numeric between 1 − 10³¹ and 10³¹ − 1 it was a DECIMAL or packed, or comp-3.

Along came BIGINT

After a few years we even got BIGINT (Between -9223372036854775808 and +9223372036854775807), which is much better than DECIMAL if you ask me and is also a double word. In all of these cases, you knew with 100% certainty that you were dealing with a number – Not a column name or anything else!

Then the brown stuff hit the fan… DECFLOAT.

DECFLOAT Arrives

In Db2 9 they introduced the horror that would become DECFLOAT to the Db2 z/OS world. It arrived with little fanfare and a massive amount of ifs and buts, but someone somewhere wanted it…

The problem?

The horrible problem that DECFLOAT dragged into the party was the fact that it could contain values like:


  • SNaN or
  • NaN or
  • INF or
  • Infinity.

Yep, these are all valid DECFLOAT number definitions.

The law of numeric comparisons

The law of comparisons now looks like this:

-NaN < -SNaN < -INF < -0 < 0 < INF < SNaN < NaN

Clear on that?

  • NaN stands for „Not a Number“ with the „quiet“ attribute.
  • SNaN stands for “Signaling Not a Number” which “signals” when it is used (Imagine a Facebook wave here!) and
  • INF is simply infinity.

What problems does this cause?

Well, let’s imagine you have 100’s of tables in a huge Db2 data warehouse, all designed in 1983 with a column called NAN that stands for “National Advertising Notice” – NAN is a good size reduction, still clear about its contents and everything is fine right up until Db2 9 comes along!

Upper and Lower trouble

Now you might think, it is not that bad, IBM are using SNaN and NaN right? Column and Table names have always been upper and lower case sensitive so NAN is not the same as NaN right? Wrong!

The use of inf, INFINITY, SNAN, NAN, nAn and snAN is “recognized”, and they are all accepted as valid DECFLOAT numbers.

What can you do?

Delimiting the column name with double quotation marks is the only correct way: “NAN”, for example. Now you go and change 4000 COBOL programs etc.

If you know what you are doing…

Naturally, if you are coding OK, you can happily “code” around the problem. Here is some example SQL to show you where you can get badly caught, and to introduce, to those of you who don’t know about DECFLOAT, the joys you can have:

 CREATE TABLE BOXWELL.TESTCASE (NAN  CHAR(10) NOT NULL         
                               ,IBAN CHAR(32) NOT NULL)        
 ;                                                             
 INSERT INTO BOXWELL.TESTCASE VALUES ('TESTNAN' , 'TESTIBAN')
 ;
 COMMIT
 ;      

Let’s do some SELECTs

First up, a normal select against the test table:

 SELECT  NAN                                                 
 FROM BOXWELL.TESTCASE                                       
 ;                                                           
 ---------+---------+---------+---------+---------+---------+
 NAN                                                         
 ---------+---------+---------+---------+---------+---------+
 TESTNAN                                                     
 DSNE610I NUMBER OF ROWS DISPLAYED IS 1                       

Correct! Hoorah!

 ---------+---------+---------+---------+---------+---------+
 SELECT "NAN"                                                
 FROM BOXWELL.TESTCASE                                       
 ;                                                           
 ---------+---------+---------+---------+---------+---------+
 NAN                                                         
 ---------+---------+---------+---------+---------+---------+
 TESTNAN                                                     
 DSNE610I NUMBER OF ROWS DISPLAYED IS 1                       

Also correct! Delimiters work as planned!

Now the really evil one:

 SELECT COUNT(*)                                                         
 FROM BOXWELL.TESTCASE                                                   
 WHERE IBAN IN (SELECT NAN                                               
                FROM SYSIBM.SYSDUMMY1)                                   
 ;                                                                       
 ---------+---------+---------+---------+---------+---------+---------+-
 DSNT404I SQLCODE = 12, WARNING:  THE UNQUALIFIED COLUMN NAME NAN WAS     
          INTERPRETED AS A CORRELATED REFERENCE                           
 DSNT418I SQLSTATE   = 01545 SQLSTATE RETURN CODE                         
 DSNT415I SQLERRP    = DSNXORSO SQL PROCEDURE DETECTING ERROR             
 DSNT416I SQLERRD    = 0 0  1  1143510784  0  0 SQL DIAGNOSTIC INFORMATION
 DSNT416I SQLERRD    = X'00000000'  X'00000000'  X'00000001'  X'44289700' 
          X'00000000'  X'00000000' SQL DIAGNOSTIC INFORMATION             
                                                                          
 ---------+---------+---------+---------+---------+---------+---------+-
           0                                                              
 DSNE610I NUMBER OF ROWS DISPLAYED IS 1                                   

You see what has happened here?

The SQL has “incorrectly” used NAN but gets only a +12 Warning from the parser. It then happily runs on and gives a result. Is this what should happen? I don’t think so!

It gets worse

Now try these queries:

 SELECT NAN
 FROM SYSIBM.SYSDUMMY1
 ;
 ---------+---------+---------+---------+---------+---------+

 ---------+---------+---------+---------+---------+---------+
 +NAN
 DSNE610I NUMBER OF ROWS DISPLAYED IS 1

Imagine having typos in your SELECT lines…

 SELECT sNAN                                          
 FROM BOXWELL.TESTCASE                                
 ;                                                    
 ---------+---------+---------+---------+---------+---
                                                      
 ---------+---------+---------+---------+---------+---
 +SNAN                                                
 DSNE610I NUMBER OF ROWS DISPLAYED IS 1                

Cool huh?

What can you do?

Monitor SQLCODE +12 – These should *never* really happen in production! How? Well you can use our WorkLoadExpert to discover these problems in two distinct ways.


  1. Capturing the SQL workload “sees” the SQLCODEs when processed

  2. If you have the SQLCODE Use Case then you get all SQLCODEs that are output by Db2


Doing it right

The recommendation is actually to CAST any of these literal values, like CAST('snan' AS DECFLOAT), which removes *any* chance of getting it “confused” with a column name or anything else!

 SELECT CAST('NAN' AS DECFLOAT)            
 FROM SYSIBM.SYSDUMMY1                     
 ;                                         
 ---------+---------+---------+---------+--
                                           
 ---------+---------+---------+---------+--
 +NAN                                      
 DSNE610I NUMBER OF ROWS DISPLAYED IS 1     

Where else?

Now that these beasts are also allowed in indexes you must really check all of their usage.

Have fun!

As always I would be pleased to hear from you!

TTFN,
Roy Boxwell
Senior Architect

2019-09 A DBA has got to know his limitations

Sorry, but I could not resist the Dirty Harry quote there…and yes I know it really should be “his/her” but that would not then be a quote, so please forgive me!

I have spoken/blogged about limitations in Db2 a few times over the years, and we have a nice little tool called SpaceAssuranceExpert (SAX) that stops a whole bunch of nasty events (read: -904s) at sites around the world.

It grows and grows

SAX started out “just” monitoring the size and number of Extents/Linear Datasets that Db2 was allocating. It then either did a dynamic ALTER of the SECQTY to allow the dataset to get to its physical maximum size before running out of extents, or it warned people that the number of physical datasets was running out.


First enhancements


– SMS Storage groups Monitoring

We pretty quickly enhanced it to also monitor SMS storage groups, as it can be a tad embarrassing when your copypool runs out of space!

At the same time, we changed the logic to work with the -1 -1 style allocations when IBM brought out their sliding scale logic.

Universal support

When PBG Tablespaces came out, we again enhanced the product due to the special processing limits of these Universal Tablespaces.

Think MAXPARTITIONS here!

Db2 12

Along came Db2 12 with its new RPN format and the ability to have a different DSSIZE for each partition, and each index got its own DSSIZE.

Sequences and Identities

Then we were asked if we could also monitor Sequences and Identities that are defined as non-cyclic. A lot of these beasts are defined so that they just “hit the buffers” at a certain number, so we added that to the system as well – a check like the sketch below.
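
A minimal sketch of such a check against the catalog (hedged: check the SYSSEQUENCES column list at your Db2 level):

 SELECT SCHEMA, NAME
      , MAXASSIGNEDVAL
      , MAXVALUE
 FROM SYSIBM.SYSSEQUENCES
 WHERE CYCLE = 'N'            -- non-cyclic only
 ORDER BY SCHEMA, NAME
 ;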

Numeric Primary Key support

Next came a request to monitor numeric primary key columns. These were actually sequences before sequences were even invented! You have, typically, a SMALLINT field which keeps being incremented – fine for the first 32,767, but after that? A quick manual version of the check is sketched below.
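
The table and column names here are, of course, hypothetical:

 SELECT MAX(KEY_COL)          AS CURRENT_HIGH
      , 32767 - MAX(KEY_COL)  AS HEADROOM
 FROM MYSCHEMA.MYTABLE
 ;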


No time to rest on our Laurels!

Are we now finished? No! There is *always* more to do!

  • What if you are nearing the column limit, or worse yet, the Index column limit?
  • What if the size of your index is getting dangerously close to the absolute physical maximum?
  • Talking about absolute physical maximums – What if you are running out of DBIDs?
  • Or, even worse, running out of OBIDs within a database – Do you even check these?
  • What happens when you try to ALTER it? Or create that new index?
  • Or even just a view with an INSTEAD of function?

BOOM! Is what happens!

SAX saves the day again!

We have enhanced SAX yet again, so that

you can now run a batch job to review all of these limits.

This gives you plenty of time to take corrective action before the bridges are burning…  


What are the limits?

In the Db2 SQL Reference Guide in the Appendix “Limits in Db2 for z/OS” it lists them out, but here are the important ones for you all:


750 Columns in a table.

Actually it can be less, depending on the complexity of any VIEWs, but to keep it simple this value is used as the upper limit.


64 Columns in an index. This includes INCLUDE columns.

For a partitioning index (That is an “old style” index that is actually used to partition the data) you get a maximum size for PADDED indexes of 255 – n bytes and, if NOT PADDED, then 255 – n – 2m – 2d bytes.

For any other indexes you get a maximum size for PADDED indexes of 2000 – n bytes and, if NOT PADDED, then 2000 – n – 2m – 2d bytes.

Where:

  • n is the number of columns which are NULLable
  • m is the number of varying length columns
  • d is the number of DECFLOAT columns
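
A quick worked example (hypothetical index): a NOT PADDED secondary index over three nullable VARCHAR(100) columns with no DECFLOATs gives n = 3, m = 3 and d = 0, so the maximum key size is 2000 – 3 – 6 – 0 = 1991 bytes.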

An FTB (News from the Labs 2019-07) is only possible for indexes that are 64 bytes or less. If you use ALTER to add a one-byte column to a 64-byte index, then you have broken this limit and that index is no longer eligible for FTB processing.

65,217 Databases may be defined. This includes any widow databases, of course (typically DSNnnnnn-style empty databases where the implicitly created tablespace was dropped when the table was dropped, but the database was “forgotten”).

32,767 OBIDs within a single database. Remember that the number of objects (OBIDs) within a database is not a simple 1:1 relationship. Each tablespace, index or referential relationship takes two, whereas each table, check constraint, aux table for a LOB, XML table for XML, trigger or view with INSTEAD OF takes one.
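
As a taster for this sort of check, here is a minimal sketch that hunts for the “widow” databases mentioned above (databases with no tablespaces left in them):

 SELECT D.NAME
 FROM SYSIBM.SYSDATABASE D
 WHERE NOT EXISTS (SELECT 1
                   FROM SYSIBM.SYSTABLESPACE T
                   WHERE T.DBNAME = D.NAME)
 ;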


Stay up to date

If Db2 development bring any new ones out, think Agile here, then our SAX will be enhanced quickly to check for it!

What is the plan?

Simply schedule the SAX limit checker to run once a week or so.

  • It will report all its findings and, if required,
  • issue WTOs to alert you to any dangers before hitting any of these limits.

And you?

What do you do at your shop now? Do you monitor these hard limits? Does anyone care?

Go ahead, make my day and install a SAX trial!


For a SAX trial please email to : techsupport@seg.de

Consult our Space AssuranceExpert (SAX) page


As always I would be pleased to hear from you!

TTFN,
Roy Boxwell
Senior Architect

2019-08 FTB (Fast Traversal Block): Just another TLA?

So we get a new TLA (Three Letter Abbreviation) to try and wrap our heads around in the busy, busy world of mainframe databases. This time it comes with a twist, as FTB is actually not “Feed The Beast”, sadly, but “Fast Traversal Block”.

In reality, it is referred to everywhere as:

FIT (Fast Index Traversal)

All clear on this and still with me? Good then I may begin!

The problem with modern indexes

The core problem these days is time… the time it takes to go down through the non-leaf pages to the actual data page is just too long. The vast amount of data we have forces a large number of levels upon the index which, in turn, forces a lot of non-leaf page accesses.

Index structure

We all know how indexes look in Db2, but remember: they all have a root page (which is, technically speaking, always a non-leaf page) which normally points to non-leaf pages which, in turn, point either to more non-leaf pages – so you are getting extra levels in the index here – or, finally, to a leaf page which actually contains pairs of keys and RIDs so that Db2 can get the data from the table page.

Which level are you?

One of the quirks of Db2 is that the leaf pages are called Level 0 and the non-leaf pages directly above them are Level 1. The root page above these is then Level 2, and so you have a “two level” index.

So how many fetches for a random data access?

The first thing Db2 does is read the root page; looking at the root page, it sees which non-leaf page must be fetched, and it repeats this until it lands on the leaf page and then fetches the data. So in my two level index it takes four fetches to get to the data. Now imagine you have a seven level index? Yep, you end up fetching a ton of non-leafs!

The solution? FTB!

In Db2 12, IBM introduced the Fast Traversal Block to keep a copy of the non-leaf data in a new area, separate from the buffer pools.

The major reasons for the expected performance boost are the facts that the structure is L2 cache-aware, that each page is equal in size to one cache line (256 bytes), and that I/O is massively reduced for the non-leaf pages.

One size fits all?

The size of the FTB is determined by the new ZPARM INDEX_MEMORY_CONTROL, with valid values AUTO, DISABLE or 10 – 200,000 MB. The default is AUTO. With this value, the size of the FTB is either 20% of the size of all available buffer pools or 10 MB, whichever is higher. So, for example, with 10 GB of allocated buffer pools, AUTO would give you a 2 GB FTB area.

The devil is in the detail

To actually start working with FTBs, you have to wait until the FTB daemon has found an eligible index. From all of your indexes there will be quite a few candidates, which must follow these rules:


  1. It cannot be longer than 64 bytes

  2. It cannot be versioned (So the OLDEST_VERSION and CURRENT_VERSION must be the same!)

  3. It cannot contain a TIMESTAMP column with TIMEZONE

  4. It cannot have more than 2,000,000 leaf pages


Once through that selection list, the daemon watches what is happening to each such index: is it a random select? That’s a good thing! Is it an index-only access? That’s even better! A split? Oh dear, that’s very bad… From all this, every two minutes, the daemon decides whether or not to use FTB for this index.

Control is in your hands

This is all well and good, but there are a lot of people who like to see *exactly* what is going on and where! So IBM introduced a new command:

-DIS STATS(IMU)

From this you get to see which indexes are being processed and how much space they are using. The daemon also spits out console messages, so you can see what happened and why. Further, there is a new catalog table, SYSIBM.SYSINDEXCONTROL, where you can micro-manage exactly which indexes to force into FTB and which ones to exclude. I would try to avoid this option and just let Db2 do the business!

Gotcha’s?

Always! FTB is limited to 10,000 indexes per member, which is probably OK! When a MASS DELETE (or TRUNCATE TABLE) happens, the FTB is removed for the related indexes. Even more important is to keep up with current Db2 maintenance! As always with new things, it takes a while to get the birthing pains out of the way!

INCLUDE can kill you!

Finally, remember that the length cannot be greater than 64 bytes. So if you decide to add that one byte character column to the index to get Index Only access, you might then push the index over the edge and up to 65 bytes…

Before ALTERing indexes always check if they are in the FTB (or were!) and check if your alteration will disallow the FTB!

A little SQL to show you what you have

WITH INPUT (NLEVELS, LENGTH, INDEX_NAME) AS                            
 (SELECT  COALESCE(E.NLEVELS , A.NLEVELS )  -- FIRST RTS THEN INDEXES   
       , SUM(CASE D.COLTYPE                                             
             WHEN 'DECIMAL ' THEN                                       
 -- IF , SEPARATOR           SMALLINT( CEILING(( D.LENGTH + 1,0 ) / 2 ))
                             SMALLINT( CEILING(( D.LENGTH + 1.0 ) / 2 ))
             WHEN 'GRAPHIC'  THEN D.LENGTH * 2                          
             WHEN 'VARG'     THEN D.LENGTH * 2                          
             WHEN 'LONGVARG' THEN D.LENGTH * 2                          
             ELSE D.LENGTH                                              
             END)                                                       
       + SUM(CASE A.PADDED                                              
             WHEN 'Y' THEN 0                                            
             ELSE                                                       
               CASE D.COLTYPE                                           
               WHEN 'VARG'     THEN 2                                   
               WHEN 'LONGVARG' THEN 2                                   
               WHEN 'VARCHAR'  THEN 2                                   
               WHEN 'LONGVAR'  THEN 2                                   
               WHEN 'VARBIN'   THEN 2                                   
               WHEN 'DECFLOAT' THEN 2                                   
               ELSE 0                                                   
               END                                                      
             END)                                                       
       + SUM(CASE D.NULLS                                               
             WHEN 'Y' THEN 1                                            
             ELSE 0    
                END) AS LENGTH  
       , STRIP(A.NAME) CONCAT '.' CONCAT STRIP(A.CREATOR) AS INDEX_NAME
 FROM SYSIBM.SYSINDEXES        A                                       
     ,SYSIBM.SYSKEYS            C                                       
     ,SYSIBM.SYSCOLUMNS         D                                       
     ,SYSIBM.SYSINDEXSPACESTATS E                                       
 WHERE A.UNIQUERULE     <> 'D'              -- NOT DUPLICATE            
   AND D.COLTYPE        <> 'TIMESTZ'        -- NOT TIMEZONE             
   AND A.OLDEST_VERSION = A.CURRENT_VERSION -- NOT VERSIONED            
   AND A.TBNAME         = D.TBNAME                                      
   AND A.TBCREATOR      = D.TBCREATOR                                   
   AND A.NAME           = C.IXNAME                                      
   AND A.CREATOR        = C.IXCREATOR                                   
   AND A.NAME           = E.NAME                                        
   AND A.CREATOR        = E.CREATOR                                     
   AND C.COLNAME        = D.NAME                                        
 GROUP BY A.NAME, A.CREATOR , E.NLEVELS, A.NLEVELS)                     
 SELECT NLEVELS, LENGTH , INDEX_NAME                                    
 FROM INPUT                                                             
 WHERE LENGTH <= 64   
 ORDER BY NLEVELS DESC
 FOR FETCH ONLY       
 WITH UR              
 ;                                                    

I have limited it to show you just the eligible unique, non-timezone, non-versioned indexes with a length of 64 bytes or less, of course!

Blog time


Here’s an excellent write up from John Campbell with even more technical details:

https://www.ibm.com/developerworks/community/blogs/897a7c98-57af-4523-9cfa-07ebc3f996b4/entry/Db2_12_greatest_hits_with_John_Campbell_fast_index_traversal?lang=en


and from Akiko Hoshikawa: a very good IDUG Blog:

https://www.idug.org/p/bl/et/blogaid=646


As always I would be pleased to hear from you!

TTFN,
Roy Boxwell
Senior Architect

2019-06 A little quote from Terry Purcell: It just takes one

This is a little quote from Terry Purcell which I think should be understood much better than it possibly is today!

Really? Just one?

The quote is based on the idea that :

One single SQL statement can bring your entire system to its knees.

Many people think this is a huge exaggeration – However it is not!

It happened to me

This is not a friend of a friend story, but a story that really happened. Names have, of course, been changed but the story itself is 100% accurate.


The story


It was a normal Db2 PTF Maintenance night…

At this shop they always put maintenance in at midnight on Saturdays. It is all done automatically, and the update is then rolled around the data-sharing group so that, relatively quickly, all sub-systems have the same PTF level.

So far so normal

Sunday is not that busy at the site and no-one noticed any change.

Monday was different

Monday morning the machine started acting up. Customers could not log in and the help desk was 100% busy. The lead DBA, George, had received the first escalation to management by around 08:00, and so the number of calls went up.

 

The bad guy

Using our WorkLoadExpert (WLX) software together with our BindImpactExpert (BIX), the other DBAs, Fred and Ringo, quickly identified one single SQL statement that was taking 30,000 times more resources than in the prior week!

This was an SQL with table functions and LEFT, INNER JOINs etc.

What was it?

The comparison in BIX showed that all that had changed was that one index access was now MATCHCOLS one instead of MATCHCOLS two! That was it! This one tiny change in one little SQL killed this machine…

The fix?

They decided to roll back the PTFs and quickly did this and everything returned to normal…


The lesson?

Always test any PTF before you go live – as just one SQL can kill ya!

Terry had indeed warned us all!


How can we help?

Well, thank you for asking: our product CDDC contains an SQL replay-and-compare function that would have spotted this SQL in two different, complementary ways. Firstly, the BIX part of CDDC would have spotted it straight away, and secondly, the replay itself would have thrown this out as a major outlier and bad-guy candidate!

Different ways to Rome

They could possibly even have fixed this without backing out the PTF apply. They could have tried using our RUNSTATS Rescue to attempt to use older statistics and see if one of them would have given them a MATCHCOLS two access path. This time, however, with all the managers breathing down their necks, it was decided: undo all the changes!


In your shop

How would you have handled this situation in your shop?

Would/Could you have seen this before it happened?

Feel free to send me your comments and ask questions.

TTFN,
Roy Boxwell
Senior Architect

Machine Learning for Db2 z/OS: Artificial Intelligence – Hype or Reality?

Now that the Db2 Optimizer has gone all Artificial Intelligent (AI from now on) on us, I thought it would be interesting to review where we are in this brave new world!

Keeps you out of a BIND

My firm created our BindImpactExpert (BIX) software many moons ago to compare access paths, typically before a non-SQL program change, or a REBIND after RUNSTATS etc. – just to warn you not to do the REBIND, or whatever you were going to do, as it would change the access plan to something you may well not really want to have. This means not just no-brainers like index access to tablespace scan, but also different index usage or table access order changes.

Apples and Oranges

So, what has that got to do with AI?
Well, we used a marketing slogan of trying to compare oranges and apples to try and make it clear :

how tough it really is to try and match SQL statements across different program releases and to compare the results.

ML and AI – The new buzzwords

Now in Db2 12, the optimizer has got machine learning (ML from now on) to help it study which host variables are used the most and how many rows are usually returned by a given query.

Do you “know” your data?

When the optimizer “knows” this info it can of course make much better “guesses” at good repeatable access plans that help everyone! Now the usage of AI and ML always gets me interested because of all the fun you can have with them.


There was recently a great article in my favourite science magazine “New Scientist” all about how to fool image recognition AI:

https://www.newscientist.com/article/mg24232270-200-machine-mind-hack-the-new-threat-that-could-scupper-the-ai-revolution/
(subscribe to read the whole article)


Now you should see the connection back to our BindImpactExpert slogan. So much for Oranges and Apples – AI cannot even see that an Orange is an Orange…

this is not good or even perhaps “suboptimal”…

Adversarial Images and the fun you can have with them…


From 2018, this article:

https://www.wired.co.uk/article/artificial-intelligence-hacking-machine-learning-adversarial

Within this article is a great paper all about graffiti-altered road signs! Here is a direct link to the paper: https://arxiv.org/pdf/1707.08945.pdf


Fooling Deep Learning

Another great paper where Deep Learning AIs jump off the deep end:

http://www.evolvingai.org/fooling


Disappearing Humans

and finally this joyous article and a link to a YouTube video:

https://www.theverge.com/2019/4/23/18512472/fool-ai-surveillance-adversarial-example-yolov2-person-detection


But do not wear this picture as a T-Shirt down the street anytime soon!!!

Would you sit in a driverless car?

What this all boils down to is that I, for sure, will never sit in a driverless car within the next 40 years!
Why?

  • Firstly, I believe a true AI does not exist yet and will not exist for at least another 40 years.
  • Secondly, when I see an Orange I see an Orange because I “know” what an Orange looks like! As you have seen AIs have no concept of what an Orange *is* and so can easily see a drill instead or mistake skiers for dogs…
  • Thirdly, I am a programmer and someone somewhere is writing and testing – I hope! – the code… need I say more? Don’t even get me started with Agile or KANBAN.

Artificial Intelligence
Extract from © 2019 – “We need to talk, AI” – Dr. Julia Schneider and Lena Kadriye Ziyal
A Comic Essay on Artificial Intelligence
https://weneedtotalk.ai/

Room full of CTO’s

Have you heard about the hopefully apocryphal story of a room full of CTOs at a software conference? When they were asked “Would you fly in a computer-controlled aircraft?”, all their hands went up. Then they were asked “Would you fly if it was your software teams that had written the flight control software?” and all their hands went down!

Flying is strangely different

While we are talking about flying: here my answer would be different.

Why?
Because in the air you have no-one else apart from other highly automated devices anywhere near you! This is the one hope that I can see for the whole driverless car stuff – Driving on the Interstate, Motorway or Autobahn. In the air, the environment is much more rigid. No lights, no crossings, no opposite traffic, no foot traffic, no children chasing a ball across the road etc.

They will come…

I am sure that driverless cars will come soon, and I am equally sure that people will die because of them – however, I am also sure that lots of accidents will be avoided as well. After all, the AI will always react faster than a human.

With this comes the ethics of AI and ML of course which will also play a major role. If the car has to decide in a crash scenario :

  • who gets hurt, should it protect itself and its cargo above all others?
  • What about a group of playing children or a bus full of nuns?
  • How can it decide who lives and who dies?

Looking on the bright side at least it will not get drunk, or high, and drive! I would also seriously hope that AI does not get road rage!

But even with these “bonus” points, I still have massive doubts and serious worries about the quality of the software and the true image recognition potential of current AIs.

AI everywhere

We were recently asked by a customer to add AI and ML to our SpaceAssuranceExpert (SAX) software. This has been happily running for years, capturing dataset allocations and extent usage in Db2 and making sure they never ran out of space or extents without getting told about it well beforehand.

I had to do a real double take and think “How on earth could AI help here?”

You are approaching 32 Linear Datasets with your segmented tablespace. You have to ALTER and REORG before you use up all the space in the 32nd. Where can AI help here?
We already have extrapolation from current usage…

Don’t get me wrong

I am not against AI and ML…

but I think we *all* need to get a bit more skeptical about what can actually be delivered now.

On the one hand, host variable contents and row counts – yes indeed; but even here, outliers will cause grief. And on the other hand, the complete and utter science fiction of self-driving cars in our towns and cities.

The money being spent is also amazing: in 2019 it is estimated, by Syncsort,

that the AI budget will be $35.8 billion – 44% more than in 2018.


What do you think?


I am sure that autonomous cars will be a plague upon us – are you?

TTFN,
Roy Boxwell
Senior Architect

PS: I am looking forward to the first Adversarial Images for Db2 z/OS !!!