2023-04 AI Performance

Hi! Continuing on with my AI blog (last one. I promise!) I wish to delve into the innards of the USS part of the SQL Data Insights experience and show you what it all costs!

A Quick Review Perhaps?

Please check my older newsletters for everything about install etc. of SQL DI, and one important thing which is the latest Vector Prefetch APARs (also see my last newsletter for details). Now. I will be doing „before and after“ performance reviews with this feature on and off.

Bad News First!

What I have found, is that when I take a 500,000 row table into SQL DI and choose 17 columns, it takes the *entire* machine as well as all local page datasets and I was forced to cancel it after five hours…

Looking in the Logs…

If you go trawling around your Unix Directories, you will trip over these paths:

/u/work/sqldi

Home is Where the Spark is!

This is “home” where all of the SQL DI stuff is “installed”, naturally your name might be different!

Under here is the next layer of interest to me for the Spark processing.

It is Magic!

/u/work/sqldi/spark – Now this is where Spark does “the magic” and actually computes all your vector table data. It runs in stages and the first is the Base10 (I guess numeric analysis) part. For my test data it looks like this:

Scroll down to the bottom:

So, this ran really quickly!

Internals…

Then it does a ton of internal stuff and it starts actually doing the learning, which is “progressed” in a file like this:

/u/work/sqldi/spark/worker/driver-20230315063441-0002/ibm_data2vec-15-03-2023_07:15:01.log

Just Sitting There, Typing REF and Pressing ENTER…

Of course your name will be different, but just sitting there in OMVS and using the REF command you will see this file grow in size every now and again. When it does, quickly Browse on in and you will see stuff like this:

ibm-data2Vec (1.1.0 for zOS) starting execution using file /var/sqldi/temp/training/DSNAIDB_AIDB_DAI
ibm-data2vec found the required library: libzaio.so. Proceeding with the training..                 
ibm-data2vec will use following mode: CBLAS                                                         
User has not provided training chunk size. Using 1 GB chunk size for reading training file.         
ibm-data2Vec is preallocating space for the model using user-provided value 1230314                 
ibm-data2Vec starting execution using file /var/sqldi/temp/training/DSNAIDB_AIDB_DAIN0610_IQATW001_1
83951683 ! 2023-03-15 07:17:27 ! Time elapsed learning vocab from train file = 145.91525s           
Processed 13103200 words in the training file. There are 1213852 unique words in the vocabulary: Pri
Model training code will generate vectors for row-identifier (pk_id) or user-specified primary keys 
83951683 ! 2023-03-15 07:17:27 ! Stage 1 completed. Time elapsed during file reading = 145.91643s   
Training the database embedding (db2Vec) model using 12 CPU thread(s)  

Whole Machine Gone – Oh Oh!

Now, in my case, it just sat there for a while taking all paging, all frames, all ziip and cp cpu and then it wrote out:

Epoch 0 learning rate Alpha=0.024704 Training Progress=5.00%                                        
Epoch 0 learning rate Alpha=0.024404 Training Progress=10.00%                                       
Epoch 0 learning rate Alpha=0.024099 Training Progress=15.00%                                       
Epoch 0 learning rate Alpha=0.023791 Training Progress=20.00%                                       
Epoch 0 learning rate Alpha=0.023486 Training Progress=25.00%                                       
Epoch 0 learning rate Alpha=0.023182 Training Progress=30.00%                                       
Epoch 0 learning rate Alpha=0.022885 Training Progress=35.00%                                       
Epoch 0 learning rate Alpha=0.022582 Training Progress=40.00%                                       
Epoch 0 learning rate Alpha=0.022286 Training Progress=45.00%                                       
Epoch 0 learning rate Alpha=0.021980 Training Progress=50.00%                                       
Epoch 0 learning rate Alpha=0.021673 Training Progress=55.00%                                       

That last line was written out at 12:42 and after starting at 07:17 you can see that I still had nearly a five hour wait ahead of me. Time to cancel and rethink this!

Restart!

Thankfully, on the GUI interface (where you cannot see this progress info, sadly!) the “Stop training” button worked after a while. If it does not respond then you can just issue the

S SQLDAPPS,OPTION='SQLDSTOP' 

command to stop it. Then, once all stopped, and the cpus have cooled down a bit, you can select a smaller data set and retry learning!

Smaller is Sometimes Better!

And with 40.000 rows it is much faster:

50397300 ! 2023-03-15 12:17:16 ! Stage 1 completed. Time elapsed during file reading = 26.992490s 
Training the database embedding (db2Vec) model using 12 CPU thread(s)                             
Epoch 0 learning rate Alpha=0.024765 Training Progress=5.00%                                      
Epoch 0 learning rate Alpha=0.024539 Training Progress=10.00%                                     
Epoch 0 learning rate Alpha=0.024308 Training Progress=15.00%                                     
Epoch 0 learning rate Alpha=0.024073 Training Progress=20.00%                                     
Epoch 0 learning rate Alpha=0.023826 Training Progress=25.00%                                     
Epoch 0 learning rate Alpha=0.023591 Training Progress=30.00%                                     
Epoch 0 learning rate Alpha=0.023354 Training Progress=35.00%                                     
Epoch 0 learning rate Alpha=0.023115 Training Progress=40.00%                                     
Epoch 0 learning rate Alpha=0.022878 Training Progress=45.00%                                     
Epoch 0 learning rate Alpha=0.022637 Training Progress=50.00%                                     
Epoch 0 learning rate Alpha=0.022406 Training Progress=55.00%                                     

Naturally, this is heavily dependent on the machine you have, the memory you have and the size of your local paging dataset.

EXPLAIN Yourself!

So now to do some EXPLAIN runs and then a quick comparison of the “double” AI Whammy that I have, quickly followed by the “New” PTF that, hopefully, sorts it all out.

Double Trouble?

You might have noticed that in my test SQLs I have to use the BiF AI twice. Once for the SELECT and once for the WHERE. This is because the use of the AI_VALUE column is not supported in the WHERE predicate.

Naturally, you can re-write the query to look like this:

SELECT * FROM                                      
(SELECT AI_SEMANTIC_CLUSTER( PROGRAM,              
                           'DSNTIAUL',             
                           'DSN§EP2L',             
                           'DSN§EP4L') AS AI_VALUE 
      ,A.WLX_TIMESTAMP                             
      ,A.STMT_ID                                   
      ,A.STMT_TIMESTAMP                            
      ,SUBSTR(A.PRIM_AUTHOR , 1 , 8) AS PRIM_AUTHOR
      ,SUBSTR(A.PROGRAM , 1 , 8) AS PROGRAM        
      ,SUBSTR(A.REF_TABLE , 1 , 18) AS REF_TABLE   
      ,A.EXECUTIONS                                
      ,A.GETP_OPERATIONS                           
      ,A.ELAPSE_TIME                               
      ,A.CPU_TIME                                  
      ,A.STMT_TEXT                                 
FROM DAIN0610.IQATW001 A                           
WHERE 1 = 1                                        
  AND A.PROGRAM NOT IN    ('DSNTIAUL',             
                           'DSN§EP2L',             
                           'DSN§EP4L')             
  AND A.STMT_ORIGIN = 'D'                          
  )                                                
WHERE AI_VALUE IS NOT NULL                         
ORDER BY 1 DESC -- SHOW BEST FIRST                 
--ORDER BY 1 -- SHOW WORST FIRST                   
FETCH FIRST 10 ROWS ONLY ;                         

Does My Work File Look Big to You?

The problem is that now you have a HUGE work file… In my tests it was always much quicker to code the AI BiF twice. After all, it is always „Your Mileage May Vary“, „The Cheque is in the post“ or „It depends“, isn’t it?

AI Does Use the Optimizer!

EXPLAIN Output… The AI Does indeed get output by EXPLAIN (I was surprised about this to be honest!) for the following query:

SELECT AI_SEMANTIC_CLUSTER( PROGRAM,               
                           'DSNTIAUL',             
                           'DSN§EP2L',             
                           'DSN§EP4L') AS AI_VALUE 
      ,A.WLX_TIMESTAMP                             
      ,A.STMT_ID                                   
      ,A.STMT_TIMESTAMP                            
      ,SUBSTR(A.PRIM_AUTHOR , 1 , 8) AS PRIM_AUTHOR
      ,SUBSTR(A.PROGRAM , 1 , 8) AS PROGRAM        
      ,SUBSTR(A.REF_TABLE , 1 , 18) AS REF_TABLE   
      ,A.EXECUTIONS                                
      ,A.GETP_OPERATIONS                           
      ,A.ELAPSE_TIME                               
      ,A.CPU_TIME                                  
      ,A.STMT_TEXT                                 
FROM DAIN0610.IQATW001 A                           
WHERE 1 = 1                                        
  AND A.PROGRAM NOT IN    ('DSNTIAUL',             
                           'DSN§EP2L',             
                           'DSN§EP4L')             
  AND AI_SEMANTIC_CLUSTER( PROGRAM,                
                           'DSNTIAUL',             
                           'DSN§EP2L',             
                           'DSN§EP4L')             
      IS NOT NULL                                  
  AND A.STMT_ORIGIN = 'D'                          
ORDER BY 1 DESC -- SHOW BEST FIRST                 
--ORDER BY 1 -- SHOW WORST FIRST                   
FETCH FIRST 10 ROWS ONLY ;                         

The EXPLAIN output looks like:

Then it gets an interesting STAGE2 RANGE predicate!

which resolves into:

So here we see what the BiF is doing from the perspective of the Optimizer! If you run the nested table version of the query then this line does *not* appear at all!

Notice here that the RANGE is now a STAGE1!

Optimize This!

So IBM Db2 has incorporated it into the Optimizer which is a good thing. But please remember: your SQL can have local predicates that cut down the size of the work file and so evens out the access times… Basically, you must code both and test to see which of the solutions is better for typical usage (As always really…)

Time, Measure, Repeat

Ok, now just doing one execute of the double query requires 2.58 seconds of CPU and 15.35 seconds elapsed. The statement is *in* the DSC so prepare time can be ignored. Here you can see it has been executed twice so we have average values but I am using the CPU from the batch job as it is more precise.

Changing the query to now fetch back all rows instead of first ten requires 7.06 seconds of CPU and 48.78 seconds elapsed. But it returned over 200K rows!

While the query was running you can see the SQLD SQL DI in SDSF taking quite large chunks of zIIP time…

Now I will enable Vector Prefetch with a value of 10GB to see if it makes an impact for these queries. To do this you must update the ZPARM MXAIDTCACH and then enable the changed ZPARM.

That is Not What I was Expecting!

First query is now 2.56 CPU and 15.26 Elapsed. More like background noise than an improvement. And now with the FETCH FIRST removed 7.07 and 49.36 seconds. I guess my queries are not improved with Vector Prefetch!

Could be Me…

From the IBM Vector Prefetch docu:

With vector prefetch enabled, CPU performance for AI queries with AI function invocation on qualified rows improves particularly when the ratio of the cardinality of qualified rows to the total number of numeric vectors for the column is high.

https://community.ibm.com/community/user/datamanagement/blogs/neena-cherian/2023/03/07/accelerating-db2-ai-queries-with-the-new-vector-pr

Time to Join the Real World!

Now let’s try and see if I can discover something new in real data! Anything sensitive has been obfuscated!

SELECT AI_SIMILARITY( PROGRAM,                     
                     'IQADBACP') AS AI_VALUE       
      ,SUBSTR(A.PRIM_AUTHOR , 1 , 8) AS PRIM_AUTHOR
      ,SUBSTR(A.PROGRAM , 1 , 8) AS PROGRAM        
      ,SUBSTR(A.REF_TABLE , 1 , 18) AS REF_TABLE   
      ,A.WLX_TIMESTAMP                             
      ,A.STMT_ID                                   
      ,A.STMT_TIMESTAMP            
      ,A.EXECUTIONS                                
      ,A.GETP_OPERATIONS                           
      ,A.ELAPSE_TIME                               
      ,A.CPU_TIME                                  
      ,A.STMT_TEXT                                 
FROM DAIN0610.IQATW001 A                           
WHERE 1 = 1                                        
  AND NOT A.PROGRAM     = 'IQADBACP'               
  AND     AI_SIMILARITY  ( PROGRAM,                
                          'IQADBACP')              
       IS NOT NULL                                 
  AND     A.STMT_ORIGIN = 'D'                      
ORDER BY 1 DESC -- SHOW BEST FIRST                 
--ORDER BY 1 -- SHOW WORST FIRST                   
FETCH FIRST 10 ROWS ONLY;                          

This is similar to my test from last month but now on real data. Note that I have added a predicate A.STMT_ORIGIN = ‚D‘ as I only want Dynamic SQL programs:

Dynamic Hits?

Here you can see that it has found a variety of programs that also do dynamic SQL but I also „helped“ it by only asking for dynamic SQL. So now once again but this time without the predicate A.STMT_ORIGIN = ‚D‘:

Success!

It has found nearly all from the first list but also different ones, crucially it has *not* found any Static SQL!

So, that’s enough of AI for the next few months for me. However, if you have any questions or ideas that I could try out feel free to email!

TTFN,

Roy Boxwell

2023-03 AI in the real world

OK, I kept you all waiting long enough… Here are my AI results with Db2 13 FL501!

Start at the Start

We begin with the beginning as last time:

Let’s get Connected!

Here you can see that I have already defined my little test Db2 13 system to the system:

Join the Dots …

Now just click on the vertical dots:

Here you can Disconnect, Edit (Which shows you the same window as “add connection”), List AI objects or Delete.

What do we have?

Choosing List AI objects you see what has been created:

Clicking on the down arrow on the left-hand side to expand looks a lot better than last month:

Clickedy-click-click

Now, clicking on the vertical dots on the right hand side, you can choose to Disable AI query or Enable AI query. (I have actually added a new column for consideration, so first I clicked on Disable and then clicked again on Enable)

Just the Facts, Ma’am – Again

Here you must make your “Usual Suspects” decision: which columns to actually use in building the AI Model. I am using our WorkLoadExpert performance table in this newsletter and have selected 17 columns that I think will work together nicely. Only one can be a “Key” column – I choose STMT_ID in this case. Once you are done selecting columns, click on the big blue “Next” button where you may then add additional filters to remove any rows you know are to be ignored:

Playing Chicken?

When done, click on the big blue “Enable” button and you get your last chance to chicken out:

SIO and CPU Records!

Click here and then get a cup of coffee….or go to SDSF and marvel at how much CPU and IO Spark actually uses and does this as the light bulbs dim in your part of the world…

You Keep me Spinning

Oddly, at least when I do this, the Initializing spinning wheels:

Right Round and Around

… never stop. The WLM Stored procedure for utilities was finally kicked off about 40 minutes later:

                    J E S 2  J O B  L O G  --  S Y S T E M  
                                                             
10.43.19 STC09611 ---- WEDNESDAY, 01 MAR 2023 ----           
10.43.19 STC09611  $HASP373 DD10WLMU STARTED                 
10.43.19 STC09611  IEF403I DD10WLMU - STARTED - TIME=10.43.19
10.43.19 STC09611  ICH70001I SQLDIID  LAST ACCESS AT 09:37:37

A Loaded Question?

And loaded all the required data:

ICE134I 0 NUMBER OF BYTES SORTED: 99083595                  
ICE253I 0 RECORDS SORTED - PROCESSED: 49173, EXPECTED: 49173

A quick exit and re-logon to the web interface…and Tra la!

Not only AI but Dr Who!

It is also strange that it seems to be in a time machine, one hour in advance of my local…Anyways, my new data is there and so onward! (I have since heard that our time zone setting is actually to blame and that just going back one level, and then forward again, stops the spinning wheel problem. However, just wait until Spark finishes and the stored procedure has loaded your data!)

Never Trust a Statistic You haven’t Faked Yourself!

Clicking on Data statistics shows:

Influencer of the Day?

Then you can look at the Column influence:

Super Model?

Back at the top you can then review the Model details:

Or just a Cluster….

Here are the Cluster center details:

Going back to the List AI Objects window, there are two blue buttons: Add object and Run query. I did not discuss Run Query last month but it gives you a SPUFI-like ability on the PC, tailored to the AI BiFs:

Lets RUN Away!

Clicking on Query type gives a drop-down list of the basic AI BiFs where it then gives you an example SQL (based on the documentation, *not* on any AI Tables you might have done!). Once you type in any query the “run” box turns blue:

It Works!

Click run and see the results:

Data Review

Once the model is trained, you can then review on the host what it has done. In SPUFI you can find details of what you have done in the pseudo Db2 catalog tables that support Data Insights, (I have removed a ton of rows to make this readable – sort of!):

SELECT * FROM 
SYSAIDB.SYSAIOBJECTS ; 
---------+---------+---------+---------+---------+---------+---------+---------+-------+-
              OBJECT_ID  OBJECT_NAME                       OBJECT_TYPE  SCHEMA  NAME     
---------+---------+---------+---------+---------+---------+---------+---------+-------+-
                     26  --------------------------------  T            IQA061QB IQATW001

-------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+
STATUS                   CONFIGURATION_ID                 MODEL_ID  CREATED_BY                        CREATED_DATE              
-------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+
Enabled                                36                       36  SQLDIID                           2023-02-24-07.57.42.086932

-------+---------+---------+---------+---------+---------+---------+---------+
LAST_UPDATED_BY                   LAST_UPDATED_DATE           DESCRIPTION     
-------+---------+---------+---------+---------+---------+---------+---------+
SQLDIID                           2023-03-01-10.43.38.407460  ----------------



SELECT * FROM  
SYSAIDB.SYSAICONFIGURATIONS ; 
---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+-----
       CONFIGURATION_ID  NAME                                            OBJECT_ID  RETRAIN_INTERVAL  KEEP_ROWIDENTIFIER_KEY 
---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+-----
                     36  --------------------------------                       26  ----------------  Y                      
---+---------+-------
NEGLECT_VALUES       
---+---------+-------
                     
---+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+--------
CREATED_BY                        CREATED_DATE                LAST_UPDATED_BY                   LAST_UPDATED_DATE         
---+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+--------
SQLDIID                           2023-03-01-09.51.00.994421  SQLDIID                           2023-03-01-09.51.00.994461



SELECT * FROM 
SYSAIDB.SYSAICOLUMNCONFIG
ORDER BY 1 , 3 , 2 ;   

--+---------+---------+---------+---------+---------+---------+---------+---------+---------+
CONFIGURATION_ID  COLUMN_NAME              COLUMN_AISQL_TYPE  COLUMN_PRIORITY  NEGLECT_VALUES
--+---------+---------+---------+---------+---------+---------+---------+---------+---------+
              36  END_USERID               C                  H                              
              36  PRIM_AUTHOR              C                  H                              
              36  PROGRAM                  C                  H                              
              36  REF_TABLE                C                  H                              
              36  REF_TAB_QUAL             C                  H                              
              36  STMT_ORIGIN              C                  H                              
              36  STMT_TEXT                C                  H                              
              36  TRANSACTION              C                  H                              
              36  WORKSTATION              C                  H                              
              36  COPIES_NO                I                  H 
.
.
.                             
              36  WLX_TYPE                 I                  H                              
              36  WORKSTATION_OLD          I                  H                              
              36  STMT_ID                  K                  H                              
              36  CPU_TIME                 N                  H                              
              36  ELAPSE_TIME              N                  H                              
              36  EXECUTIONS               N                  H                              
              36  GETP_OPERATIONS          N                  H                              
              36  ROWS_EXAMINED            N                  H                              
              36  ROWS_PROCESSED           N                  H                              
              36  STMT_LENGTH              N                  H                              

When the column COLUMN_AISQL_TYPE has a value of “I” it means it is ignored by AI processing. Also note that this table SYSAICOLUMNCONFIG gets two extra columns (COLUMN_VECTOR_CARDINALITY and MAX_DATA_VALUE_LEN) once you apply the vector prefetch upgrade APARs:

  1. For IBM Z AI Optimization (zAIO) library and IBM Z AI Embedded (zADE) library in the IBM Z Deep Neural Network (zDNN) stack on z/OS:
    • Apply OA63950 and OA63952 for z/OS 2.5 (HZAI250).
    • Apply OA63949 and OA63951 for z/OS 2.4 (HBB77C0).
  2. For OpenBLAS on z/OS:
    • Apply PH49807 and PH50872 for both z/OS 2.5 and z/OS 2.4 (HTV77C0).
    • Apply PH50881 for z/OS 2.5 (HLE77D0).
    • Apply PH50880 for z/OS 2.4 (HLE77C0).
  3. For Db2 13 for z/OS, apply PH51892. Follow the instructions for DDL migration outlined in the ++ HOLD text. By default, the new Db2 subsystem parameter MXAIDTCACH is set to 0, indicating that vector prefetch is disabled. To enable vector prefetch, set MXAIDTCACH to a value between 1 and 512. This parameter is online changeable. See “IBM Db2 13 for z/OS documentation” on MXAIDTCACH.
  4. For SQL Data Insights 1.1.0 UI and model training (HDBDD18), apply PH51052.

Further, the table SYSAIMODELS got a new column MODEL_CODE_LEVEL and an increase in size for the METRIC column to 500K with the above APARs.

SELECT * FROM  
SYSAIDB.SYSAIMODELS ; 
---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------
               MODEL_ID  NAME                                            OBJECT_ID         CONFIGURATION_ID  VECTOR_TABLE_CREATOR
---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------
                     36  --------------------------------                       26                       36  DSNAIDB             

+---------+---------+--
VECTOR_TABLE_NAME      
+---------+---------+--
AIDB_IQA061QB_IQATW001

+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+
VECTOR_TABLE_STATUS  VECTOR_TABLE_DBID  VECTOR_TABLE_OBID  VECTOR_TABLE_IXDBID  VECTOR_TABLE_IXOBID  VECTOR_TABLE_VERSION
+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+
A                                  329                  3                  329                    4                     1

-------+---------+---------+---------+---------+---------+---------+---------+-------
METRICS                                                                              
-------+---------+---------+---------+---------+---------+---------+---------+-------
[{"discriminator":8.59443984950101,"influence":0.9367419701380996,"name":"TRANSACTION",

-------+---------+---------+
INTERPRETABILITY_STRUCT     
-------+---------+---------+

-------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+----
CREATED_BY                        CREATED_DATE                LAST_UPDATED_BY                   LAST_UPDATED_DATE         
-------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+----
SQLDIID                           2023-03-01-09.51.03.777504  SQLDIID                           2023-03-01-10.43.37.796847

---+---------+---------+---------+---------+--
MODEL_ROWID                                   
---+---------+---------+---------+---------+--
2495E518C773B081E09C018000000100000000002213  



SELECT * FROM                            
SYSAIDB.SYSAICOLUMNCENTERS 
ORDER BY 1 , 2 , 3 ;
----+---------+---------+---------+---------+---------+---------+
MODEL_ID  COLUMN_NAME                          CLUSTER_MIN  LABEL
----+---------+---------+---------+---------+---------+---------+
      36  CPU_TIME                 -0.7200000000000000E+76  EMPTY
      36  CPU_TIME                 +0.0               E+00  c0   
      36  CPU_TIME                 +0.2000000000000000E+01  c1   
      36  CPU_TIME                 +0.1617671400000000E+08  c9   
      36  ELAPSE_TIME              -0.7200000000000000E+76  EMPTY
      36  ELAPSE_TIME              +0.0               E+00  c0   
      36  ELAPSE_TIME              +0.2000000000000000E+01  c1   
      36  ELAPSE_TIME              +0.1008466600000000E+08  c9   
      36  ELAPSE_TIME              +0.1074954980000000E+09  c10  

SELECT * FROM  
SYSAIDB.SYSAITRAININGJOBS ; 
---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+--------+-------
        TRAINING_JOB_ID                OBJECT_ID         CONFIGURATION_ID                 MODEL_ID  STATUS  PROGRESS RESOURCE 
---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+--------+-------
                     33                       26                       33                       33  F              0 
                     34                       26                       34                       34  F              0 
                     35                       26                       35                       35  C            100 
                     36                       26                       36                       36  C            100 


-+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+------
MESSAGES                                                                                                                        
-+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+------
{"messages":"failed to train model: Something went wrong with the zLoad, please check the SQL DI log for more details.","resumeI
{"messages":"failed to train model: Something went wrong with the zLoad, please check the SQL DI log for more details.","resumeI
{"messages":"model training is completed","sparkSubmitId":"driver-20230224105851-0002"}                                         
{"messages":"model training is completed","sparkSubmitId":"driver-20230301085133-0003"}                                         

-+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+----
START_TIME                  END_TIME                    CREATED_BY                        CREATED_DATE              
-+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+----
2023-02-24-08.01.20.737455  2023-02-24-08.51.56.386011  SQLDIID                           2023-02-24-08.01.20.737455
2023-02-24-10.52.27.687965  2023-02-24-11.43.22.095144  SQLDIID                           2023-02-24-10.52.27.687965
2023-02-24-11.58.20.109571  2023-02-24-12.49.20.660143  SQLDIID                           2023-02-24-11.58.20.109571
2023-03-01-09.51.03.777662  2023-03-01-10.43.38.407414  SQLDIID                           2023-03-01-09.51.03.777662

---+---------+---------+---------+---------+---------+------
LAST_UPDATED_BY                   LAST_UPDATED_DATE         
---+---------+---------+---------+---------+---------+------
SQLDIID                           2023-02-24-08.51.56.386030
SQLDIID                           2023-02-24-11.43.22.095164
SQLDIID                           2023-02-24-12.49.20.660160
SQLDIID                           2023-03-01-10.43.38.407425

KPIs from my Data

Here are a few KPIs from these first test runs:

SELECT COUNT(*) FROM IQA061QB.IQATW001 ;
      64390                                                 

SELECT COUNT(*) FROM DSNAIDB.AIDB_IQA061QB_IQATW001 ;
      49173                                                 

SELECT  SUBSTR(A.COLUMN_NAME, 1, 12) AS COLUMN_NAME 
      , SUBSTR(A.VALUE      , 1, 12) AS VALUE 
      , A.VECTOR 
FROM DSNAIDB.AIDB_IQA061QB_IQATW001  A 
ORDER BY 1 , 2 ; 

---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+--
COLUMN_NAME   VALUE         VECTOR                                                                                                  
---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+--
CPU_TIME      c0            3E594822BC9D2C7A3CD4F61DBD37E5033D34B314BD4CF8E3BD4B4D47BCB6CE293D1DBA1A3D858FDF3DC4DF08BD9E77753CCED43F
CPU_TIME      c1            3D9214383CFE4C90BDB3DFE4BBE407563BBA69553DB48FEFBCF39451BC6BABF0BDA31BDFBDB52F883C30B992BC8D71AF3D9E54FF
ELAPSE_TIME   c0            3E55B744BCCC5CED3D129B14BC9E553C3C9B121EBD8949C0BD4F838DBD1582A33D36D6363DA1F72F3DBCB033BDAFB88F3D4DE348
ELAPSE_TIME   c1            3DE390AC3D2DCC98BD2DF437BC5B7F713D766D103BD1AC10BB48E2C43B9FA9E6BD80D5D7BDC40AFE3CE586C9BCACADE93DFE2745
END_USERID    BOXWEL2       3D505075BD80E40F3D3AAB60BBA463F6BBCC51C43D92B118BD044D20BD8C6B3B3CC315133BBB087A3DC1D5923DC4EB763D039C8B
END_USERID    BOXWEL3       3D2FB919BC5013E3BD6652DDBD4654DA3DA4AC83BA70024FBD7FAFD0BCF16670BB2CCB4B3DBE32E93DFE13383CB052283C82FD46

As I mentioned last month the vector tables are very “special”!

What now?

So now we have analyzed a bunch of SQL WorkLoadExpert data from our own labs. What can we do?

First up, I wish to see what user KKKKKKK does with dynamic SQL that is “similar” to what I do with table IQATW001 but I am only interested in those SQLs where the AI thinks it is more than 0.5 (so very analogous):

SELECT AI_ANALOGY('BOXWEL3'  USING MODEL COLUMN PRIM_AUTHOR,  
                  'IQATW001' USING MODEL COLUMN REF_TABLE  ,  
                  'KKKKKKK'  USING MODEL COLUMN PRIM_AUTHOR,  
                  REF_TABLE ) AS AI_VALUE                     
      ,A.WLX_TIMESTAMP                                        
      ,A.STMT_ID                                              
      ,A.STMT_TIMESTAMP                                       
      ,SUBSTR(A.PRIM_AUTHOR , 1 , 8 ) AS PRIM_AUTHOR          
      ,SUBSTR(A.PROGRAM     , 1 , 8 ) AS PROGRAM              
      ,SUBSTR(A.REF_TABLE   , 1 , 18) AS REF_TABLE            
      ,A.EXECUTIONS                                           
      ,A.GETP_OPERATIONS                                      
      ,A.ELAPSE_TIME                                          
      ,A.CPU_TIME                                             
      ,A.STMT_TEXT                                            
FROM IQA061QB.IQATW001 A                                      
WHERE A.PRIM_AUTHOR = 'KKKKKKK'                               
  AND  AI_ANALOGY('BOXWEL3'    USING MODEL COLUMN PRIM_AUTHOR,
                  'IQATW001'   USING MODEL COLUMN REF_TABLE  ,
                  'KKKKKKK'    USING MODEL COLUMN PRIM_AUTHOR,
                  REF_TABLE )                                 
       > 0.5                                                  
ORDER BY 1 DESC -- SHOW BEST FIRST                            
--ORDER BY 1 -- SHOW WORST FIRST                              
FETCH FIRST 2000 ROWS ONLY ;                                  

And the results:

---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+-------
               AI_VALUE  WLX_TIMESTAMP                               STMT_ID  STMT_TIMESTAMP              PRIM_AUTHOR
---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+-------
+0.7875136583708362E+00  2022-12-16-11.43.46.365129                    54801  2023-01-05-14.23.34.706136  KKKKKKK    
+0.7875136583708362E+00  2022-12-16-11.43.46.365129                    54800  2023-01-05-14.23.34.704234  KKKKKKK    
+0.7875136583708362E+00  2022-12-16-11.43.46.365129                    43654  2023-01-05-08.09.18.888198  KKKKKKK    
+0.7875136583708362E+00  2022-12-16-11.43.46.365129                    43653  2023-01-05-08.09.18.832308  KKKKKKK    
+0.7754887840772942E+00  2022-12-05-11.43.10.984618                     2616  2022-12-05-10.15.20.884139  KKKKKKK    
+0.7754887840772942E+00  2022-12-05-11.43.10.984618                     2609  2022-12-05-10.15.20.487031  KKKKKKK    
+0.7754887840772942E+00  2022-12-05-11.43.10.984618                     2617  2022-12-05-10.15.20.905752  KKKKKKK    
+0.7754887840772942E+00  2022-12-16-11.43.46.365129                    37239  2023-01-04-11.07.49.907438  KKKKKKK    
+0.7754887840772942E+00  2022-12-16-11.43.46.365129                    37230  2023-01-04-11.07.49.457189  KKKKKKK    
+0.7754887840772942E+00  2022-12-16-11.43.46.365129                    37237  2023-01-04-11.07.49.879457  KKKKKKK    
+0.7754887840772942E+00  2022-12-16-11.43.46.365129                    37238  2023-01-04-11.07.49.903769  KKKKKKK    
+0.7754887840772942E+00  2022-12-05-11.43.10.984618                     2618  2022-12-05-10.15.20.909552  KKKKKKK    
+0.7754887840772942E+00  2022-12-16-11.43.46.365129                    45396  2023-01-05-08.17.11.633089  KKKKKKK    
+0.7754887840772942E+00  2022-12-16-11.43.46.365129                    45389  2023-01-05-08.17.11.311055  KKKKKKK    
+0.7754887840772942E+00  2022-12-16-11.43.46.365129                    45397  2023-01-05-08.17.11.655514  KKKKKKK    

+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+--
PROGRAM   REF_TABLE                        EXECUTIONS          GETP_OPERATIONS              ELAPSE_TIME
+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+--
SEDBTIAA  R510T002                                  1                        0                        8
SEDBTIAA  R510T002                                  1                        0                        9
SEDBTIAA  R510T002                                  1                        0                       11
SEDBTIAA  R510T002                                  1                        3                       61
SEDBTIAA  IQATA001                                  0                        0                        0
SEDBTIAA  IQATA001                                  0                        0                        0
SEDBTIAA  IQATA001                                  0                        0                        0
SEDBTIAA  IQATA001                                  1                        4                       32
SEDBTIAA  IQATA001                                  1                        2                      111
SEDBTIAA  IQATA001                                  1                       12                    20749
SEDBTIAA  IQATA001                                  1                        4                       36
SEDBTIAA  IQATA001                                  0                        0                        0
SEDBTIAA  IQATA001                                  1                       12                    18571

All interesting stuff! I use dynamic SQL to INSERT into the table a lot, and it has determined that use of dynamic SQL with tables R510T002 and IQATA001 is analogous. In fact, it is! The SQLs were all INSERT, DELETE and UPDATE… Clever ol’ AI!

Dynamic Duo?

Now I wish to see which programs process dynamic SQL like the IBM DSNTIAD and DSNTIAP programs:

SELECT AI_SEMANTIC_CLUSTER( PROGRAM,             
                           'DSNTIAD',            
                           'DSNTIAP') AS AI_VALUE
      ,A.WLX_TIMESTAMP                           
      ,A.STMT_ID                                 
      ,A.STMT_TIMESTAMP                          
      ,SUBSTR(A.PROGRAM , 1 , 8) AS PROGRAM      
      ,A.EXECUTIONS                              
      ,A.GETP_OPERATIONS                         
      ,A.ELAPSE_TIME                             
      ,A.CPU_TIME                                
      ,A.STMT_TEXT                               
FROM IQA061QB.IQATW001 A                         
WHERE A.PROGRAM NOT IN ('DSNTIAD', 'DSNTIAP')    
  AND A.STMT_ORIGIN = 'D'                        
ORDER BY 1 DESC -- SHOW BEST FIRST                      
--ORDER BY 1 -- SHOW WORST FIRST                            
FETCH FIRST 10 ROWS ONLY ;                       

And the results:

---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+----
               AI_VALUE  WLX_TIMESTAMP                               STMT_ID  STMT_TIMESTAMP              PROGRAM 
---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+----
+0.7104441523551941E+00  2023-01-06-05.27.28.779825                      824  2023-01-17-09.13.48.940697  DSN§EP2L
+0.5050856471061707E+00  2023-01-06-05.27.28.779825                      559  2023-01-12-13.04.44.032345  O2DB81  
+0.5032740235328674E+00  2023-01-06-05.27.28.779825                      561  2023-01-12-13.04.44.075465  O2DB84  
+0.5007491707801819E+00  2023-01-06-05.27.28.779825                      560  2023-01-12-13.04.44.062180  O2DB82  
+0.4917877912521362E+00  2023-01-06-05.27.28.779825                      558  2023-01-12-13.04.43.986695  O2DB80  
+0.4652681946754456E+00  2023-01-06-05.27.28.779825                      562  2023-01-12-13.04.44.105563  O2DB85  
+0.4551711678504944E+00  2023-02-02-10.45.26.535375                        8  2023-01-17-16.51.06.825629  O2DB8X  
+0.4551711678504944E+00  2023-01-06-05.27.28.779825                        8  2023-01-17-16.51.06.825629  O2DB8X  
+0.4551711678504944E+00  2023-01-06-05.27.28.779825                      557  2023-01-12-13.04.40.846826  O2DB8X  
+0.4452087283134460E+00  2023-01-06-05.27.28.779825                        7  2023-01-17-16.50.54.118774  O2DB6X  

+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+-
EXECUTIONS          GETP_OPERATIONS              ELAPSE_TIME                 CPU_TIME  STMT_TEXT                                    
+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+-
         1                        0                        0                        0  INSERT INTO KKKKKKK.SAXDBT (SELECT A.NAME, 'B
         1                        0                       29                       29  SELECT COUNT(*),COALESCE(SUM(CASE EXCEPTION_C
         1                        0                       28                       24  SELECT COUNT(*),COALESCE(SUM(CASE EXCEPTION_C
         1                        0                       22                       22  SELECT COUNT(*),COALESCE(SUM(CASE EXCEPTION_C
         1                        0                       76                       43  SELECT COUNT(*),COALESCE(SUM(CASE EXCEPTION_C
         1                        0                       44                       24  SELECT COUNT(*),COALESCE(SUM(CASE EXCEPTION_C
         0                        0                        0                        0  SELECT COALESCE(COALESCE(A.DBNAME,B.DBNAME),C
         1                      215                   132887                    13462  SELECT COALESCE(COALESCE(A.DBNAME,B.DBNAME),C
         1                       18                     1035                      791  SELECT COALESCE(COALESCE(A.DBNAME,B.DBNAME),C
         2                        7                    27753                     1236  SELECT COALESCE(COALESCE(A.DBNAME,B.DBNAME),C

Again, very nice – it spotted all of the RealTime DBAExpert Dynamic SQL access programs in use…

Undynamic Duo?

Ok, now the opposite of that query, show me the SQLs that are like them but not them!

SELECT AI_SEMANTIC_CLUSTER( PROGRAM,                      
                           'DSNTIAD',                     
                           'IQADBACP',                    
                           'SEDBTIAA') AS AI_VALUE        
      ,A.WLX_TIMESTAMP                                    
      ,A.STMT_ID                                          
      ,A.STMT_TIMESTAMP                                   
      ,SUBSTR(A.PRIM_AUTHOR , 1 , 8) AS PRIM_AUTHOR       
      ,SUBSTR(A.PROGRAM , 1 , 8) AS PROGRAM               
      ,SUBSTR(A.REF_TABLE , 1 , 18) AS REF_TABLE          
      ,A.EXECUTIONS                                       
      ,A.GETP_OPERATIONS                                  
      ,A.ELAPSE_TIME                                      
      ,A.CPU_TIME                                         
      ,A.STMT_TEXT                                        
FROM IQA061QB.IQATW001 A                                  
WHERE A.PROGRAM NOT IN ('DSNTIAD', 'IQADBACP' ,'SEDBTIAA')
--AND A.STMT_ORIGIN = 'D'                                 
--ORDER BY 1 DESC -- SHOW BEST FIRST                      
ORDER BY 1 -- SHOW WORST FIRST                            
FETCH FIRST 10 ROWS ONLY ;                                

and the output:

+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+-------
STMT_ID  STMT_TIMESTAMP              PRIM_AUTHOR  PROGRAM   REF_TABLE                        EXECUTIONS          GETP_OPERATIONS
+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+-------
  39974  2023-01-11-09.15.29.336208               DSMSUMA   SYSVOLUMES                                3                        9
  39973  2023-01-11-09.15.29.336151               DSMSUMA   SYSDATABASE                               3                        6
  39976  2023-01-11-09.15.29.363187               DSMSUMA   SYSTABLEPART                              3                      408
  39975  2023-01-11-09.15.29.343490               DSMSUMA   SYSTABLES                                 3                      870
  39981  2023-01-11-09.15.29.459879               DSMSUMA   SYSPLAN                                   3                        9
  39980  2023-01-11-09.15.29.449311               DSMSUMA   SYSTABLESPACE                             3                      399
  39979  2023-01-11-09.15.29.381912               DSMSUMA   SYSINDEXES                                3                     1970
  39978  2023-01-11-09.15.29.370762               DSMSUMA   SYSINDEXES                                3                      786
  39977  2023-01-11-09.15.29.369892               DSMSUMA   SYSINDEXPART                              3                      381
  39972  2023-01-11-09.15.29.336020               DSMSUMA   SYSSTOGROUP                               3                        6

Aha! It found a little assembler program that fires off SQL like the top three!

The Apple doesn’t Fall far from the Tree

Finally, I want to see which programs behave like IQADBACP (our main dynamic SQL driver program):

SELECT AI_SIMILARITY( PROGRAM,              
                     'IQADBACP') AS AI_VALUE
      ,A.WLX_TIMESTAMP                      
      ,A.STMT_ID                            
      ,A.STMT_TIMESTAMP                     
      ,SUBSTR(A.PROGRAM , 1 , 8) AS PROGRAM 
      ,A.EXECUTIONS                         
      ,A.GETP_OPERATIONS                    
      ,A.ELAPSE_TIME                        
      ,A.CPU_TIME                           
      ,A.STMT_TEXT                          
FROM IQA061QB.IQATW001 A                    
WHERE NOT A.PROGRAM = 'IQADBACP'            
  AND A.STMT_ORIGIN = 'D'                   
ORDER BY 1 DESC -- SHOW BEST FIRST                      
--ORDER BY 1 -- SHOW WORST FIRST                            
FETCH FIRST 10 ROWS ONLY;                   

And the output:

---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---
               AI_VALUE  WLX_TIMESTAMP                               STMT_ID  STMT_TIMESTAMP              PROGRAM
---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---
+0.4575602412223816E+00  2023-02-02-10.45.26.535375                        7  2023-01-17-16.50.54.118774  O2DB6X 
+0.4575602412223816E+00  2023-01-06-05.27.28.779825                        7  2023-01-17-16.50.54.118774  O2DB6X 
+0.4400676488876343E+00  2023-01-06-05.27.28.779825                      220  2023-01-20-10.11.14.618038  DSMDSLC
+0.4400676488876343E+00  2023-01-06-05.27.28.779825                      222  2023-01-20-10.11.38.136712  DSMDSLC
+0.4400676488876343E+00  2023-01-06-05.27.28.779825                      221  2023-01-20-10.11.21.993833  DSMDSLC
+0.4400676488876343E+00  2023-01-06-05.27.28.779825                      252  2023-01-20-10.55.07.078652  DSMDSLC
+0.4400676488876343E+00  2023-01-06-05.27.28.779825                      251  2023-01-20-10.54.37.901247  DSMDSLC
+0.4400676488876343E+00  2023-01-06-05.27.28.779825                      233  2023-01-20-10.47.23.961076  DSMDSLC
+0.4400676488876343E+00  2023-01-06-05.27.28.779825                      232  2023-01-20-10.46.59.756430  DSMDSLC
+0.4400676488876343E+00  2023-01-06-05.27.28.779825                      224  2023-01-20-10.33.42.609175  DSMDSLC

+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+-
EXECUTIONS          GETP_OPERATIONS              ELAPSE_TIME                 CPU_TIME  STMT_TEXT                                    
+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+-
         0                        0                        0                        0  SELECT COALESCE(COALESCE(A.DBNAME,B.DBNAME),C
         2                        7                    27753                     1236  SELECT COALESCE(COALESCE(A.DBNAME,B.DBNAME),C
         2                     1387                    57974                    14900  SELECT CASE WHEN B.VCATNAME < ' ' THEN '00000
         6                     4170                    68943                    53330  SELECT CASE WHEN B.VCATNAME < ' ' THEN '00000
         6                     4596                   286233                    99773  SELECT CASE WHEN B.VCATNAME < ' ' THEN '00000
         1                      851                    55367                    42542  SELECT CASE WHEN B.VCATNAME < ' ' THEN '00000
         1                      298                   122961                    24848  SELECT CASE WHEN B.VCATNAME < ' ' THEN '00000
         2                     1260                    68272                    48952  SELECT CASE WHEN B.VCATNAME < ' ' THEN '00000
         1                      192                     3395                     2508  SELECT CASE WHEN B.VCATNAME < ' ' THEN '00000
         3                      810                    43520                    23771  SELECT CASE WHEN B.VCATNAME < ' ' THEN '00000

Again, it found all of the correct programs.

Quibble Time!

I did find some small problems…

I use ALIASes a lot and they appear in the drop-down selection box when in “Add object”, but if you choose one as an AI Object:

This then leads on to the second quibble… The red windowed error messages stay there until you click them away… This can lead you to believe that a problem exists when in reality everything is groovy!

I also found out that the spinning wheel completes if you wait for Spark and LOAD and then go back and forward on the panel.

Finally, the way you move around the product is a bit odd… sometimes you use the browser back function, sometimes you click on a “Back” button, sometimes you click on a bread crumb, sometimes there are multiple options hidden under triple vertical dots which change depending on where you are in the process.

I am sure these little UI bugs will all get ironed out very quickly!

End of Quibbles.

First Baby Steps Taken!

This little trip into the AI world is really just the tip of the iceberg. I will be doing many more AI queries over the coming months, and I hope to show all my results, either here or in another one of my Newsletters and/or at the German GUIDE in April 2023 and, hopefully, at the IDUG 2023 as well.

Any questions about AI, do not fear to ask, and when not me then ChatGPT!

TTFN

Roy Boxwell