2018-07 – SOUNDEX and other „cool“ features – Part five All new for Db2 12

Part five of my – everlasting – walk through of new Aggregate and Scalar functions. This time I will start with the new AGGREGATE functions introduced in Db2 12.

LISTAGG

This is mainly of interest due to the fact that it was the very first Agile Function Level feature in Db2. You must be at FL501 to use this in SQL otherwise you get a

DSNT408I SQLCODE = -4700 ATTEMPT TO USE NEW FUNCTION BEFORE FUNCTION LEVEL IS ACTIVATED

Or a

DSNT408I SQLCODE = -4743, ERROR:  ATTEMPT TO USE A FUNCTION WHEN THE APPLICATION COMPATIBILITY SETTING IS SET FOR A PREVIOUS LEVEL

What this function does, is basically a horizontal recursive join which can be sorted in a different sequence. Here’s the SQL example:

SELECT WORKDEPT,                                       
LISTAGG(LASTNAME, ', ') WITHIN GROUP(ORDER BY LASTNAME)
AS EMPLOYEES                                           
FROM SAMPLE.EMP                                        
GROUP BY WORKDEPT;

Which returns:

---------+---------+---------+---------+---------+---------+---------+-----+
WORKDEPT  EMPLOYEES                                                        
---------+---------+---------+---------+---------+---------+---------+-----+
A00       HAAS, LUCCHESI, O'CONNELL 
B01       THOMPSON 
C01       KWAN, NICHOLLS, QUINTANA    
D11       ADAMSON, BROWN, JONES, LUTZ, PIANKA, SCOUTTEN, STERN, WALKER,
          YOSHIMURA
D21       JEFFERSON, JOHNSON, MARINO, PEREZ, PULASKI, SMITH
E01       GEYER  
E11       HENDERSON, PARKER, SCHNEIDER, SETRIGHT, SMITH  
E21       GOUNOT, LEE, MEHTA, SPENSER 
DSNE610I NUMBER OF ROWS DISPLAYED IS 8

So you can see it adds the LASTNAME and a comma repeatedly for each WORKDEPT value.

Db2 Catalog Docu

This function is actually very handy for one of the things I create for our programmers here in the labs. It is always a good idea to “know the catalog” and so I created a little word document years ago with all the tablespaces, tables, indexes and index columns with sort order so that they did not have to look in three separate books to find the data. Here is the SQL for listing out the tables with indexes:

SELECT SUBSTR(A.DBNAME, 1, 8) AS DB      
      ,SUBSTR(A.NAME, 1, 8 ) AS TS       
      ,SUBSTR(B.NAME, 1, 18) AS TABLE    
      ,SUBSTR(C.NAME, 1, 8 ) AS INDEX    
      ,B.TYPE                            
      ,CASE C.UNIQUERULE                 
        WHEN 'D' THEN '-'                
        ELSE 'Y'                         
       END AS U                          
      ,D.COLSEQ                          
      ,SUBSTR(D.COLNAME, 1, 18) AS COL   
      ,CASE D.ORDERING                   
        WHEN ' ' THEN 'I'                
        WHEN 'A' THEN '-'                
        WHEN 'D' THEN 'D'                
        WHEN 'R' THEN 'R'                
       END AS ORDER                      
FROM SYSIBM.SYSTABLESPACE A              
    ,SYSIBM.SYSTABLES     B              
    ,SYSIBM.SYSINDEXES    C              
    ,SYSIBM.SYSKEYS       D              
WHERE A.DBNAME IN ('DSNDB01','DSNDB06')  
  AND A.DBNAME  = B.DBNAME               
  AND A.NAME    = B.TSNAME               
  AND B.CREATOR = 'SYSIBM'               
  AND B.NAME    = C.TBNAME               
  AND B.CREATOR = C.TBCREATOR            
  AND C.NAME    = D.IXNAME                
  AND C.CREATOR = D.IXCREATOR            
ORDER BY DB, TS , TABLE , INDEX, D.COLSEQ ;

It returns 855 rows of data on my test Db2 12 Fl501 system:

---------+-------+-------+---+-------+------+--------+------------------+--
DB        TS      TABLE         INDEX     TYPE  U  COLSEQ  COL       ORDER
---------+-------+-------+---+-------+------+--------+------------------+--
DSNDB01   DBD01   DBDR          DSNDB01X   T   Y      1   DBID         -
DSNDB01   DBD01   DBDR          DSNDB01X   T   Y      2   SECTION      -
DSNDB01   SCT02   SCTR          DSNSCT02   T   Y      1   SCTNAME      -
DSNDB01   SPT01   SPTR          DSNSPT01   T   Y      1   SPTLOCID     -
DSNDB01   SPT01   SPTR          DSNSPT01   T   Y      2   SPTCOLID     -
DSNDB01   SPT01   SPTR          DSNSPT01   T   Y      3   SPTNAME      -

And ending with

DSNDB06  SYSXML   SYSXMLRELS    DSNXRX02   T   -      1  XMLTBOWNER    -
DSNDB06  SYSXML   SYSXMLRELS    DSNXRX02   T   -      2  XMLTBNAME     -
DSNDB06  SYSXML   SYSXMLSTRINGS DSNXSX01   T   Y      1  STRINGID      -
DSNDB06  SYSXML   SYSXMLSTRINGS DSNXSX02   T   Y      1  STRING        -
DSNE610I NUMBER OF ROWS DISPLAYED IS 855

Then I used my amazing WORD skills to move all the COL columns after each other to get a DOC file that looks like this:

DSNDB0n.     SYSIBM.       SYSIBM.      U  INDEX FIELD
DBD01        DBDR          DSNDB01X     Y  DBID.SECTION
SCT02        SCTR          DSNSCT02     Y  SCTNAME
SPT01        SPTR          DSNSPT01     Y  SPTLOCID.SPTCOLID.SPTNAME.etc.

Which, of course, the programmers loved, as it enabled them to quickly find which columns, in which sort order, (I convert the ORDER D COL column to be in bold by the way), are available on the catalog and directory tables. I started this with DB2 V5…

Now with LISTAGG the query looks like:

SELECT SUBSTR(A.DBNAME, 1, 8)                                 
      ,SUBSTR(A.NAME, 1, 8 )                                  
      ,SUBSTR(B.NAME, 1, 18)                                  
      ,SUBSTR(C.NAME, 1, 8 )                                  
      ,B.TYPE AS T                                            
      ,CASE C.UNIQUERULE                                      
        WHEN 'D' THEN '-'                                     
        ELSE 'Y'                                              
       END AS U                                               
      ,LISTAGG( CHAR(SUBSTR(D.COLNAME, 1, 18) CONCAT           
                     CASE D.ORDERING                          
                     WHEN ' ' THEN 'I'                        
                     WHEN 'A' THEN '-'                        
                     WHEN 'D' THEN 'D'                        
                     WHEN 'R' THEN 'R'                        
                     END                                      
                    )                                         
               , ' ') WITHIN GROUP (ORDER BY D.COLSEQ ASC)    
FROM SYSIBM.SYSTABLESPACE A                                   
    ,SYSIBM.SYSTABLES     B                                   
    ,SYSIBM.SYSINDEXES    C                                   
    ,SYSIBM.SYSKEYS       D                                   
WHERE A.DBNAME IN ('DSNDB01','DSNDB06')                       
  AND A.DBNAME  = B.DBNAME                                    
  AND A.NAME    = B.TSNAME                                    
  AND B.CREATOR = 'SYSIBM'                                    
  AND B.NAME    = C.TBNAME                                    
  AND B.CREATOR = C.TBCREATOR                                 
  AND C.NAME    = D.IXNAME
  AND C.CREATOR = D.IXCREATOR                                 
GROUP BY A.DBNAME, A.NAME, B.NAME, C.NAME, B.TYPE, C.UNIQUERULE
;

And the output:

---------+---------+-------+-----------+---------------------------+---
                                            T  U 
---------+---------+-------+-----------+---------------------------+---
DSNDB01   DBD01   DBDR            DSNDB01X  T  Y  DBID     - SECTION  -     
DSNDB01   SCT02   SCTR            DSNSCT02  T  Y  SCTNAME  -  
DSNDB01   SPT01   SPTR            DSNSPT01  T  Y  SPTLOCID - SPTCOLID - SPTNAME

And ends

DSNDB06   SYSXML  SYSXMLRELS      DSNXRX02  T  -  XMLTBOWNER - XMLTBNAME
DSNDB06   SYSXML  SYSXMLSTRINGS   DSNXSX01  T  Y  STRINGID   -         
DSNDB06   SYSXML  SYSXMLSTRINGS   DSNXSX02  T  Y  STRING     -         
DSNE610I NUMBER OF ROWS DISPLAYED IS 287

Which is a lot less data for my poor little fingers to work with!

Of course it is not perfect… the really big GOTCHA, is ORDER BY is *not* allowed!

PERCENTILES

For the next couple of examples, I will be using the table SAMPLE.EMP that contains these rows and columns of interest:

---------+---------+---------+---------+---------+-----
EMPNO   WORKDEPT       SALARY        BONUS         COMM
---------+---------+---------+---------+---------+-----
000210  D11          18270.00       400.00      1462.00
000190  D11          20450.00       400.00      1636.00
000180  D11          21340.00       500.00      1707.00
000160  D11          22250.00       400.00      1780.00
000170  D11          24680.00       500.00      1974.00
000150  D11          25280.00       500.00      2022.00
000200  D11          27740.00       600.00      2217.00
000220  D11          29840.00       600.00      2387.00
000060  D11          32250.00       600.00      2580.
DSNE610I NUMBER OF ROWS DISPLAYED IS 9

PERCENTILE_CONT

Calculates the requested percentile as a continuous value. Use this if you want a calculated value based upon your input. E.g.:

SELECT PERCENTILE_CONT(0,50)
WITHIN GROUP (ORDER BY SALARY ASC )
FROM SAMPLE.EMP
WHERE WORKDEPT = 'D11'
;
---------+---------+---------+---------+---------+---------

---------+---------+---------+---------+---------+---------
+0.2468000000000000E+05                    (EMPLOYEE 150 row 5)
DSNE610I NUMBER OF ROWS DISPLAYED IS 1

Here you can see that I require the 50th percentile, and as the data happens to have nine rows it would be the 5th row If you look at the data you will see that that is indeed the case.

Now, reversing the direction of the percentile, I want the 90th descending value. This row does not exist in the input, so the function computes the value that will probably best fit

DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100
---------+---------+---------+---------+---------+---------
SELECT PERCENTILE_CONT(0,90)                              
       WITHIN GROUP (ORDER BY SALARY DESC)                
FROM SAMPLE.EMP                                           
WHERE WORKDEPT = 'D11'                                    
;                                                         
---------+---------+---------+---------+---------+---------
                                                           
---------+---------+---------+---------+---------+---------
+0.2001400000000000E+05     (value between first and second rows)
DSNE610I NUMBER OF ROWS DISPLAYED IS 1

PERCENTILE_DISC

Calculates the requested percentile as a discrete value. Use this if you do not want a calculated value based upon your input. E.g.:

SELECT PERCENTILE_DISC(0,50)             
       WITHIN GROUP (ORDER BY SALARY ASC )
FROM SAMPLE.EMP                          
WHERE WORKDEPT = 'D11'                   
;                                        
---------+---------+---------+---------+--
                                        
---------+---------+---------+---------+--
   24680.00                 (EMPLOYEE 150 row 5)
DSNE610I NUMBER OF ROWS DISPLAYED IS 1

Here you can see that I still require the 50th percentile, and, as the data happens to still have nine rows, it would be the 5th row again.

Now, reversing the direction of the percentile, I want the 90th descending value. This row does not exist in the input so the function returns the nearest input value.

DSNE616I STATEMENT EXECUTION WAS SUCCESSFU
---------+---------+---------+---------+--
SELECT PERCENTILE_DISC(0,90)             
       WITHIN GROUP (ORDER BY SALARY DESC)
FROM SAMPLE.EMP                          
WHERE WORKDEPT = 'D11'                   
;                                        
---------+---------+---------+---------+--
                                       
---------+---------+---------+---------+--
   18270.00        (90th is not in the input data but this is the nearest)
DSNE610I NUMBER OF ROWS DISPLAYED IS 1

 

Feel free to send me your comments and ask questions.

TTFN,
Roy Boxwell
Senior Architect