2017-08 Anomalies detected in MERGECOPY and MODIFY Recovery

Db2 z/OS Utilities: Identify and avoid some MERGECOPY or MODIFY Recovery anomalies

Just a little note this month about a strange anomaly in the way MERGECOPY and MODIFY RECOVERY seem to work. Let’s begin with MERGECOPY.

1 – MERGECOPY to the death


How many IICs to change a lightbulb?

I was doing some tests, to see how many Incremental Image Copies you can take before RECOVER dies, about 71 by the way, and found this out…

SCRATCHing my head

I did a Full Image Copy, then various updates, and three Incremental Image Copies. Due to bad luck, my Full Image Copy was “accidentally” scratched… Whoops!

MERGECOPY runs

I ran the MERGECOPY and look what happened:

DSNU000I    199 10:37:11.20 DSNUGUTC - OUTPUT START FOR UTILITY, UTILID = DC10MC00MCU012
DSNU1044I   199 10:37:11.21 DSNUGTIS - PROCESSING SYSIN AS EBCDIC
DSNU050I    199 10:37:11.23 DSNUGUTC - MERGECOPY TABLESPACE R510D0DC.R510S81 NEWCOPY YES
DSNU463I    199 10:37:11.30 DSNUYBR3 - THE PRIMARY IMAGE COPY DATA SET SETEMP.R510D0DC.R510S81.P0000.D17195.T1734 WITH
DATE=170714 AND TIME=174140
            IS PARTICIPATING IN MERGECOPY.
DSNU463I    199 10:37:11.37 DSNUYBR3 - THE PRIMARY IMAGE COPY DATA SET SETEMP.R510D0DC.R510S81.P0000.D17198.T1025 WITH
DATE=170717 AND TIME=102429
            IS PARTICIPATING IN MERGECOPY.
DSNU463I    199 10:37:11.43 DSNUYBR3 - THE PRIMARY IMAGE COPY DATA SET SETEMP.R510D0DC.R510S81.P0000.D17198.T1235 WITH
DATE=170717 AND TIME=123501
            IS PARTICIPATING IN MERGECOPY.                                                                   
DSNU463I    199 10:37:11.48 DSNUYBR3 - THE PRIMARY IMAGE COPY DATA SET SETEMP.R510D0DC.R510S81.P0000.D17111.A10756 WITH
DATE=170421 AND TIME=075641
            IS PARTICIPATING IN MERGECOPY.
DSNU030I    199 10:37:11.50 DSNUYBR3 - UNABLE TO ALLOCATE SETEMP.R510D0DC.R510S81.P0000.D17111.A10756, RC=4,
CODE=X'17080002' 
DSNU454I    199 10:37:11.63 DSNUYBR0 - COPY MERGE COMPLETE 
            NUMBER OF COPIES=4        
            NUMBER OF COPIES MERGED=3   
            TOTAL NUMBER OF PAGES MERGED=363 
            ELAPSED TIME=00:00:00  
DSNU010I    199 10:37:11.67 DSNUGBAC - UTILITY EXECUTION COMPLETE, HIGHEST RETURN CODE=4

Look again

Did you see what went wrong? No, nor did I the first time! Review again…

When a Return Code = 04 is actually important

Now do you see it?

DSNU030I    199 10:37:11.50 DSNUYBR3 - UNABLE TO ALLOCATE
SETEMP.R510D0DC.R510S81.P0000.D17111.A10756, RC=4,
CODE=X'17080002'

Hidden amongst all the other output…and “only” a RC=4 !

So? What does that have to do with me?

Well, who checks RC=04? Your data is non-recoverable, but you *think* you are green! Pretty nasty if you ask me! Db2 does get a bonus point though for actually merging the incrementals into another incremental…

 

2 – MODIFY Recovery


DSNUM ALL gone forever?

Remember the bad old days, when you *had* to take a TS level Image Copy of partitioned objects when you REORGed them?

Wasn’t that a terrible time? Terabytes of disk space pointlessly being filled with needless image copy data. Then along came Db2 11, which enabled TP level copies! Hoorah! Ok, not so hot with tapes etc. but who cares, it worked! Finally, we did not need to keep *huge* DSNUM ALL style copies lounging around on our expensive disks.

Too good to be true?

All sounds good, huh? What did we forget? I will tell you… NPSIs (The indexes formally known as NPIs) What is the problem, I hear you shout! Well, let me walk you through a normal scenario with COPY YES indexes…

Keep it Simple Stupid

Let us imagine a little tablespace with two partitions, one DPSI and two NPSIs. All of the indexes are COPY YES. Due to the fact that we now only do INLINE COPYs at the TP level, SYSCOPY gets lots of records looking like this:

SpacePartType
TP1F
IP1F
TP2F
IP2F
NPSI10F
NPSI20F
Then a day later
TP2F
IP2F
NPSI10F
NPSI20F
Then another day later
TP2F
IP2F
NPSI10F
NPSI20F

Remember that the NPSIs are copied with the DPSIs.

Time to DELETE the old data

Now, using MODIFY RECOVERY, you wish to rid yourself of the oldest data from Part 2.

A nice little

MODIFY RECOVERY TABLESPACE DB.TP DSNUM 2 RETAIN LAST(1)

Is enough. However, as it says in the documentation, this form does *not* delete NPSI data.

 

COPY PEND is not your friend!

If you use the DSNUM ALL, which *will* delete the NPSI data, then it will most probably put the entire table space into COPY PEND, which you do *not* want!

Space is the problem

So now the problem begins to appear… imagine that this has been happening for a year or more… you now have hundreds of syscopy entries *and* datasets for NPSI data that you cannot simply, or easily, MODIFY “away” anymore!

What I think is needed, is a new parameter, say “INCLUDE NPSI”, which will also get rid of NPSI data. RFE anyone??

I have no fix for this problem either, except to actually take a DSNUM 0 copy, including all indexes, and then do DSNUM ALL style MODIFY RECOVERY. Pretty messy…

 

As usual, if you have any comments or queries please feel free to drop me a line!

TTFN

 

Roy Boxwell

2017-07 Let’s make Db2 z/OS IMAGE COPY great again!

War story :

Why is Db2 z/OS IMAGE COPY very useful in the current Db2 z/OS world?

Please excuse the heading, I simply could not resist…

This month is another war story from the trenches of Production DBAs fighting the fight for 24×7 Shops around the world…
This time it was a request to recover some tables to a specific point in time that started the ball rolling…

Staged too soon? Bad updates and deletes

The developers pushed some software to production and then found out, five hours later, that there was a logical error and it was doing really bad updates and deletes in a bunch of tables that it should *not* have done. This is bad news…

Backup system to the Rescue?

This firm uses System Level Backup (SLB), but the DBA group, I kid you not, did not know this little factoid (they had alternate facts to works with), and so when the request to restore tables x, y and z to a specific Point in Time (PiT) arrived, they simply created the required RECOVER control cards and, after checking that the tables, indexes and tablespaces were not being used, submitted the RECOVER jobs. Now, the first job worked fine, but the second and the third “bought the farm” with *very* weird messages that the DBA group did not really understand:

RECOVER Messages you do not want to see – Part one

DSNU1520I   319 08:26:44.64 DSNUCBRT - THE RECOVERY BASE FOR TABLESPACE ROYDB.ROYTS DSNUM 1 IS THE SYSTEM LEVEL BACKUP WITH DATE = 20161212, TIME 041302, AND TOKEN X'FF..FF'

DSNU1522I   319 08:26:46.85 DSNUCBRT - THE DFSMSHSM CALL TO RESTORE TABLESPACE ROYDB.ROYTS DSNUM 1 FAILED WITH RC = X'0000005D' AND REASON CODE = X'00000042' SEE THE JOB LOG FOR DFSMSHSM MESSAGES INDICATING THE CAUSE OF THE ERROR

DSNU832I  )DSJP 319 08:26:44.61 DSNUCARS - INDEX ROYSCHEMA.ROYINDEX PARTITION 1 IS IN REBUILD PENDING STATE

DSNU560I  )DSJP 319 08:26:46.87 DSNUGSRX - TABLESPACE ROYDB.ROYTS PARTITION 1 IS IN RECOVER PENDING STATE

DSNU012I    319 08:26:46.88 DSNUGBAC - UTILITY EXECUTION TERMINATED, HIGHEST RETURN CODE=8

(Some names have been changed to protect the innocent!)

Naturally I have added the red and bold highlights. Now this message *really* upset everyone…Why? The tablespaces were previously all RW and OK, but now they were COPY Pending status!

Go Figure!

So what just happened? Well Db2 detected that there was actually an SLB that could be used for the base of this recovery, and so “asked” HSM to get it “back”. It failed with Return Code (RC) X’5D’ decimal 93 and Reason Code X’42’ decimal 66. I love that 42! And with the great “tip” to “Read the Job Log”. The DBA group were a “tad” unhappy about the fact that a *failed* RECOVER set the tablespace in question to COPY Pending by the way!

Where on Earth?

So the DBAs started trying to read the Job Log and could not see anything. They have a 16-way data-sharing group running on 14 separate LPARs… There is *lots* of “job log”…

Eventually under one of 14 different STCs all called “HSM” I found this info:

RECOVER Messages you do not want to see – Part two

08.26.44 S0998158  ARC1801I FAST REPLICATION DATA SET RECOVERY IS STARTING FOR DATA SET QA1B.DSNDBC.ROYDB.ROYTS.J0001.A001, AT 08:26:44 ON 2016/12/12

08.26.46 S0998158  ARC0624I PHYSICAL DATA SET COPY OF VOLUME QA1B.DSNDBC.ROYDB.ROYTS.J0001.A001 TERMINATED PRIOR TO COMPLETION, DFSMSDSS FAILING RC = 8

08.26.46 S0998158  ARC1860I THE FOLLOWING 0001 DATA SET(S) FAILED DURING FAST REPLICATION DATA SET RECOVERY:QA1B.DSNDBC.ROYDB.ROYTS.J0001.A001, COPYPOOL=DSN$LOCDS0P$DB, DEVTYPE=DASD, VOLUME=WSPS95, ARC1166, RC=0

08.26.46 S0998158  ARC1802I FAST REPLICATION DATA SET RECOVERY HAS COMPLETED FOR DATA SET QA1B.DSNDBC.ROYDB.ROYTS.J0001.A001, AT 08:26:46 ON 2016/12/12, FUNCTION RC=0008, MAXIMUM DATA SET RC=0093

So here we see that it failed for RC=0093 (Now in decimal) but with a DFSMSDSS RC = 08

Confused? You will be…

I have not yet found what an RC = 0093 means – My best guess at the moment, is that the data had been “moved” since the SLB, and so the VOLUME swap failed due to some reason or other. The tablespace and index were unavailable and the “window of change” was closing fast…

Can I run backwards out the door?

Panic was approaching when I mentioned they could do the “backwards” LOGAPPLY (BACKOUT YES) and so with high hopes we attempted it, only to see that it sadly died a death if you have COPY Pending status – We felt pretty recursive at this point…

So that left us with one last chance and that was to tell RECOVER to ignore SLB and go directly to an earlier (RESTOREBEFORE) IC with a TORBA syntax.

Hoorah!

We had to wait awhiles but we got there! The RECOVERs all ran though clean and all was well… <phew>

Where’s the Beef?

So “What’s the point?” I hear you muttering… quite simple really:


1- Never rely on SLB to always work


2- Make sure you test it before switching your objects to COPY Pending by accident


3- Still take good old fashioned Image Copies


4- Use of BACKOUT YES can save your bacon *if* you know about it


As usual, if you have any comments or queries please feel free to drop me a line!

 

TTFN

Roy Boxwell


More about Utility Management and Space Management: See our RTDX suite of tools