AUTOMATED ERROR CHECKING OF BATCH JOBS WITH MPEX/3000
by Adrian Partridge, GAINSBOROUGH SOFTWARE LTD
Published by INTERACT Magazine, Apr 1995.
Checking $STDLISTs for errors and filing them for archive purposes is
an important function of the Information System department. It is also
an unloved chore. Sites with XL machines and NMSPOOLER have the
benefit of having their spool files all held as disk files. These can
be searched and printed with normal MPE commands. HP even supply you
with a JOBABORT condition for use with the SPOOLF/LIST SPF commands to
display all $STDLISTs that have JOBABORTed during execution.
Using the excellent MPEX/3000 package from VESOFT I have created a
$STDLIST Management Tool (SMT) which provides significant advantages
over a simple SPOOLF @;SELEQ=[JOBABORT=TRUE];SHOW when required. The
finished $STDLIST Management Tool is a good example of the features
and power provided by the MPEX software.
This is a list of major elements that make a comprehensive and
useful SMT.
$STDLIST SELECTION
It is important for our SMT to select just $STDLISTs from the
spoolqueue. These $STDLISTs are filtered to make sure that they are
in a READY state (not still OPENed or LOCKed by a job or utility). In
this simple SMT any $STDLISTs that have been checked will be altered
to a priority of 4, therefore only $STDLISTs above this priority will
be selected.
REPEAT
..
..
..
FORFILES O@.OUT.HPSPOOL(SPOOL.FILE="$STDLIST" AND SPOOL.ISREADY AND &
SPOOL.OUTPRI>=5)
This REPEAT...FORFILES is an MPEX construct that gives us the ability
to repeat a number of commands on each file that matches the FORFILES
selection condition.
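The selection rule can be read as a simple predicate applied to every
entry in the spool queue. The Python sketch below is only a model of
that logic and is not MPEX code. The SpoolFile record and its field
names are hypothetical stand-ins for the SPOOL.FILE, SPOOL.ISREADY and
SPOOL.OUTPRI attributes used in the FORFILES condition.

```python
from dataclasses import dataclass

CHECKED_PRIORITY = 4  # priority the SMT assigns once a $STDLIST has been checked


@dataclass
class SpoolFile:
    """Hypothetical stand-in for one entry in the spool queue."""
    name: str      # spool file name, e.g. "O123"
    filedes: str   # file designator, e.g. "$STDLIST"
    ready: bool    # True when the file is in the READY state
    outpri: int    # output priority


def needs_checking(spf: SpoolFile) -> bool:
    """Mirror of the FORFILES condition: unchecked, READY $STDLISTs only."""
    return (spf.filedes == "$STDLIST"
            and spf.ready
            and spf.outpri > CHECKED_PRIORITY)


def select_unchecked(spool_queue):
    """Return the spool files the SMT would process on this pass."""
    return [spf for spf in spool_queue if needs_checking(spf)]
```

Because a checked $STDLIST is dropped to priority 4, it fails the
predicate on every later pass and is never examined twice.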
There are several other ways that you might wish to do this; one
possible alternative is as follows.
Have a logon UDC for batch jobs which does a SPOOLF
@;SELEQ=[FILEDES=$STDLIST];PRI=1. This will defer the $STDLIST down
to a priority of 1 so the SMT need only check $STDLISTs at this
priority. The $STDLIST priority can then be increased once checked;
this could even cause the $STDLISTs to print automatically if the
priority is above the OUTFENCE. Because $STDLISTs are deferred down to
1 there is no way that they can be accidentally printed if the
outfence is below 8.
Whatever choice you make to select the $STDLISTs, the important thing
is that $STDLISTs are only checked once.
$STDLIST DIAGNOSIS
A batch management tool must be able to tell if a job has completed or
not. This SMT uses a number of basic principles to deduce the status
of the job. JOBABORTed $STDLISTs have definitely failed to complete
and need immediate attention. The JOBABORT condition has one major
drawback: if the command preceding the line with the error is a
CONTINUE statement then JOBABORT will not pick this up. If an error
occurs while CONTINUE is in effect, then JOBABORT is rendered useless.
A $STDLIST in our SMT is treated as if it has finished in one of three
states:
The $STDLIST did not complete. This could be caused by the job being
ABORTJOBed, or by an error in one of the commands within the JCL
causing a flush during execution. If the job does not complete then
there will not be an :EOJ in the $STDLIST (it's always a good idea to
end your JCLs with !EOJ). This can be easily searched for by placing
the following line in the SMTERROR file:
UPS(R[1:4]) <> ':EOJ' AND RECNUM = VEFINFO(FNUM).EOF - 2
This line instructs SMT to deem any file without :EOJ in the
second-from-last line as terminated in error.
The $STDLIST contains an error. If MPE commands cause an error there
is a good chance that either a CIERR or FSERR will be returned, but
not all commands report failures this way. For example, if you did a
STORE of files and some were not stored correctly, you would want SMT
to point out that the STORE was not 100% successful. The following
strings will cater for the bulk of MPE failures.
'(CIERR' or '(FSERR' or 'NOT STORED'
Your custom errors can be added into the SMTERROR file also.
$STDLIST completed successfully. If the $STDLIST passes both of the
checks above then it is deemed OK.
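The three outcomes can be sketched as a small classifier. This is a
Python model of the rules described above and is not the SMT itself.
The function name and the state labels are made up for illustration,
and the string search is assumed to be case-sensitive.

```python
ERROR_STRINGS = ("(CIERR", "(FSERR", "NOT STORED")


def classify_stdlist(lines, error_strings=ERROR_STRINGS):
    """Return 'INCOMPLETE', 'ERROR' or 'OK' for the lines of one $STDLIST."""
    # State 1: the job never reached :EOJ. The SMTERROR rule inspects the
    # second-from-last record, so this sketch does the same.
    if len(lines) < 2 or lines[-2][:4].upper() != ":EOJ":
        return "INCOMPLETE"
    # State 2: the job finished, but a known error string appears somewhere.
    for line in lines:
        if any(text in line for text in error_strings):
            return "ERROR"
    # State 3: neither check matched.
    return "OK"
```

For example, a listing whose second-from-last line is :EOJ but which
contains 'NOT STORED' anywhere is reported as 'ERROR', not 'OK'.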
The SMTERROR file contains the error conditions and is read in each
time a $STDLIST is checked. Take care not to put an excessive number
of strings in this file. Remember that strings can be ORed together to
reduce checking time.
Any strings that you wish to look for MUST be enclosed in ' ' symbols.
Other PRINT functions available for use are CL (CaseLess), DL
(DeLimited), RECNUM and virtually any MPEX operator such as OR, AND,
NOT, MATCHING, BETWEEN, etc. These options should be left
undelimited.
The SMTERROR file is read using the REPEAT...FORRECS construct, which
enables us to repeat a number of commands for each line read from the
file in the FORRECS selection. Each line is passed through a
PRINT ;SEARCH= statement. If a PRINT command finds something, then
that $STDLIST is deemed as ending in error and is immediately
reported. The PRINT variable MPEXPRINTLINESFOUND is used to accomplish
this task.
$STDLIST HANDLING
What do you do with these $STDLISTs? What special treatment do you
give to $STDLISTs that have ended in error? How do you reduce paper
consumption by unnecessary $STDLIST printing? How do you make your
operator's time more productive instead of checking $STDLISTs all day
long? Our SMT, that's how!
Our SMT at the moment only does the most basic $STDLIST handling
because different sites might want to do something slightly
different. Some possible options for $STDLIST handling that you can
easily implement into the SMT are:
* By copying $STDLISTs selected as being in an error state to a
different name (for example, STDERROR), you can then delete $STDLISTs
when they become READY.
* Using different spool file priorities you could create a daily tier
system. The $STDLIST priorities reflect the day on which the $STDLIST
was created (5=today, 0=5 days previous). Each day you roll the
$STDLISTs down a priority, deleting ones at 0.
* Because spool files are on disk you could copy the $STDLIST to a
group created each day. This group would contain a contents file which
is written to each time a spoolfile is copied into the group. An
ON-LINE system could easily be written to pull back any contents file
and from it, pull back $STDLISTs from any number of days previous.
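The daily tier option in the second bullet amounts to a simple ageing
step. The following Python sketch models it under the assumption that
each $STDLIST is tracked by name with its current priority. The
function name and the dictionary layout are hypothetical and are not
part of the SMT.

```python
def roll_priorities(stdlists):
    """Age every $STDLIST by one day.

    `stdlists` maps a spool file name to its current priority
    (5 = created today, 0 = five days old). Returns (kept, deleted):
    the surviving files with their new priorities, and the sorted
    names of the files to delete.
    """
    kept, deleted = {}, []
    for name, priority in stdlists.items():
        if priority <= 0:
            deleted.append(name)        # already in the bottom tier: delete
        else:
            kept[name] = priority - 1   # roll down one tier
    return kept, sorted(deleted)
```

Run once a day, this keeps six days of $STDLISTs in the queue and
discards anything older.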
The SMT we run with uses a number of the above options. All $STDLISTs
are first copied to a log group created daily. OK $STDLISTs are
deleted from the queue, and all $STDLISTs that are in error are copied
to a new printout called STDERROR and the $STDLIST is purged.
CONTINUAL EXAMINATION OF $STDLISTs
One of the following scenarios might apply to you:
* On big sites, where batch jobs are running continually throughout
the day, $STDLISTs that have ended in error might not be found for
anything from 10 minutes to 1 hour.
* When you arrive in the morning you have the overnight batch run to
check through before any users can log on, just in case some files
have not been backed up, or a job which updates your data hasn't run
successfully.
In both cases it is essential that $STDLISTs be checked quickly. The
easiest way of doing this is to read the $STDLISTs in a loop and
continually cycle around that loop until it is broken. Keeping the job
looping around and around until you wish to stop it posed a few
problems at first. Many people might want to pause between checks for
up to 10 minutes. Checking if a flag file was built could only be done
once in the JCL of the SMT. This means that if a pause had just
started and you requested the SMT to stop by BUILDing the stop flag,
it would not finish until the pause had completed. I have devised a
control mechanism that allows the user to stop the job instantly
after a check has completed. This mechanism will also instruct the SMT
to do a check as and when it is requested. This control mechanism uses
good old message files, background tasks and all sorts of other
trickery:
CONTROL SKELETON -
FILE SMTMESS=SMTMESS,OLD;SHR;GMULTI
PURGE SMTMESS
BUILD SMTMESS;REC=-10,,,ASCII;DISC=1;MSG
SETVAR OPTION "CHECK"
WHILE TRUE DO
IF OPTION = "STOP" THEN
RETURN
ELSEIF OPTION = "CHECK" THEN
< $STDLIST CHECKER ROUTINES >
ENDIF
IF SONALIVE(GOONPIN) = FALSE THEN
RUN MAIN.PUB.VESOFT;PARM=1;INFO="PAUSESMT";GOON;STDLIST=$NULL;PRI=DS
SETVAR GOONPIN MPEXPIN
ENDIF
INPUT OPTION < *SMTMESS
ENDWHILE
PAUSESMT -
PAUSE 300
ECHO CHECK >*SMTMESS
Well, what does all this do? A message file is unique in that anything
reading that file will wait until something is written to it. This
principle is applied to our control skeleton above. Firstly a message
file called SMTMESS is created. This file is passed instructions from
either users or the program itself. In each loop of SMT we read
SMTMESS by way of the INPUT command, which in turn will cause SMT to
wait. Before this, a background process (PAUSESMT) is started that
pauses for 300 seconds (5 minutes) and then writes to SMTMESS. This
will then cause SMT to continue processing. The INPUT command is also
used to set a variable (OPTION) to whatever value is written into the
message file. With this variable we can instruct SMT to either stop or
do another check of $STDLISTs in the queue. The SMTMESS file can also
be written to by you, so a check is made to see if PAUSESMT is already
running; if it is, a new PAUSESMT is not started. This ensures that
every 5 minutes a check will begin regardless of any outside user
intervention!
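The same control flow can be modelled outside MPE. In the Python
sketch below a queue.Queue stands in for the SMTMESS message file
(reading it blocks until something is written) and a threading.Timer
stands in for the PAUSESMT background task. All of the names are
illustrative assumptions, and the sketch shows the idea only; it does
not talk to a real spooler.

```python
import queue
import threading


def run_smt(messages, check, pause_seconds=300.0):
    """Model of the control skeleton: block on the message file, then act.

    `messages` plays the part of SMTMESS and `check` is whatever routine
    examines the $STDLISTs.
    """
    timer = None
    option = "CHECK"                      # the first pass always checks
    while True:
        if option == "STOP":
            if timer is not None:
                timer.cancel()            # tidy up the background pause
            return
        if option == "CHECK":
            check()
        # Start a background pause only if one is not already running,
        # in the spirit of the SONALIVE(GOONPIN) test.
        if timer is None or not timer.is_alive():
            timer = threading.Timer(pause_seconds, messages.put,
                                    args=("CHECK",))
            timer.daemon = True
            timer.start()
        # Blocks until the timer or a user writes a message.
        option = messages.get().rstrip()
```

A user-side messages.put("STOP") wakes the loop at once, which is the
behaviour that a plain PAUSE inside the main job could not provide.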
INFORM OF SUSPECTED ERRORS
OK, so this fancy SMT has found the errors in some $STDLISTs - what
now? This utility must make as much noise, with as much flashing
highlighted text, as possible to inform the console operator that a
job has aborted or needs checking, so that he/she can promptly inform
the necessary personnel.
To inform the console of any errors within $STDLISTs we use TELLOPs
and the ;FORMAT keyword within the PRINT line. A variable called R
holds the current record that PRINT is processing. Combine this with a
TELLOP and we can send the lines directly from the $STDLIST straight
to the console. This gives virtually instantaneous knowledge to the
job's creator of why it aborted without having to print out the
$STDLIST. This is done by outputting the current record (R) with
TELLOP in front of it to a file and then executing that file:
BUILD TELERROR;TEMP;REC=-60,1,F,ASCII;NOCCTL
FILE TELERROR=TELERROR,OLDTEMP
PRINT <FILE AND CONDITION>;FORMAT="TELLOP "+R[1:50];OUT=*TELERROR
XEQ TELERROR
Only the first 50 characters of each $STDLIST line are copied across,
so that lines will not wrap around when the message appears on the
console.
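The effect of the ;FORMAT expression can be illustrated with a few
lines of Python. This is a model of the string handling only: the
function name is hypothetical, and in the SMT the work is done by
PRINT itself.

```python
def tellop_commands(matching_lines, width=50, command="TELLOP"):
    """Turn matching $STDLIST lines into one console command per line."""
    # Keep only the first `width` characters of each line so that the
    # message does not wrap when it appears on the console.
    return [f"{command} {line[:width]}" for line in matching_lines]
```

With the default width each generated command is at most 57 characters
long, which fits within the 60-byte records of the TELERROR file built
above.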
If you don't wish errors to go to the console but to some dedicated
device just for SMT messages, then the TELLOP command could be
substituted with the WARN or TELL command.
PRINT <FILE AND CONDITION>;FORMAT="WARN LDEV=nnn " + R[1:50];OUT=*TELERROR
whereby 'nnn' is the device number for the messages to be WARNed to.
An alternate idea is to have errors not only sent to the console but
also your PAGER or BEEPER!! This SMT could then be used to inform of
errors 24 hours a day.
FILE TOMOD;DEV=nnn
ECHO ATDTttt >>*TOMOD
whereby 'nnn' is the device number of the MODEM and 'ttt' is your
PAGER or BEEPER number!
To save time the SMT does not want to search each $STDLIST for every
different entry in the SMTERROR file. When an error is detected we
want to inform the console and then continue with the next $STDLIST.
TRAPERROR...IFERROR/ENDIFERROR is another unique MPEX construct that
will catch errors caused within a command file or JCL and then
execute the commands between IFERROR and ENDIFERROR to rectify the
problem. The TRAPERROR...IFERROR/ENDIFERROR is used not just to trap
errors in our SMT, but also as a means of using the ESCAPE command.
This ESCAPE command assigns a number to the CIERROR variable and
forces a jump from the TRAPERROR routine to the IFERROR/ENDIFERROR
subroutine. This feature is used in MPEX as the equivalent of the GOTO
command used within languages such as BASIC. Whenever an error is
found, instead of proceeding with the next string we jump from that
routine to another.
GOTO CONSTRUCT -
REPEAT
TRAPERROR
REPEAT
PRINT $STDLIST
IF ERROR THEN
ESCAPE 1
ENDIF
FORRECS RECORD=SMTERROR,OLD
TELLOP $STDLIST OK
IFERROR
IF CIERROR = 1 THEN
TELLOP ERROR FOUND!!!
ENDIF
ENDIFERROR
FORFILES $STDLIST
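Readers more used to structured languages may find it helpful to see
the same early-exit pattern written with an exception. The Python
sketch below is an analogy and not a translation of the MPEX code. The
ErrorFound class and the check_all function are invented for
illustration, and the comments show which MPEX keyword each part
loosely corresponds to.

```python
class ErrorFound(Exception):
    """Plays the role of ESCAPE 1: abandon the remaining searches."""


def check_all(stdlists, error_strings):
    """Return {name: 'OK' or 'ERROR'}, stopping each search at the first hit.

    `stdlists` maps a $STDLIST name to its list of lines.
    """
    results = {}
    for name, lines in stdlists.items():        # outer REPEAT...FORFILES
        try:                                     # TRAPERROR
            for needle in error_strings:         # inner REPEAT...FORRECS
                if any(needle in line for line in lines):
                    raise ErrorFound(needle)     # ESCAPE 1
            results[name] = "OK"                 # reached only if no escape
        except ErrorFound:                       # IFERROR ... ENDIFERROR
            results[name] = "ERROR"
    return results
```

As in the MPEX version, the first matching string ends the search for
that $STDLIST and the outer loop carries on with the next one.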
The complete SMT job stream looks like:-
!JOB SMT,MANAGER.SYS
!SETVAR VESOFTDEFAULTNOSPACE 1
!RUN MAIN.PUB.VESOFT;PARM=1;INFO="SMTCHECK"
!EOJ
The complete SMTCHECK routine looks like:-
FILE SMTMESS=SMTMESS,OLD;SHR;GMULTI
CONTINUE
PURGE SMTMESS
BUILD SMTMESS;REC=-10,,,ASCII;DISC=1;MSG
CONTINUE
PURGE TELERROR,TEMP
BUILD TELERROR;TEMP;REC=-51,,,ASCII
FILE TELERROR = TELERROR,OLDTEMP
SETVAR OPTION "CHECK"
SETVAR GOONPIN 0
WHILE TRUE DO
IF OPTION = "STOP" THEN
RETURN
ELSEIF OPTION = "CHECK" THEN
COMMENT *** START OF CHECK SECTION ***
REPEAT
TRAPERROR
REPEAT
ECHO ![MPEXCURRENTFILE]
PRINT ![MPEXCURRENTFILE];SEARCH=!RECORD;CONTEXT=2,2;&
FORMAT=" TELLOP "+R[1:50];OUT=*TELERROR
IF MPEXPRINTLINESFOUND > 0 THEN
ECHO ![SPOOL.JOBNUMBER] - FAILED
ESCAPE 1
ENDIF
FORRECS RECORD = SMTERROR,OLD
COMMENT *** $STDLIST IS OK ***
TELLOP ![SPOOL.JOBNUMBER] - ![SPOOL.JOBNAME]&
,![SPOOL.USER].![SPOOL.ACCOUNT]
TELLOP $STDLIST (![SPOOL.SPOOLFILENUM]) IS OK
TELLOP
ECHO ![SPOOL.JOBNUMBER] - OK
IFERROR
IF CIERROR = 1 THEN
TELLOP *********************
TELLOP * A T T E N T I O N *
TELLOP *********************
TELLOP
TELLOP ![SPOOL.JOBNUMBER] - ![SPOOL.JOBNAME]&
,![SPOOL.USER].![SPOOL.ACCOUNT]
TELLOP MAY CONTAIN A POSSIBLE ERROR
TELLOP
XEQ TELERROR
TELLOP
TELLOP PLEASE CHECK THIS $STDLIST IMMEDIATELY
ENDIF
ENDIFERROR
COMMENT *** ALTER $STDLISTs TO PRI OF 4 ***
SPOOLF ![SPOOL.SPOOLFILENUM];PRI=4
FORFILES O@.OUT.HPSPOOL(SPOOL.FILE="$STDLIST" AND &
SPOOL.ISREADY AND SPOOL.OUTPRI >= 5)
ENDIF
COMMENT *** START BACKGROUND PAUSE ***
IF SONALIVE(GOONPIN) = FALSE THEN
RUN MAIN.PUB.VESOFT;PARM=1;INFO="PAUSESMT";STDLIST=$NULL;GOON
SETVAR GOONPIN MPEXPIN
ENDIF
COMMENT *** START WAIT ***
INPUT OPTION < *SMTMESS
SETVAR OPTION RTRIM(OPTION)
ENDWHILE
The PAUSESMT routine is simply :-
PAUSE 300
ECHO CHECK >*SMTMESS
To shut down the SMT job you need a little command file that I have
called TOSMT. This little command file can interface directly with
the SMT job and instruct it to either STOP or CHECK.
PARM SMTMODE
FILE SMTMESS=SMTMESS,OLD;SHR;GMULTI
ECHO !SMTMODE > *SMTMESS
With this command file, typing TOSMT STOP will shut down the SMT job
almost immediately. Doing a TOSMT CHECK will instruct SMT to do a
check of available $STDLISTs now.
Some essential lines for the SMTERROR file are:
UPS(R[1:4]) <> ':EOJ' AND RECNUM = VEFINFO(FNUM).EOF - 2
'(FSERR' or '(CIERR' or 'NOT STORED'
The above is a good example of the power and flexibility of the
MPEX/3000 software from VESOFT. A further enhancement to the package
which we have developed is to allow viewing and printing of $STDLISTs
for all jobs which have run in the last 30 days. This is a function
key driven online system, again written entirely in MPEX/3000. If you
require any assistance in implementing any of the above at your site,
please call me (Adrian Partridge) at VESOFT in the UK, on UK -
121-352-0707.