MAKING OTHER PEOPLE'S PROGRAMS DO
WHAT THEY WERE NEVER INTENDED TO DO
by Eugene Volokh, VESOFT
Presented at 1990 INTEREX Conference, Boston, MA, USA
Published by INTERACT Magazine, Apr 1991.
THE PROBLEM
We first got the idea for the MPEX hook in 1980, when one of our
earliest users complained to us about the time it took him to get into
MPEX. Whenever he was in EDITOR and wanted to use MPEX, he'd have to
/KEEP the file, exit EDITOR, get into MPEX, do the command, exit MPEX,
re-enter EDITOR, and re-/TEXT the file -- a lot of work, especially on
his overloaded Series III (remember those?).
We were thus faced with a substantial problem. What our customer
really wanted was a change to EDITOR -- some way in which he could
execute MPEX commands directly from within EDITOR, without exiting or
re-entering, /TEXTing or /KEEPing. He wanted us to modify somebody
else's program.
Unfortunately, we did not have the EDITOR source code; however, even
if we had it and modified it to suit our needs, we'd have to repeat
this modification every time a new version of EDITOR came out (and
re-send this new version to all of our customers). Furthermore, the
same sort of feature would be needed in TDP, QUERY, etc. -- even if we
had the source code to these programs, we wouldn't want to modify them
all (and then re-modify them for each subsequent version).
Now, if we couldn't modify the source code, could we modify the object
code? Perhaps find out which locations needed to be patched to
implement this feature, much like HP sometimes sends out patches to
fix certain MPE bugs? There were several reasons why we could not do
this:
* Patching object code -- especially someone else's -- is hard.
Object code is very hard to understand, and often it's difficult
to tell if a patch you make might have an unexpected side-effect.
(I say this now, in 1990 -- in 1980, it was even harder for me to
deal with.)
* While patches can be used to delete chunks of code (by branching
around them) or to make small changes, they cannot readily be
used for additions. It's very difficult to insert code into a
segment, and even more difficult to add calls to external
procedures that the segment doesn't already call. To do this,
you'd almost have to write a "program file editor" program that
could manipulate program files, and though I know how to do it
now, I didn't know how to do it then, and don't want to do it now
even though I know how.
* Patches would have to be generated for every new version of the
patched program that comes out, and we'd have to start almost
from scratch for every such version (since the locations of the
various pieces of code in the program and in each segment are
likely to change quite radically, and the entire internal
structure of this part of the program might change).
All in all, just patching object code was dangerous and difficult.
TRAPPING PROCEDURE CALLS
Fortunately, there was an alternative to patching code directly, an
alternative that was pioneered (to the best of my knowledge) by Bob
Green of Robelle. (Even if he didn't originate it himself, he's
certainly the one from whom I adapted it.)
For space and performance reasons, the files handled by Bob's QEDIT
text editor had their own special internal format; a program (like a
compiler) that expected normal EDITOR-generated files would be quite
surprised to get a QEDIT file. But, since QEDIT aimed to be
substantially faster (as well as more powerful) than EDITOR, Bob
didn't want to have to convert the QEDIT file to EDITOR format before
each compile.
The thing that Bob took advantage of -- and I eventually did too -- is
that programs are not self-contained. All of their dealings with the
outside world -- with disc files, with the terminal, etc. -- are done
through intrinsics (or some other system SL procedures). If only we
could cause the programs to call our own procedures that would look
like the system intrinsics but actually do our own stuff, too (e.g.
pretend that QEDIT files are actually normal EDITOR files, or process
user input before the program gets it in order to possibly execute it
as an MPEX command), we could change the program's behavior without
its even noticing. (The way that we would make the programs call our
own procedures is by moving the programs into a separate group and
putting procedures with the same names as intrinsics into the group
SL.) For instance, if we could replace the READX called by a program
by our own procedure that:
* accepts exactly the same parameters as the real READX;
* calls the real READX;
* checks to see if the user's input starts with a "%" character,
and if so, passes it to MPEX to be executed as an MPEX command;
* returns exactly the same values as the real READX (including the
condition code)
then a user will be able to execute MPEX commands from that program by
just prefixing his input with a "%"; the program itself would not have
to be patched, since all of the logic will be in our SL procedure.
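The wrapper idea can be modelled in a few lines of Python. This is
only a sketch of the control flow, not MPE code: make_hooked_readx,
fake_readx and the "executed" list are hypothetical names, and the
condition code is reduced to a plain return value.

```python
# Toy model of interposing on a read routine. Nothing here is MPE code;
# all names are hypothetical and exist only for illustration.

def make_hooked_readx(real_readx, run_mpex_command):
    """Return a replacement with the same calling sequence as real_readx."""
    def hooked_readx(max_len):
        # 1. Call the real routine with exactly the caller's parameters.
        line, condition_code = real_readx(max_len)
        # 2. Hand "%"-prefixed input to the command handler.
        if condition_code == "CCE" and line.startswith("%"):
            run_mpex_command(line[1:])
        # 3. Return exactly what the real routine returned.
        return line, condition_code
    return hooked_readx


# Stand-ins for the terminal and for MPEX.
pending_input = ["%LISTF @.PUB", "LIST ALL"]
executed = []

def fake_readx(max_len):
    return pending_input.pop(0)[:max_len], "CCE"

readx = make_hooked_readx(fake_readx, executed.append)
first = readx(80)     # intercepted: the command handler sees it
second = readx(80)    # ordinary input: passed through untouched
```

The calling program only ever sees the replacement, which is why it
needs no changes of its own.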
Note how this approach avoids the problems of object code patching:
* We don't have to read object code, since all that we need to do
is emulate the calling sequence of a well-documented procedure.
* We can easily add new functionality, since the SL procedure can
be of almost unlimited size.
* We probably won't have to worry about new versions of the
program, since no matter what changes are made to the program,
the program will probably still call READX in the same context
and for the same purpose as it did before.
Of course, there are also some limitations with this approach:
* We can only alter those aspects of the program's behavior that
are accomplished through external procedure calls -- internal
calculations and checks will often be beyond our reach. For
example, we can implement a multi-line REDO facility in EDITOR
(since we can, by intercepting terminal input calls, record all
the input that the user's given us and replace "redo" commands by
the appropriate line of input), but we can't implement, say, a new
feature on a /CHANGE or /LIST command because we don't have any
access to the EDITOR work file and all of the work-file-related
tables that EDITOR keeps.
* Though we will know all the parameters of the procedure call, we
may have very limited information about its purpose -- for
instance, is this READX call intended to prompt for a command (in
which case we want to process commands prefixed by a "%"), or for
a line of text being added to the file (in which case we don't
want this)?
And, there are some practical difficulties that we need to overcome to
make this approach fully workable -- more about them in the pages
ahead.
WRITING THE PROCEDURE
The best way of discussing things further, I think, is to walk through
a very simple example of "hooking" a program that you can actually do
as you read the paper. Unfortunately, it'll be of limited use (since
it was chosen for simplicity rather than utility), but it might still
be somewhat impressive -- we'll "teach" LISTDIR5 to honor MPE commands
prefixed by ":"s, so you can, for instance, say ":ALTSEC xxx" or
":NEWGROUP xxx" or something like that when prompted with the LISTDIR5
">" prompt.
The first question that we must ask is: which system intrinsic call
can we intercept to get the job done?
The answer to this is quite simple -- the READX intrinsic. Our plan of
attack will be:
* Write a procedure with exactly the same calling sequence as
READX.
* Call the real READX to get the input from the user.
* Check the input to see if it starts with a ":" -- if so, execute
it as an MPE command using the COMMAND intrinsic.
* Return exactly the same results as READX would.
So, let's begin:
$CONTROL USLINIT, SUBPROGRAM, SEGMENT=MY'READX
BEGIN
INTEGER PROCEDURE READX (BUFFER, LEN);
  VALUE LEN;
  ARRAY BUFFER;
  INTEGER LEN;
BEGIN
The very first thing that you notice is: this is written in SPL. What,
you say that you don't know SPL? Well, that's perfectly
understandable, but unfortunately there are two crucial things that a
hook procedure needs to be able to do that just cannot be done in some
languages:
* Accept as input virtually any kind of parameter -- word address
or byte address, by value or by reference.
* Return a condition code.
To the best of my knowledge, only SPL and PASCAL procedures can take
by-value parameters (like READX's LEN parameter); in MPE/V, only SPL
procedures can return a condition code (though the HPSETCCODE
intrinsic permits other languages to do this on MPE/XL).
However, with the following procedure:
PROCEDURE VESETCCODE (I << 0 = CCG, 1 = CCL, 2 = CCE >>);
  VALUE I;
  INTEGER I;
BEGIN
  INTEGER ARRAY Q(*)=Q+0;
  Q(-Q(0)-1).(6:2):=I;
END;
you can set the condition code from, say, a PASCAL procedure, as long
as you call it (VESETCCODE) from the hook procedure itself and not
from any of the procedures called from within it.
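For readers unfamiliar with SPL bit fields, the deposit ".(6:2):=I"
can be modelled in Python. The sketch assumes the usual HP 3000
conventions: 16-bit words, bit 0 as the most significant bit, and the
condition code held in bits 6 and 7 of the status word saved in the
stack marker. The names deposit and set_condition_code are
hypothetical.

```python
# Model of the SPL bit deposit WORD.(6:2) := I on a 16-bit word.
# SPL numbers bits from the left: bit 0 is the most significant bit.

def deposit(word, start_bit, width, value):
    """Store `value` into `width` bits of `word`, starting at `start_bit`."""
    shift = 16 - (start_bit + width)       # distance of the field from the LSB
    mask = ((1 << width) - 1) << shift
    return (word & ~mask & 0xFFFF) | ((value << shift) & mask)

def set_condition_code(status_word, code):
    """code: 0 = CCG, 1 = CCL, 2 = CCE, as in VESETCCODE."""
    return deposit(status_word, 6, 2, code)
```

Only the two condition code bits change; every other bit of the saved
status word is left as the caller had it.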
Armed with VESETCCODE, there's no reason why you can't write hook
procedures in PASCAL on MPE/V (in fact, I'll even use it in my SPL
examples) though I think that you still can't do them in any other
language.
OK, back to our sample procedure. Note that we created a procedure
header that exactly corresponds to the calling sequence of the READX
intrinsic. Each of the parameters must match exactly, both in type and
in mode (by value/by reference); the return value must be exactly the
right type, and if the procedure we're intercepting is an OPTION
VARIABLE procedure, so must ours be. (PASCAL programmers: you can
still hook OPTION VARIABLE procedures if you realize that an OPTION
VARIABLE procedure is just the same as a normal one but has an extra
by-value parameter at the end that contains the OPTION VARIABLE bit
mask.)
Now, let's continue:
$CONTROL USLINIT, SUBPROGRAM, SEGMENT=MY'READX
BEGIN
INTEGER PROCEDURE READX (BUFFER, LEN);
  VALUE LEN;
  ARRAY BUFFER;
  INTEGER LEN;
BEGIN
  INTRINSIC READX;
  BYTE ARRAY BUFFER'B(*)=BUFFER;
  INTEGER LEN'READ;
  BYTE ARRAY TEMP'CMD(0:255);
  INTEGER CIERR;
  INTEGER FSERR;
  LEN'READ:=READX (BUFFER, LEN);
  IF > THEN VESETCCODE (0)
  ELSE IF < THEN VESETCCODE (1)
  ELSE
    BEGIN
    IF LEN'READ<>0 AND BUFFER'B=":" THEN
      BEGIN
      MOVE TEMP'CMD:=BUFFER'B(1),(LEN'READ-1);
      TEMP'CMD(LEN'READ-1):=%15;  << carriage return >>
      COMMAND (TEMP'CMD, CIERR, FSERR);
      END;
    VESETCCODE (2);
    END;
  READX:=LEN'READ;
END;
As you see, we call READX, check the condition code, set our own
return condition code appropriately, and if the read succeeded and the
input line starts with a ":", call the COMMAND intrinsic.
What is wrong with this picture? Well, there are three problems:
* First, and most important of all (I'm sure you noticed this), we
have our procedure called READX calling the intrinsic READX.
You'd think that since you declared READX as an intrinsic, the
compiler will recognize that you want to call the READX intrinsic
in the system SL.
This, unfortunately, is not the case. When the compiler sees the
call to READX, it views it as a recursive call to our own
procedure and not a call to the READX intrinsic. (To make matters
worse, the SPL compiler will not flag the "INTRINSIC READX"
declaration as a duplicate symbol error.) In fact, we will find
that this -- how to call the real procedure from our hook
procedure -- is one of the more substantial problems that we
face.
* Secondly, note the MOVE TEMP'CMD:=BUFFER'B(1),(LEN'READ-1)
statement -- why is it wrong? Because the way that the READX
intrinsic is defined, its result (which we put into LEN'READ) may
be the number of bytes or the number of words read (depending on
whether the LEN parameter was negative or positive). Actually, we
might discover that LISTDIR5 always passes a negative LEN
parameter and thus the READX result will always be in bytes, but
we don't want to count on that (especially if we want the hook
procedure to be general). The rule is thus that you must be
prepared for any possible set of input parameters and any
possible result returned by the intrinsic.
In other words, instead of LEN'READ, we should have said (IF
LEN<0 THEN LEN'READ ELSE 2*LEN'READ).
* Thirdly, what happens when a command that's prefixed by a ":" is
input? Indeed, it will be executed as an MPE command, but it will
then be returned to LISTDIR5 as the result of the read --
LISTDIR5 will see it as an invalid command, and will output a
nasty message.
This is important to remember -- when you intercept a procedure
call, from the program's point of view the call still completes,
and the program will act upon the data returned by the hook
procedure. In this case, we should make sure that the data
returned to LISTDIR5 is such that LISTDIR5 will do as little with
it as possible -- in LISTDIR5's case, returning an empty string
(just as if the user hit return). For this, we'd have to set the
function result to 0 (0 characters read), but we'd also have to
make sure that the buffer returned to the program is in the same
state as it was when our procedure was called, since programs
often calculate the length of the data input not by the READX
result, but by the position of some terminating character (e.g. a
carriage return) that they filled the buffer with.
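Taken together, the second and third corrections might look like the
following Python model. It is a sketch under stated assumptions, not a
translation of the SPL: the buffer is a bytearray, fake_readx and the
"commands" list are hypothetical stand-ins for the READX and COMMAND
intrinsics, and the condition code is left out.

```python
def bytes_read(requested_len, result):
    """READX counts in bytes if the requested length is negative, else in words."""
    return result if requested_len < 0 else 2 * result

def hooked_readx(buffer, requested_len, real_readx, run_command):
    saved = bytes(buffer)                  # the buffer as the caller handed it to us
    result = real_readx(buffer, requested_len)
    nbytes = bytes_read(requested_len, result)
    if nbytes > 0 and buffer[0:1] == b":":
        # Strip the ":" and add the carriage return that COMMAND expects.
        run_command(bytes(buffer[1:nbytes]) + b"\r")
        buffer[:] = saved                  # undo the read: restore the buffer
        return 0                           # report an empty line to the program
    return result


# Hypothetical stand-in for the real READX: "types" one fixed line.
def fake_readx(buffer, requested_len):
    data = b":SHOWTIME"
    buffer[:len(data)] = data
    return len(data)                       # a byte count, since the length is negative

commands = []
buf = bytearray(b"\r" * 16)                # program pre-filled the buffer with CRs
result = hooked_readx(buf, -16, fake_readx, commands.append)
```

After the call the program sees a zero-length read and an untouched
buffer, exactly as if the user had just hit return.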
CALLING THE REAL PROCEDURE
Let's get back for a moment to the first problem mentioned in the
above list -- if we call the intrinsic READX from our READX procedure,
we get an infinite loop. What can we do about this?
Well, there are three possible solutions:
* Since we're putting our READX in a group or account SL anyway, we
can put an "intermediary" procedure called, say, INT'READX into
the system SL (or any SL higher than the one in which our READX
is) -- our READX can call INT'READX, which will then call READX.
Since SL's are always searched in the order group, account,
system, a call to READX from an INT'READX that's in the system SL
will not call back to our group/account SL READX, but rather go
to the real READX in the system SL.
The INT'READX procedure might then look something like this:
INTEGER PROCEDURE INT'READX (BUFFER, LEN);
  VALUE LEN;
  ARRAY BUFFER;
  INTEGER LEN;
BEGIN
  INTRINSIC READX;
  INT'READX:=READX (BUFFER, LEN);
  IF > THEN VESETCCODE (0)
  ELSE IF < THEN VESETCCODE (1)
  ELSE VESETCCODE (2);
END;
* Another alternative is to have our READX call the real READX
using the LOADPROC intrinsic. Among other things, LOADPROC lets
you specify that you want to load the procedure from the system
SL, so we won't get into the same recursive loop that we would
have had if our READX just tried to call READX directly.
I won't go into any more detail as to how this is done, but I do
want to point out that one problem with this approach is where to
put the plabel returned by the LOADPROC procedure so that we
don't have to re-LOADPROC for every procedure call. Actually, for
intercepting READX we can afford to re-LOADPROC the real READX
every time our READX is called because subsequent LOADPROCs of a
procedure that's already been LOADPROCed take only a few
milliseconds; however, if we're intercepting a more time-critical
procedure, like FREAD, we'd have to be sure to LOADPROC it only
once, in which case we'd need some global storage to keep the
plabel. More about the global storage problem later.
* One other alternative that's worth considering is somewhat more
difficult to do but cures a very substantial disadvantage present
in the first two solutions.
Say that you want to hook a number of programs all over your
system, for instance to call your own DBOPEN replacement
procedure instead of the DBOPEN intrinsic (which we do in our
SECURITY/3000 VEOPEN module), or to call a replacement COMMAND
procedure (which we do in our SECURITY/3000 STREAMX module). If
you want to intercept these calls using procedures named DBOPEN
and COMMAND, you'd have to put these procedures into local group
or account SLs in every group or account in which the programs to
be hooked reside. This can prove quite cumbersome, especially
when it comes time to install a new version of your procedures --
you might have to replace dozens of SLs in dozens of different
accounts. The trouble, of course, is that you can't put your own
hook procedures into the system SL, since they have the same
names as the real intrinsics.
The way that you can get around this problem is by actually
patching all the programs to be hooked to call not a procedure
called DBOPEN, but rather one called, say, VEOPEN. Then, you can
put the VEOPEN procedure into the system SL, since it will no
longer conflict with the real DBOPEN -- furthermore, since it is
called VEOPEN, there'll be no problem with it calling the real
DBOPEN without threat of recursion. When a new version of VEOPEN,
incidentally, is installed, you won't have to re-patch all the
programs, but only replace the module in the system SL. (On the
other hand, whenever you roll in a new version of a patched
program, you'd have to re-patch it.)
Patching the programs might at first glance seem difficult, but
it actually isn't. All program files (I'm speaking here of MPE/V
and CM programs) contain at a well-defined place an "external
reference list", which is a list of the names of all the SL
procedures that they call (together with some other information).
Simply by replacing the procedure name "DBOPEN" by the name
"VEOPEN" you can make the program call VEOPEN instead of
DBOPEN. (Note that the two procedure names are intentionally
chosen to be the same length.) The layout of this table is
described in Chapter 10 of the MPE/V System Tables Manual -- it's
not that hard to write a program that modifies it. It's not much
more difficult (though it is extra work) to write a program that
modifies the external reference list of an SL segment.
All in all, we've found patching to work quite well for us, but
the additional cost of writing a program to patch the external
reference list might make it a rather expensive solution for
some.
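Returning to the first solution in the list above, the search-order
argument can be modelled in Python. The model assumes, as the text
describes, that a call made from a procedure in a given SL is resolved
starting at that SL and moving up (group, account, system). Each
procedure is reduced to the single name it calls; the dictionary
layout and function names are hypothetical.

```python
SEARCH_ORDER = ["group", "account", "system"]

def resolve(libraries, name, start_level):
    """Find `name` in the first SL at or above `start_level` that contains it."""
    start = SEARCH_ORDER.index(start_level)
    for level in SEARCH_ORDER[start:]:
        if name in libraries[level]:
            return level
    raise LookupError(name)

def call_chain(libraries, name, start_level, limit=10):
    """Follow calls from `name`, returning the (level, name) pairs visited."""
    chain = []
    while name is not None and len(chain) < limit:
        level = resolve(libraries, name, start_level)
        chain.append((level, name))
        name = libraries[level][name]      # the procedure this one calls, or None
        start_level = level                # its references resolve from its own SL
    return chain

libraries = {
    "group":   {"READX": "INT'READX"},     # our hook calls the intermediary
    "account": {},
    "system":  {"INT'READX": "READX",      # the intermediary calls READX...
                "READX": None},            # ...which is the real intrinsic
}
good = call_chain(libraries, "READX", "group")

# Without the intermediary, the hook's call to READX resolves to itself.
naive = {"group": {"READX": "READX"}, "account": {}, "system": {"READX": None}}
bad = call_chain(naive, "READX", "group", limit=4)
```

In the first case the chain ends at the system SL's READX; in the
second it never leaves the group SL.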
GLOBAL STORAGE
So far, with the READX and INT'READX procedures, we've done pretty
much what needs to be done to get our new-and-improved LISTDIR5 to
work. All we need to do is:
* Copy LISTDIR5 into our own group (it'll have to have PM
capability, but that's just because LISTDIR5 itself needs PM).
* Add the READX procedure (as finally corrected) to the group SL.
* Add the INT'READX procedure to the account or system SL.
* :RUN LISTDIR5.ourgroup;LIB=G
If we've done everything right, our toy should work just fine; we can
even move other programs (e.g. DBUTIL) into our group and have the
very same procedure work for them.
Unfortunately, one of the reasons why this was so easy (you did think
it was easy, didn't you?) is because the problem that we set for
ourselves was quite easy. The feature that we wanted to implement
could be implemented entirely within one READX call; we didn't need to
save any information from one call to the next.
What if we did need to save information this way? For instance, if we
wanted to implement a multi-line REDO, we'd have to save some
information (e.g. the file number of the REDO file) from one READX
call to another -- we'd also need to be able to tell when our READX is
called for the first time, so that we can initialize this information.
(Actually, a number of useful features -- like SECURITY/3000's VEOPEN
and STREAMX's interception of the COMMAND intrinsic -- can be
implemented without using global storage, but many other features
can't be.)
SL procedures in MPE/V are not allowed to have global storage of their
own -- if you try to add a procedure that uses global or OWN variables
to an SL, it will fail. Procedures that have the cooperation of the
caller (like the V/3000 intrinsics) can get around this by having the
caller pass them an array that contains the data that they need to
preserve from call to call (e.g. the V/3000 VCOMAREA array), but we
don't have that luxury since we must remain scrupulously compatible
with the calling sequences of the procedures we're intercepting.
Where can we put our global data? There are a number of places in
which MPE lets us keep information that won't vanish from one
procedure call to the next, but all of them have their own problems:
* Files and extra data segments -- you can put a lot of data here,
but it's rather slow to access (even a few milliseconds per call
can be slow when we're intercepting a frequently called procedure
like FREAD, though it can be acceptable for, say, READX, DBOPEN,
or COMMAND, which take much longer anyway). Furthermore, you
still have to find a place to keep the file number or extra data
segment index!
* JCWs -- these are also rather slow, and they can only contain a
single word each. Furthermore, they're session-local rather than
process-local, so multiple hooked processes in the same session
might have trouble.
* The "DL-to-DB" area of the stack -- this can be accessed very
quickly (since it's just as much part of your stack as your
procedure-local variables), but is often already used by the
hooked program (especially if it calls V/3000 or uses PASCAL's
"heap" mechanism). There are a few words a little bit below DB
(DB-10 through DB-1) that most programs do not use,
especially programs written in SPL, but again it's possible that
the program you're trying to hook uses them. This is especially
relevant if you're trying to write a general-purpose hook routine
that is supposed to work for all programs -- in fact, the first
version of our MPEX hook routine used one of these DB-negative
words until we ran into a program that wanted to use it, too.
As you see, this is not a pretty sight. There are things you can do
with one or more of the above mechanisms that might work in your case
(especially if speed is not a problem), but there doesn't seem to be a
very good general solution.
The best solution (pioneered by Bob Green) is somewhat difficult to
implement but ultimately far superior to any of the above. As we
mentioned before, Bob's QEDIT text editor was written with efficiency
very much in mind, and when he decided to have compilers read QEDIT
files, it was very important that they do this as fast as possible.
One of the key procedures that he needed to intercept was the FREAD
intrinsic (which often takes only a couple of milliseconds), so the
access to his global storage had to be as fast as possible. He pretty
much had to have all the global storage be kept in the stack.
To understand how this approach works, one has to realize what a
program file contains. A program file is essentially a blueprint for
the loader that describes how to load the process. It contains
information on all the code segments (which is to go into the CSTX),
the names of all the external references (which are to be loaded from
SLs), and an image of what the process's stack is to look like when it
starts up. All the initial values of all of the program's global
variables are kept here, and when the loader loads the program, it
allocates the right amount of global area (the size of the global area
is also kept in the program file) and fills it with these initial
values.
Bob was already patching the program file's external reference list
(see the discussion above), so he decided to expand the program's
initial global values area to include room for his own global values.
Since the program by definition didn't use any of the global area
beyond what it thought was available, his storage wouldn't collide
with the program's storage; and, he could add as much space as he
needed (keeping in mind, however, that if the program already used a
lot of stack space, this might cause stack overflow problems).
So the general plan -- again, you might want to look at Chapter 10
of your System Tables Manual for this -- was:
* Modify the "global area size" word in record 0 to indicate that
there is more global area.
* Insert as many records as needed after the global area (and
before the code segment area) in the program file, initializing
them to whatever values you wanted to initialize them to. The
insertion, of course, has to be done by creating a new copy of
the program and copying all the data from the old program.
* Modify the record numbers (in record 0) of the segment area,
external reference list area, and entry list area to reflect the
fact that we've inserted records.
* Set the record number of the FPMAP area to 0, since unfortunately
the FPMAP area of the program contains a lot of internal pointers
with record numbers, and rather than readjusting them all, you
should probably just tell the system that there is no FPMAP in
this program.
For example, if the old program had 5000 words of global area and you
wanted to have 256 words of your own, you'd change the global area
size to 5256, insert 2 records (256 words, 128 words per record) at
the end of the global area, and increment the segment area, external
reference list area, and entry list area record numbers by 2. It seems
like a fairly complicated manipulation, but it really isn't; armed
with Chapter 10 of the System Tables Reference Manual, you can do it
quite easily.
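The arithmetic in this example can be checked with a short Python
sketch. It models only the numbers, not the program file format; the
function name, the dictionary keys and the starting record numbers
(41, 90, 95) are made up for illustration.

```python
WORDS_PER_RECORD = 128

def expand_global_area(global_size, extra_words, area_records):
    """Return (new global size, records inserted, updated record numbers).

    area_records maps an area name to its starting record number, standing
    in for the pointers kept in record 0 of the program file.
    """
    inserted = -(-extra_words // WORDS_PER_RECORD)   # round up to whole records
    updated = {name: record + inserted for name, record in area_records.items()}
    updated["fpmap"] = 0                             # "there is no FPMAP"
    return global_size + extra_words, inserted, updated

new_size, inserted, updated = expand_global_area(
    5000, 256, {"segments": 41, "externals": 90, "entries": 95})
```

With these inputs the new global area size is 5256 words, 2 records
are inserted, and each of the three area pointers moves up by 2.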
There is one more problem to be dealt with. You run the patched
program and it has 256 extra words of global area; but how does your
hook procedure know where those words are? You can't just hardcode the
address into the procedure, since you'd like it to work for various
programs (and in any event the end of the global area of even one
program will probably change from version to version). Instead, here's
what you can do:
* When you insert your global area, make sure that the part that
you want to use starts on, say, a 128-word boundary. For
instance, in our 5000-word-global-area program, you'd make sure
that your 256-word global area starts at word #5120 (the first
multiple of 128 after 5000) -- thus, you'd expand the global area
to 5376 words and just waste the 120 words between words 5000 and
5119.
* Set the first few words of the non-waste part of the data you
insert into the program to some unique pattern that's not likely
to appear in a normal program's global area. (Remember that the
data that you insert into the copy of the global area in the
program file will make its way into the program's stack.)
* In your hook procedure, try to find this unique pattern by
looking at word 0, then word 128, then word 256, etc.
This way, you find your global area by the unique pattern that you've
initialized its first few words to, but you don't have to check every
word in your stack (which would take too long) because you know that
your global area starts on a 128-word boundary. An example of this in
SPL might be:
INTEGER POINTER IP;
@IP:=0;   << make IP point to DB+0 >>
<< keep stepping while any word of the 4-word pattern fails to match >>
WHILE IP(0)<>123 OR IP(1)<>456 OR IP(2)<>789 OR IP(3)<>555 DO
  @IP:=@IP(128);
Note that your unique sequence must be a sequence that's highly
unlikely to ever appear in the program's stack; if, for instance, you
choose a normal piece of text, it's possible (though unlikely) that
this piece of text will somehow appear in the program's stack at a
128-word boundary (perhaps input from the terminal or a file) and will
thus make you find the wrong area. I use a fairly unlikely sequence of
5 words, many of which represent unprintable ASCII characters.
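The placement and the search can be tried out together in a small
Python model. The stack is just a list of integers, the signature is
the four-word pattern from the SPL fragment above, and the function
names are hypothetical. Unlike the SPL loop, this version stops at the
end of the list rather than assuming the pattern is always present.

```python
BOUNDARY = 128
SIGNATURE = [123, 456, 789, 555]

def place_private_area(global_size, private_words):
    """Return (start of private area, new global size, wasted words)."""
    start = -(-global_size // BOUNDARY) * BOUNDARY   # round up to a multiple of 128
    return start, start + private_words, start - global_size

def find_private_area(stack):
    """Scan 128-word boundaries for the signature; return its address or None."""
    last = len(stack) - len(SIGNATURE)
    for address in range(0, last + 1, BOUNDARY):
        if stack[address:address + len(SIGNATURE)] == SIGNATURE:
            return address
    return None

start, new_size, wasted = place_private_area(5000, 256)
stack = [0] * new_size                               # model of the loaded global area
stack[start:start + len(SIGNATURE)] = SIGNATURE
found = find_private_area(stack)
```

For the 5000-word program this gives a start address of 5120, a new
global area size of 5376 and 120 wasted words, matching the figures in
the text, and the search inspects only 42 candidate addresses.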
WHICH PROCEDURES TO PATCH
The preceding discussion assumed that you knew exactly which procedure
is to be patched, e.g. READX in LISTDIR5. Unfortunately, things aren't
always quite this simple.
Most tasks can be performed by a program in different ways. Some
programs, for instance, use READX to read from the terminal, but
others (like SPOOK) use READ, and others (like EDITOR) use FREAD. When
writing our "MPEX hook" procedures, we wanted to work with all of
these programs, so we needed to hook all of the procedures. Hooking
READ was quite simple, since it is very similar to READX, but dealing
with FREAD was more difficult, because it was used by EDITOR to read
both from the terminal and from files. We wanted to have terminal
input that was prefixed by "%" be executed as MPEX commands, and we
wanted to save terminal input in the multi-line REDO history, but we
obviously didn't want this done to, say, lines from the file that we
were /TEXTing in.
Our first thought was to call FGETINFO inside each execution of our
FREAD hook procedure to see if we were reading from the terminal or
not, but this was far too inefficient -- imagine calling FGETINFO for
each line of a 10,000-line file. Instead, we found ourselves
having to hook FOPEN calls just so that we can check once per file
open whether we were opening $STDIN or $STDINX, and recording this
information for each file -- then our FREAD hook could just look into
this array of flags to see if this was a terminal file or not.
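The division of labour between the two hooks can be sketched in
Python. This is a model only: the real FOPEN and FREAD take many more
parameters, and the class, the stand-in functions and the file numbers
are all hypothetical.

```python
TERMINAL_FILES = {"$STDIN", "$STDINX"}

class ReadHooks:
    def __init__(self, real_fopen, real_fread, on_terminal_line):
        self.real_fopen = real_fopen
        self.real_fread = real_fread
        self.on_terminal_line = on_terminal_line
        self.is_terminal = {}              # file number -> flag, set at open time

    def fopen(self, designator):
        fnum = self.real_fopen(designator)
        # The expensive question is asked once per open, not once per read.
        self.is_terminal[fnum] = designator.upper() in TERMINAL_FILES
        return fnum

    def fread(self, fnum):
        line = self.real_fread(fnum)
        if self.is_terminal.get(fnum, False):    # cheap lookup on every read
            self.on_terminal_line(line)
        return line


# Stand-ins for the file system: one terminal file and one disc file.
files = {1: ["%PURGE X"], 2: ["data line"]}
numbers = {"$STDINX": 1, "SOURCE": 2}
seen = []
hooks = ReadHooks(numbers.__getitem__, lambda fnum: files[fnum].pop(0), seen.append)
term = hooks.fopen("$STDINX")
disc = hooks.fopen("SOURCE")
hooks.fread(term)
hooks.fread(disc)
```

Only the line read from the terminal reaches the handler; the line
from the disc file passes through without any extra work.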
Similar problems arise when programs use other mechanisms for reading
from the terminal -- programs written in PASCAL often use PASCAL
compiler library routines to do terminal I/O; these routines can be
quite difficult to hook simply because, unlike intrinsics, their
calling sequences are undocumented.
The problem of FREADs from the terminal vs. FREADs from files is
actually a symptom of a greater problem -- what we really want to
distinguish is not terminal vs. file input, but rather input of
commands (which might come from files, e.g. /USE files) from reading
of data (which might come from the terminal, e.g. in /ADD mode). We
really want to distinguish FREAD calls based on what EDITOR intends to
do with the data read, which unfortunately we cannot do, since the
essence of the problem is that EDITOR isn't cooperating with us and
isn't telling us anything about what it's doing.
We might try to tell which FREAD call is which by looking at where in
the program the FREAD is being called (we can get this information
from our hook FREAD procedure's stack marker), and seeing if it's one
of those locations in which EDITOR does command input; unfortunately,
this leaves us with almost all the problems that would be involved in
directly patching the program's code -- we'd have to read the object
code to find all the right locations to patch, they would only apply
to this particular program, and they would have to be recalculated for
each new version of the program.
Fortunately, sometimes, you can get information as to the "purpose" of
a call in surprising places -- for instance, you may find that a
particular program reads command input by passing a read length of -80
to READX but reads /ADD-mode input by passing a read length of -255,
and thus use the read length to distinguish the two kinds of input.
To keep your hook procedure general, you might even want to keep the
expected read length as a value in the global area that the program
that you use to hook other program files inserts, and have this
"hooking program" prompt for what expected read length is to be put
into the global area. This way, you can, at the time you hook a
program, communicate various special attributes of the hooked program
to the hook procedure that will be called from this program.
HOOKING NATIVE MODE PROGRAMS
Hooking MPE/XL native mode (NM) programs is a somewhat different
story, but the essentials are still the same. Just as you might
intercept, say, FREAD calls from a CM (or MPE/V) program by putting
your own FREAD procedure into a group SL and running the program with
;LIB=G, so you can intercept FREAD calls from an NM program by putting
your own FREAD into an XL and then running the program with
;XL=yourfile. Some aspects of this are easier to do than with CM
programs, while others are a bit harder.
UPPER AND LOWER CASE
Unlike MPE/V, in which all procedure names are in uppercase, MPE/XL
lets you have upper and lower (and mixed) case procedure names; the
procedures "FREAD" and "fread" are two different procedures. (This is
necessary for supporting C, which cares about case.)
All system intrinsics are declared in uppercase, but by default
PASCAL/XL procedures are created with lowercase names. If your native
mode program calls "FREAD", and you run it with an XL= of yours that
contains your own procedure called "fread" (which is what a "PROCEDURE
FREAD" declaration will by default create), your procedure will not
get called because of the difference in names.
Fortunately, the solution is simple. Do a "$UPPERCASE ON$" before the
procedure declaration; this will tell the PASCAL/XL compiler to create
the procedure with an uppercase name.
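The effect of the case mismatch can be modelled with a small C sketch. This is only an analogy for an exact, case-sensitive name match; find_export and the export lists are hypothetical and say nothing about how the MPE/XL loader is actually implemented.

```c
#include <assert.h>
#include <string.h>

/* Return the index of the exported name that exactly matches `wanted`,
   or -1 if there is none. strcmp is case-sensitive, so "fread" does
   not satisfy a reference to "FREAD". */
int find_export(const char *const exports[], int count, const char *wanted)
{
    for (int i = 0; i < count; i++) {
        if (strcmp(exports[i], wanted) == 0)
            return i;
    }
    return -1;
}
```

An XL whose only export is "fread" leaves a reference to "FREAD" unresolved by your procedure, while one that exports "FREAD" satisfies it.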
DECLARING YOUR PARAMETERS CORRECTLY
Just as it is vital in MPE/V to emulate exactly the calling sequence
of the procedure to be intercepted, so it is equally vital in MPE/XL.
All the by value parameters must be by value, the by reference ones
must be by reference, and all the types must be identical.
However, there are quite a few more subtle issues involved as well:
* ALIGNMENT: You can tell the PASCAL/XL compiler whether a
parameter must start on a byte, half-word, or word boundary. By
default, PASCAL/XL will expect most parameters to start on a word
boundary, so if you just declare your procedure as
    TYPE TARRAY = ARRAY [1..65536] OF INTEGER;
    ...
    FUNCTION FREAD (FNUM: SHORTINT;
                    VAR BUFFER: TARRAY;
                    LEN: SHORTINT): SHORTINT;
then PASCAL/XL will emit code that assumes that BUFFER begins on
a word boundary. If the caller then calls FREAD with a BUFFER
that doesn't start on a word boundary, you'll get a run-time
error.
What can you do? You should get a listing of the system intrinsic
file by compiling the following short program:
    $LISTINTR 'LISTFILE'$
    PROGRAM DUMMY;
    BEGIN
    END.
This will send a listing of the calling sequences of all the
intrinsics in the system intrinsics file to the file LISTFILE;
you can then look up your intrinsic there (note that the file is
not in alphabetical order) and see what sort of alignment it
shows for that parameter. If it's "8-BIT ALIGNED", you should
prefix the parameter with "$ALIGNMENT 1$" (e.g. "BUFFER:
$ALIGNMENT 1$ TARRAY"); if it's "16-BIT ALIGNED", use "$ALIGNMENT
2$"; if it's "32-BIT ALIGNED", you don't need a $ALIGNMENT$
keyword.
I suspect, however, that if you declare all your by reference
parameters with "$ALIGNMENT 1$", you should have no problems;
your procedure will run a bit slower, but not by very much.
* LONG VS. SHORT POINTERS: One other problem with the FREAD calling
sequence shown above is that it declares BUFFER as just being of
type "TARRAY", i.e. being passed as a 32-bit pointer to an array
kept in the process's "short address space". Actually, if you
look at the intrinsic file listing for the FREAD intrinsic,
you'll find that BUFFER is passed as a "LONG ADDR", a 64-bit
pointer. You must declare the parameter as a 64-bit pointer,
either by saying:
    TYPE TARRAY_PTR = ^ $EXTNADDR$ ARRAY [1..65536] OF INTEGER;
    ...
    FUNCTION FREAD (... BUFFER: TARRAY_PTR; ...): SHORTINT;
or by saying:
    FUNCTION FREAD (... BUFFER: GLOBALANYPTR; ...): SHORTINT;
Note that, once you've declared BUFFER as a pointer rather than
as an array, you should no longer pass it as a "VAR".
* OPTION EXTENSIBLE: Some MPE/XL intrinsics which take a variable
number of parameters are declared as OPTION EXTENSIBLE. This
tells the compiler to pass a single word at the beginning of the
parameter list that contains the total number of parameters being
passed. If you're trying to intercept an OPTION EXTENSIBLE
procedure, you need to make your own procedure OPTION EXTENSIBLE,
too.
Unfortunately, it's hard to tell which procedures are OPTION
EXTENSIBLE and which are not. Some, instead of being OPTION
EXTENSIBLE, are declared with OPTION DEFAULT_PARMS; this tells
the compiler to set the values of omitted parameters to the
specified default values, which makes the parameter count word
unnecessary. You must look at the intrinsic file listing and see
whether the procedure was indeed declared with OPTION EXTENSIBLE.
If it was, however, it doesn't matter how many non-extensible
parameters it has; an "OPTION EXTENSIBLE 0" declaration is
enough. You need not compile your intercepting procedure with the
same OPTION DEFAULT_PARMS values as the intrinsic was; those only
matter when the calling program is compiled.
* ANYVARs: If a PASCAL/XL procedure is declared with ANYVAR
parameters but does not have an OPTION UNCHECKABLE_ANYVAR, then
for every such ANYVAR parameter, the size of the parameter is
passed together with its address. If the procedure being
intercepted takes an ANYVAR parameter and is not declared with
OPTION UNCHECKABLE_ANYVAR, then the intercepting procedure must
also be non-UNCHECKABLE_ANYVAR and must declare the parameter as
an ANYVAR. If, however, the parameter is a VAR, or is an ANYVAR
and the procedure is declared with OPTION UNCHECKABLE_ANYVAR,
then the intercepting procedure can declare it as a simple VAR.
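The alignment point in the list above has a close analogue in C, sketched here under the assumption that the caller may hand you an address that is not word-aligned. The function name read_int32_unaligned is made up for this example; it is not a PASCAL/XL or MPE/XL facility.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit integer from an address that may not be word-aligned.
   memcpy makes no alignment assumption about its source, much as
   "$ALIGNMENT 1$" tells PASCAL/XL not to assume one. */
int32_t read_int32_unaligned(const unsigned char *p)
{
    int32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}
```

As with "$ALIGNMENT 1$", the price is some speed, since the access can no longer be assumed to be a single aligned word load, but the code is safe whatever address the caller passes.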
GLOBAL STORAGE
Native mode XL routines can have global storage of their own, so you
don't need to use any of the tricks we've discussed above to save data
from one call of the procedure to the next. In particular, if your
FREAD needs to call the real FREAD, it can do an HPGETPROCPLABEL (the
NM equivalent of LOADPROC) of the real FREAD and save the plabel in a
global variable. Declaring these global variables is quite simple:
    $SUBPROGRAM$
    $GLOBAL$
    PROGRAM DUMMY_OUTER_BLOCK;
    VAR globvar: type;
    ...
    PROCEDURE FREAD ...
One problem is that PASCAL/XL has no way of initializing global
variables to a particular value, and thus gives you no direct way of
telling whether this is the first call to the procedure, when some
special action (e.g. loading a procedure, opening a file, etc.) is
required. One trick that you can use is to check if the variable is
equal to some special constant of yours, and if it isn't, assume that
this is the first call to the procedure, do the initialization, and
then set the variable to that constant. Unfortunately, it is possible
that the variable will have had that value by accident from the
beginning; this may be more likely than you might expect, since this
chunk of memory might have been used earlier by another process which
was running the same routine, and which set that location in memory to
your flag value.
The safest solution would probably be to write your procedure in HP
C/XL (which lets you initialize global variables), or possibly just
write one small procedure in HP C/XL that declares this variable and
lets you access it.
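A minimal sketch of the C approach might look like the following. The names hooked_fread, load_real_fread, and stub_real_fread are hypothetical stand-ins: in a real XL the intercepting routine would be named FREAD, and the loader function would be replaced by an HPGETPROCPLABEL call, whose calling sequence is not shown here. The parameter list is also simplified and does not reproduce the real FREAD declaration.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*fread_fn)(int fnum, void *buffer, int len);

/* Explicitly initialized globals: unlike the PASCAL/XL case, these
   have known values before the first call. */
static int initialized = 0;
static fread_fn real_fread = NULL;
static int init_count = 0;          /* for demonstration only */

/* Hypothetical stand-in for the real FREAD; it just echoes the length. */
static int stub_real_fread(int fnum, void *buffer, int len)
{
    (void)fnum;
    (void)buffer;
    return len;
}

/* Hypothetical stand-in for looking up the real FREAD's plabel. */
static fread_fn load_real_fread(void)
{
    init_count++;
    return stub_real_fread;
}

/* The intercepting routine: do the one-time setup on the first call,
   then pass every call through to the real procedure. */
int hooked_fread(int fnum, void *buffer, int len)
{
    if (!initialized) {
        real_fread = load_real_fread();
        initialized = 1;
    }
    return real_fread(fnum, buffer, len);
}
```

However many times hooked_fread is called, the lookup runs only once, and there is no flag constant that could be matched by accident.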
RUNNING THE HOOKED PROGRAM
To run the hooked program, you should simply
    :RUN myprog;XL="interceptingxl"
What if you don't like to have to always specify the XL= parameter?
Too bad. Although a program can be :LINKed with a default ;XL=, it is
very hard to patch one in after the :LINK; also, the MPE/V technique of
changing the name of the called routine in the external reference list
and adding the procedure to the system SL doesn't work, because the
format of the external reference list is not documented and because
adding things to the system XL is much more difficult than adding them
to the system SL.
The only time that having to run the program with ;XL= would be a
serious problem is if the program is process-handled from another
program. Fortunately, hooking gives us a solution (albeit a difficult
one) to this problem: just intercept CREATEPROCESS (or CREATE or
COMMAND or HPCICOMMAND or whatever the parent program uses) and "add" an XL=
parameter to the calling sequence. (This is what we did for our
VECMMND and VEOPEN routines, which intercept COMMAND and DBOPEN
calls.)
Another possible solution, which might be easier in some cases, is to
rename your old program (say, MYPROG) to MYPROGUH ("Un-Hooked"), and
create a small shell program called MYPROG, which CREATEPROCESSes
MYPROGUH with the right XL= parameter (and possibly also passes on
whatever ;PARM= and ;INFO= values it was run with). This will cause a
bit more overhead, but this way whenever MYPROG is run, it will always
execute MYPROGUH with the correct XL=.
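To show what such a shell program has to pass on, here is a hedged sketch in C that lays the information out as :RUN-style command text. The shell described above would use CREATEPROCESS instead, whose item list is not shown here. The function name build_run_command and the XL file name HOOKXL are hypothetical, and quote characters inside the INFO= string are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build RUN command text for the un-hooked program, adding the XL=
   parameter and passing on the PARM= and INFO= values.
   Returns 0 on success, -1 if the text does not fit in `out`. */
int build_run_command(char *out, size_t out_size,
                      const char *program, const char *xl_file,
                      int parm, const char *info)
{
    int n = snprintf(out, out_size,
                     "RUN %s;XL=\"%s\";PARM=%d;INFO=\"%s\"",
                     program, xl_file, parm, info);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

The same kind of string manipulation applies when you intercept COMMAND or HPCICOMMAND in a parent program and want to "add" an XL= to a RUN command that it issues.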
CONCLUSION
In part, this paper is more a discussion of an interesting type of
problem solved in an interesting way than a blueprint for your own
development -- not everyone has the needs described in this paper or
the means to satisfy these needs.
However, various people in the HP3000 community have, more or less
independently, used these techniques to accomplish some very valuable
things:
* Robelle has gotten compilers to read QEDIT-format files.
* Various people have intercepted IMAGE calls to instead go to
their own extended-IMAGE utilities.
* VESOFT has used hooking to implement MPEX command execution and
multi-line REDO from EDITOR, TDP, etc.; to allow an SM user
running some such editor to save files across account boundaries;
to preserve the ACDs of files being edited; to implement an IMAGE
database security system by hooking DBOPEN; and to intercept
COMMAND intrinsic calls and route executions of the STREAM
command through STREAMX.
There are a couple of relatively simple things that come to mind that
you might do yourself:
* If you have your own internal data storage format, you can hook
your favorite text editor to be able to properly read those
files.
* If you want to prevent people from executing certain MPE commands
from a program that normally allows MPE command execution (e.g.
EDITOR), you can hook it to reject those commands.
* If you want to implement a control-Y trap in a program, you can
hook some procedure that the program calls at the very beginning
and have your hook procedure arm the control-Y trap.
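For the command-rejection idea in the list above, the check inside the hook procedure could be as simple as the following C sketch. The name is_blocked_command and the blocked list are hypothetical, and this only looks at the first word of the command; it makes no attempt to handle a leading colon, UDCs, or command files.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return 1 if the first word of `command` matches (ignoring case)
   any entry of `blocked`, else 0. */
int is_blocked_command(const char *command,
                       const char *const blocked[], int count)
{
    while (*command == ' ')
        command++;                          /* skip leading blanks */

    size_t word_len = 0;
    while (command[word_len] != '\0' &&
           isalnum((unsigned char)command[word_len]))
        word_len++;                         /* length of the first word */

    for (int i = 0; i < count; i++) {
        if (strlen(blocked[i]) != word_len)
            continue;
        size_t j = 0;
        while (j < word_len &&
               toupper((unsigned char)command[j]) ==
               toupper((unsigned char)blocked[i][j]))
            j++;
        if (j == word_len)
            return 1;
    }
    return 0;
}
```

A hook on the COMMAND intrinsic could call this before passing the command on, and return an error to the caller when the result is 1.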
If you really want to do something substantial, I believe that you can
hook QUERY to handle MPE and KSAM files by intercepting all the DBxxx
calls to make the MPE and KSAM files look like IMAGE databases. This
would truly be a feat.