A Software Patch Problem
In Mark Halpern's first memoir in the Annals of the History of
Computing he mentioned his uneasiness when the word "patch" was deemed
no longer necessary to define. I can share his nostalgia.
When I moved to Phoenix from Paris in 1966, one of my first tasks was to
see why certain GE software was not running up to advertised speeds --
the COBOL compiler in particular. I got the assistance of Leroy
Ellison, and we started with comparison compiler runs. But it seemed to
me that there was some gold to be mined in the operational aspects, so I
went to the machine room to see COBOL compilations in action.
I noticed that the operator would get a message, necessitating the
depression of a key, over and over. The machine seemed to be in an
infinite loop. So I inquired, and found out that patches seemed to be
the problem.
Apparently the programmers thought it OK to write the COBOL compiler
program, assemble it, and then only rarely reassemble it.
Instead they put in a lot of patches, presumably intending to take them
out upon the next assembly. Unfortunately that did not happen too
often, and at the time of my investigation there were some 12,000
patches residing on the drum!
Each patch took 64 words (a sector) of space on the drum to define
where it went, and 64 words to contain the patch itself.
Thus there were 12,000 patches, each using up 128 36-bit words on the
drum. This not only caused a certain drain on drum capacity, but every
time the COBOL compiler was read in to run, it had to overlay each one of
these 12,000 patches in sequence before it could actually compile.
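The drum space involved can be tallied from the figures above. This is only a sketch of that arithmetic; the variable names are illustrative, and the two-sectors-per-patch layout is taken directly from the description (one sector for the pointer, one for the patch body).

```python
# Drum space taken by the patches, using only the figures quoted in the text.
PATCHES = 12_000
WORDS_PER_SECTOR = 64    # one drum sector
SECTORS_PER_PATCH = 2    # one sector for the pointer, one for the patch itself
BITS_PER_WORD = 36

sectors = PATCHES * SECTORS_PER_PATCH
words = sectors * WORDS_PER_SECTOR
bits = words * BITS_PER_WORD

print(f"{sectors:,} sectors, {words:,} words, {bits:,} bits")
```

That comes to 24,000 sectors, or 1,536,000 words, all of which had to be read in before each compilation.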
Now this might not have been too bad, but unfortunately the hardware
people did not talk to the software people, and vice versa. The former
knew of a design glitch called the "1-word DCW problem". DCW stood for
Device Control Word. The hardware glitch occurred when only one word in
a sector was used. But surely, hardware reasoned, that would not happen
very often, and when it did the operator could call for a rereading
which would probably work. But guess the circumstance where there would
be only one word in a sector? Right. The pointer for the patch.
It turned out to be VERY important when there were 12,000 patches, but
the software people did not know about the problem. They hadn't been
told, and thus did not ask the hardware folk to fix it. With 12,000
patches reading in at every compilation there was a high probability of
the 1-word DCW glitch acting up.
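Why a rare glitch becomes likely at this scale can be sketched with a simple model. The per-read failure rates below are hypothetical (no measured rate is given here), and the model assumes each one-word read fails independently, which may not have been true of the actual hardware.

```python
def p_at_least_one_glitch(p_per_read: float, reads: int) -> float:
    """Chance that at least one of `reads` independent reads fails."""
    return 1.0 - (1.0 - p_per_read) ** reads

# One 1-word pointer sector is read per patch (12,000 patches, from the text).
PATCH_POINTER_READS = 12_000

# Hypothetical per-read failure rates, chosen only to show the trend.
for p in (1e-6, 1e-5, 1e-4, 1e-3):
    chance = p_at_least_one_glitch(p, PATCH_POINTER_READS)
    print(f"per-read rate {p:g}: chance of a glitch per compilation {chance:.3f}")
```

Under these assumptions, even a failure rate of one in ten thousand reads would disturb most compilations.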
There was an ABORT button on the console, so in such cases the operator
had the option of aborting and going on to another process, or else
trying to reread the drum. Naturally, no operator wanted to foul up the
operation, and so each kept hitting the RETRY button. I timed this, and
on average the RETRY button was hit 30 times before a correct read
occurred. By my count this happened an average of 30 times per shift. Each
time there was also a typewriter message that said, in effect, "We could
not read this time. Want to try again?" The typewriter was not
buffered, and my timing gave 3 seconds for the message, 1 second to hit
the RETRY button.
So we had 4 wasted seconds times 30 retries times 30 occurrences per
shift. That equals 3,600 seconds, or 1 hour per 8-hour shift --
absolutely lost! And nobody realized it until I actually walked onto
the test floor to observe!
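The lost-time figure can be checked with a few lines of arithmetic. The numbers are the ones timed above; the variable names are illustrative only.

```python
# Time lost to drum rereads per shift, from the timings quoted in the text.
SECONDS_PER_MESSAGE = 3      # unbuffered typewriter message
SECONDS_PER_KEYPRESS = 1     # operator hits RETRY
RETRIES_PER_INCIDENT = 30    # average retries before a correct read
INCIDENTS_PER_SHIFT = 30     # average occurrences per shift
SHIFT_HOURS = 8

seconds_per_retry = SECONDS_PER_MESSAGE + SECONDS_PER_KEYPRESS
wasted_seconds = seconds_per_retry * RETRIES_PER_INCIDENT * INCIDENTS_PER_SHIFT
wasted_hours = wasted_seconds / 3600
share_of_shift = wasted_hours / SHIFT_HOURS

print(f"{wasted_seconds} s = {wasted_hours} h = {share_of_shift:.1%} of a shift")
```

One hour in eight is 12.5 percent of the machine's time in every shift.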
A Process Transfer Problem
Another gold mine was in the operating system, which is usually pretty
busy doing housekeeping when shifting attention to another portion of
the system. But software management thought that not much could be
gained here.
I didn't believe that, so I asked for a manual for the GE 600 (I had
never seen the instruction set before), and then a listing of the
actual code. I was handed the first on a Thursday, the second on Friday.
Over the weekend I pondered.
On Monday morning I had several substantial operational improvements.
The most significant concerned the index registers, of which there were
seven! When moving from process to process one could not know if the
new process needed any or all of those registers. So the contents of
all registers had to be saved upon each change, being restored when the
relinquishing process was called again.
There in the code were Save Register instructions -- all seven of them.
But the operation code set I had studied showed a single instruction
that would save the content of all seven registers at once! So a
simple modification to use this instruction would save lots of time.
You can see the communication problem now. The hardware people had not
bothered to tell software about the new facility, and of course the
software people were so busy they had no time to go back and see if
there were any beneficial new capabilities.
MORAL -- Deliberately swap one or more people between the hardware
and software operations. A lot of eyes will be opened.