EA: Where did the idea for the Java virtual machine come from?
JG: Back when I was a grad student at Carnegie Mellon, I had this problem where I needed to have some kind of an architecture-neutral distribution format. We had a bunch of workstations called PERQ machines. The folks who built them were a bunch of hardware guys who didn't want to do software. The only compiler that they could get for free was UCSD (University of California San Diego) Pascal. So they made the hardware interpret UCSD Pascal p-codes.
My thesis advisor, Raj Reddy, asked me to spend the summer trying to figure out how to get the software from these PERQ machines to run on our VAXs.
I started out writing a little hardware emulator, just to understand the p-codes. Then I realized I could actually write a code-generating program that translated from Pascal p-codes to VAX assembly code.
So I wrote a hardware emulator for the PERQ machine that did its emulation by translating, and I spent a bunch of time trying to figure out why the translation actually worked. One of the things I noticed was that the code I was getting out was actually better than the code that was coming out of the C compiler.
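To make the translate-versus-interpret distinction concrete, here is a toy sketch in Java. It assumes a hypothetical three-instruction stack machine (`PUSH`, `ADD`, `MUL`). These are not the real UCSD p-codes, and the emitted lines are illustrative pseudo-assembly, not verified VAX code. The interpreter decodes and dispatches every instruction each time the program runs, while the translator decodes once and produces straight-line code with no dispatch loop.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy stack-machine opcodes (NOT the real UCSD p-code set).
enum Op { PUSH, ADD, MUL }

record Insn(Op op, int arg) {}

final class PCodeDemo {
    // Interpreter: decodes and dispatches every instruction at run time.
    static int interpret(List<Insn> program) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (Insn i : program) {
            switch (i.op()) {
                case PUSH -> stack.push(i.arg());
                case ADD -> stack.push(stack.pop() + stack.pop());
                case MUL -> stack.push(stack.pop() * stack.pop());
            }
        }
        return stack.pop();
    }

    // Translator: decodes once and emits straight-line pseudo-assembly.
    // The mnemonics are illustrative only, loosely VAX-flavored.
    static List<String> translate(List<Insn> program) {
        List<String> out = new ArrayList<>();
        for (Insn i : program) {
            switch (i.op()) {
                case PUSH -> out.add("pushl $" + i.arg());
                case ADD -> { out.add("movl (sp)+, r0"); out.add("addl2 r0, (sp)"); }
                case MUL -> { out.add("movl (sp)+, r0"); out.add("mull2 r0, (sp)"); }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Computes (2 + 3) * 4 on the toy stack machine.
        List<Insn> program = List.of(
            new Insn(Op.PUSH, 2), new Insn(Op.PUSH, 3), new Insn(Op.ADD, 0),
            new Insn(Op.PUSH, 4), new Insn(Op.MUL, 0));
        System.out.println(interpret(program)); // 20
        translate(program).forEach(System.out::println);
    }
}
```

Running `main` prints `20` followed by the seven translated lines; the translated sequence does the same work as the interpreter without re-decoding anything.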
EA: If somebody were implementing an interpretive environment, what would you tell them to look at in terms of the security issues? Any specific tricks you've learned over the years?
JG: It's a layered phenomenon. At the lowest level, you have to know that the boundaries around the piece of software are completely known and contained. So, for example, in Java, you can't go outside the bounds of an array. Ever. Period. Turning off array subscript checking is not an option.
A lot of work has gone into optimizing compilers so that those checks are cheap. Once that guarantee is in place, you know there's no way you can get unbounded memory corruption, and that when you hand a piece of memory off to some suspect piece of code, it can't use it to get outside of that memory. So if you've got an interpretive language that supports things like C's unbounded pointers, it's very difficult, or at least expensive, to really bound the things it can touch.
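The bounds guarantee can be sketched in a few lines of Java: handing an array to untrusted code does not let that code reach past the array's end, because every index is checked and an out-of-range access throws `ArrayIndexOutOfBoundsException` instead of touching neighboring memory. The class and method names below are hypothetical. An optimizing JIT compiler may drop checks it can prove redundant, but the language still guarantees the exception whenever an access really is out of range.

```java
final class BoundsDemo {
    // Hypothetical stand-in for "some suspect piece of code" that is
    // handed a buffer and tries to write one element past its end.
    static void untrustedWrite(int[] buffer) {
        buffer[buffer.length] = 42; // index is always out of bounds
    }

    // Returns true if the runtime rejected the out-of-bounds write.
    static boolean writeWasRejected() {
        int[] buffer = new int[4];
        try {
            untrustedWrite(buffer);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(writeWasRejected()); // true
    }
}
```

The equivalent C code would compile and silently write to whatever lies after the buffer; here the write never happens.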
EA: Unlike C.
JG: Unlike C, where you can basically lie about anything. In fact, a lot of standard practice in C is all about lying about the identity of things. But once you've got an environment where you can't lie—you can't wiggle around things—then you can start building mechanisms that find what things can do, and you can then observe things and have some faith that what you're observing is true. You're not seeing somebody who is tricking you by going around the edges of an interface. Then a lot of the stuff layers up from there, and just falls into place, but the higher-level stuff is pointless unless you've got the lower-level integrity.
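The "you can't lie about identity" point can be sketched the same way. In C, a pointer cast simply reinterprets memory as whatever type you name; in Java, a downcast is checked at run time, and a mismatched cast throws `ClassCastException` before the code can treat the object as the wrong type. The `Account` and `Label` classes are hypothetical, chosen only for illustration.

```java
final class IdentityDemo {
    // Hypothetical types used only to illustrate checked casts.
    static final class Account { long balance = 100; }
    static final class Label { String text = "hello"; }

    // Attempts to reinterpret an arbitrary object as an Account, the way
    // a C cast can reinterpret any pointer. Returns true if the JVM refused.
    static boolean castWasRejected(Object o) {
        try {
            Account a = (Account) o;
            a.balance = Long.MAX_VALUE; // reached only if o really is an Account
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(castWasRejected(new Label()));   // true
        System.out.println(castWasRejected(new Account())); // false
    }
}
```

Because the cast either succeeds on a genuine `Account` or fails outright, code that observes an `Account` can trust that it really is one, which is the lower-level integrity the higher layers depend on.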
EA: You don't want to build on quicksand?
JG: Exactly. It's like watching somebody build a building downtown, or anywhere: they always dig down to bedrock and then go up. If you don't do that first, it's just going to fall over.
The only problem (and irony), of course, is that Java is written in C. They weren't smart enough to write Java in Java. UCSD Pascal was written in Pascal. Free Pascal is written in Free Pascal. Qomp is being written with the Qomp compiler.