Processing in Grue

Published on September 22, 2024

At this point, I’m very close to one of the meatier subjects in creating a Z-Machine implementation: decoding instructions from the zcode binary. Before I get into that, I’m going to take a look at the overall processing that will need to be supported.

Emulating the Z-Machine basically means emulating a CPU, often just referred to as a processor. The processor at the heart of any computer, even a virtual one, can ultimately understand only the simplest of instructions. These instructions, known as “operation codes” (or just “opcodes”), do really simple things. Some examples of “really simple things” are listed here, with a small code sketch right after the list:

  • Moving a single number from memory into a register of the processor.
  • Adding a number already stored in a register to another number.
  • Putting the result from an operation back into memory.
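
To make those operations a bit more concrete, here’s a toy sketch in Python (used purely for illustration; the operation names, the register, and the numbers are all made up and are not Z-Machine opcodes) of a “processor” that understands only those three kinds of things:

# A toy processor that only understands three primitive operations.
# These are illustrative only; they are not real Z-Machine opcodes.

memory = [0] * 16        # a tiny bank of memory cells
register_a = 0           # a single register on the "processor"

def load(address):
    """Move a number from memory into the register."""
    global register_a
    register_a = memory[address]

def add(value):
    """Add a number to whatever is already in the register."""
    global register_a
    register_a += value

def store(address):
    """Put the result from the register back into memory."""
    memory[address] = register_a

memory[0] = 5
load(0)      # register now holds 5
add(3)       # register now holds 8
store(1)     # memory cell 1 now holds 8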

The Idea of Opcodes

Each opcode is identified by a unique sequence of “on/off switches,” better known as bits. Some of the earliest programmers were quite literally bit flippers. This meant they entered long sequences of 1s and 0s by hand, essentially “flipping” a given bit to one of the two values.

Technically, you could argue that’s all programmers do these days as well, but the levels of abstraction are so high that what we do now is light years away from what we did back when computing was just getting started.

Tools called assemblers were eventually developed, and these allowed programmers to replace the 1s and 0s with unique textual identifiers, often referred to as mnemonics because they were meant to indicate what the operation did. For example, STO would store a number in memory, ADD would perform an addition operation, and so on.
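
As a rough sketch of the job an assembler does, here’s a small Python illustration. The mnemonics and byte values are invented for the example and don’t correspond to any real instruction set:

# A made-up mapping of mnemonics to opcode bytes. The mnemonics and
# byte values here are purely illustrative, not any real instruction set.
OPCODES = {"LOD": 0x01, "ADD": 0x02, "STO": 0x03}

def assemble(lines):
    """Turn lines like 'ADD 3' into raw bytes a processor could read."""
    program = bytearray()
    for line in lines:
        mnemonic, operand = line.split()
        program.append(OPCODES[mnemonic])   # the opcode byte
        program.append(int(operand))        # a one-byte operand
    return bytes(program)

print(assemble(["LOD 0", "ADD 3", "STO 1"]).hex())  # "010002030301"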

The Use of Opcodes

To use a retro example, firing up my Commodore 64, I can enter this:


POKE 53280,0

That’s a statement written in BASIC. The memory address 53280 (0xD020) maps to the register that controls the border color of the screen. The above command would set the border to black. What does POKE mean, though? It was just the BASIC keyword for an instruction that changed the contents of any address in memory. A PEEK instruction, by contrast, would provide you with the contents of memory at a given address.
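
From an emulator’s point of view, POKE and PEEK amount to nothing more than writing and reading a byte in an array that stands in for the machine’s memory. A minimal sketch (again in Python, just for illustration):

# Emulated memory is just an array of bytes, indexed by address.
memory = bytearray(64 * 1024)   # 64 KB, like the Commodore 64

def poke(address, value):
    """Change the contents of a memory address."""
    memory[address] = value & 0xFF

def peek(address):
    """Read the contents of a memory address."""
    return memory[address]

poke(53280, 0)        # the BASIC example: POKE 53280,0
print(peek(53280))    # 0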

Incidentally, there is no “POKE” equivalent in 6502 assembly language. I mention that because the Commodore 64 was based on the 6502 microprocessor and its instruction set architecture. In that context, it’s not possible to load a memory location with a value within a single instruction. So, to change the color to black, the BASIC statement “POKE 53280,0” actually becomes this:


LDA #$00
STA $D020

The mnemonic LDA (Load Accumulator) is used to load a value into the accumulator, and the mnemonic STA (Store Accumulator) is used to store the value from the accumulator into a memory location. In this case, “LDA #$00” loads the immediate value of 0 into the accumulator, and “STA $D020” stores the value from the accumulator into the memory location represented by $D020, which corresponds to the chip’s register controlling the border color.
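
If I were emulating just those two instructions, the work boils down to an accumulator and a memory array. Here’s a deliberately stripped-down sketch (nowhere near a real 6502 emulator; flags, other addressing modes, and cycle counts are all ignored):

# A drastically simplified model: just an accumulator and memory.
memory = bytearray(64 * 1024)
accumulator = 0

def lda_immediate(value):
    """LDA #value: load an immediate value into the accumulator."""
    global accumulator
    accumulator = value & 0xFF

def sta_absolute(address):
    """STA address: store the accumulator into a memory location."""
    memory[address] = accumulator

lda_immediate(0x00)     # LDA #$00
sta_absolute(0xD020)    # STA $D020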

Without the mnemonics, the assembly code snippet would be represented like this:


A9 00
8D 20 D0

How that breaks down is:

  • A9 is the opcode for the “Load Accumulator Immediate” instruction.
  • $00 is the immediate value being loaded into the accumulator (register A). In this case, it’s 00, which represents the value zero.
  • 8D is the opcode for the “Store Accumulator Absolute” instruction.
  • $20 and $D0 are the two bytes of the memory address where the accumulator value will be stored. The 6502 stores addresses in little-endian order, so $20 is the low byte and $D0 is the high byte; together they form the 16-bit address $D020.

In binary, this would have been:


Load Accumulator Immediate:
A9 00       10101001 00000000

Store Accumulator Absolute:
8D 20 D0    10001101 00100000 11010000

While it’s easy to see that these mnemonics were useful aids for programmers, they meant nothing to any computer. It was only that lowest level binary that actually meant anything to a processor. So after writing the program using a system of mnemonics, the programmer had to pass it through the assembler to generate the 1s and 0s the computer needed.
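
Going in the other direction, turning raw bytes back into operations, is exactly what an emulator has to do, and it’s the same basic job Grue will have with zcode. Here’s a minimal sketch in Python that decodes only the five bytes above (a real decoder obviously handles far more than two opcodes):

# Decode the raw bytes A9 00 8D 20 D0 back into readable operations.
# Only these two opcodes are handled; this is just a sketch.
program = bytes([0xA9, 0x00, 0x8D, 0x20, 0xD0])

pc = 0  # "program counter": where we are in the byte stream
while pc < len(program):
    opcode = program[pc]
    if opcode == 0xA9:                     # LDA immediate
        value = program[pc + 1]
        print(f"LDA #${value:02X}")
        pc += 2
    elif opcode == 0x8D:                   # STA absolute
        low, high = program[pc + 1], program[pc + 2]
        address = low | (high << 8)        # little-endian: low byte first
        print(f"STA ${address:04X}")
        pc += 3
    else:
        raise ValueError(f"unknown opcode {opcode:02X}")

Running that prints LDA #$00 and STA $D020, which is the same assembly we started from.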

Eventually higher-level languages were created that would let these programmers create their programs at a much greater level of abstraction from the hardware. This allowed them to focus more on the logic of what they were trying to achieve and less on bit twiddling.

Programming Languages

The first really complete example of such a language was FORTRAN, which was specified at IBM in 1954 (with the first compiler delivered in 1957). This language is historically interesting (relevant to this project anyway) because it was the language a guy named Will Crowther chose to code a game called Colossal Cave Adventure in. This was back in 1975. Zork would initially pay homage to this game and was arguably written as a successor to it.

Another such language was LISP, which came on the scene around 1958. This one is historically interesting (again, as relevant to this project) in that it was the ancestor of a language created at MIT called MDL. That, in turn, was the ancestor to a language created by Infocom, called ZIL (Zork Implementation Language). ZIL is ultimately what the Zork that most people played was written in.

A program written in ZIL was compiled into a form of bytecode, which is the zcode I’ve been talking about since this series began. This zcode was executed by having it run through another program called a ZIP (Z-Machine Interpreter Program).

At this point in the series, you certainly know that the ZIP is what Grue has to emulate.

How Much to Emulate

All of what I describe above is important because it means that when I read the Z-Machine specification, I have to understand how the Z-Machine interpreters (the ZIPs) were created and how they worked back then, but I also have to translate that into what is needed, or practical, today.

For example, as the most obvious thing to consider, today we’re generally not dealing with games on disks at all, much less multiple disks. Thus paging systems and segment tables are not something I’ll likely be worrying about.

As another example, perhaps less obvious: technically, there really is no need to make a distinction between dynamic (mutable) and static (immutable) memory in a modern implementation. But even though that’s true, it’s still probably a good idea to do so. One of the main reasons is that doing so helps ensure that static memory is never written to. Also, if the interpreter restarts a story file (or saves and restores the state of one), it has to be able to provide the original state of dynamic memory. That feels like it’s going to be a lot easier if Grue has a clear separation of what “dynamic memory” is.
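
To give a sense of what that separation might look like, here’s one possible sketch (in Python, purely for illustration; none of these names come from Grue itself). It leans on the fact that the story file header stores the base address of static memory at offset 0x0E, with everything below that address being dynamic:

class Memory:
    """One way to keep dynamic and static memory clearly separated."""

    def __init__(self, story_bytes):
        self.bytes = bytearray(story_bytes)
        # The header holds the base address of static memory at offset
        # 0x0E (big-endian); everything below that address is dynamic.
        self.static_base = (self.bytes[0x0E] << 8) | self.bytes[0x0F]
        # Keep a pristine copy of dynamic memory for restart/restore.
        self.original_dynamic = bytes(self.bytes[:self.static_base])

    def read_byte(self, address):
        return self.bytes[address]

    def write_byte(self, address, value):
        if address >= self.static_base:
            raise ValueError(f"write to static memory at {address:#06x}")
        self.bytes[address] = value & 0xFF

    def reset_dynamic(self):
        """Put dynamic memory back to its original state (for a restart)."""
        self.bytes[:self.static_base] = self.original_dynamic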

Yet another good example of this has to do with the concept of the stacks that I briefly mentioned when I talked about memory and state. There are technically two stacks: a call stack and an evaluation stack. But are these separate structures or just one structure? That’s really up to the author of the interpreter. A “stack” is nothing more than a data structure that’s a wrapper around some numbers.

So what matters is that the zcode program being executed can only directly access the data structure (“stack”) of the current routine. But the zcode program itself has no way to know whether the interpreter it’s running under has one large stack that just sits in memory or separate call stacks (one for each routine called) that in turn each have their own evaluation stack.
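
As one possible design, and not necessarily the one Grue will end up with, each routine call could push a frame that carries its own evaluation stack, making the “call stack” just a list of frames:

class Frame:
    """One routine call: return address, locals, and its own eval stack."""
    def __init__(self, return_address, locals_count):
        self.return_address = return_address
        self.locals = [0] * locals_count
        self.eval_stack = []

class Stacks:
    """A call stack whose frames each carry an evaluation stack."""
    def __init__(self):
        self.frames = [Frame(return_address=0, locals_count=0)]  # "main" frame

    def current(self):
        return self.frames[-1]

    def call(self, return_address, locals_count):
        self.frames.append(Frame(return_address, locals_count))

    def ret(self):
        return self.frames.pop().return_address

    # The zcode program only ever sees the current frame's eval stack.
    def push(self, value):
        self.current().eval_stack.append(value)

    def pop(self):
        return self.current().eval_stack.pop()

Whether it’s done that way or with one flat block of memory, the zcode program running on top can’t tell the difference, which is exactly the point.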

The point here is: I’ve spent a lot of time learning about how things worked; now I have to figure out how I want to make things work. Thus, in the next post I’ll start talking further about how Grue has to interact with the zcode program it has loaded up.
