Disassembling the Triad T2556: Memory

Part Nine in a Series on the Triad T2556

Photo by Jo Szczepanska on Unsplash

Last time, I covered how the machine configured the video controller. Now I’ll talk about the steps that the machine goes through on boot to verify the integrity of the program EPROM and RAM.

It’s essential to have high confidence in the EPROM (read-only program memory) since a flipped bit would cause unpredictable behavior during program execution.

The RAM (read-write memory) comes in two flavors: display memory, and memory that is used during program execution. We have to trust the display memory, since it is the machine’s main way of communicating with the world, and variable storage has to work correctly, since strange things will happen if it doesn’t.

I’ll start off digging into the details of how these blocks of code work, but — as before — will avoid repeating explanations as those same patterns reappear.

BLOCK 2A: This performs some initialization for us.

  • The register pair hl will point to the beginning of EPROM space (at address 00000h). This is used as a pointer to the byte that we will operate on.

BLOCK 2B: The math here is pretty simple. For the byte referenced by the hl register pair, we modify b and c as follows:

b = xor(b, byte)
c = rrc(xor(c, byte))

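In case you’re not familiar with the exclusive-or (xor) and rotate-right-circular (rrc) operations, you can learn more about them on Wikipedia. Note that the Z80’s rrc is a circular rotate: bit 0 wraps around into bit 7 (and is also copied into the carry flag). Rotating through the carry, where the old carry moves into bit 7, is a different instruction, rr.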
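The following minimal Python model of the block 2B arithmetic makes the two formulas concrete. It assumes that b and c both start at zero, because their initial values aren’t covered above. The helper names rrc and rom_checksum are hypothetical, not labels from the listing.

```python
def rrc(value: int) -> int:
    """Rotate an 8-bit value right by one; bit 0 wraps around into bit 7."""
    return ((value >> 1) | ((value & 0x01) << 7)) & 0xFF


def rom_checksum(data: bytes, b: int = 0, c: int = 0) -> tuple:
    """Accumulate b and c over `data` as block 2B is described.

    The default starting values of 0 are an assumption.
    """
    for byte in data:
        b ^= byte           # b = xor(b, byte)
        c = rrc(c ^ byte)   # c = rrc(xor(c, byte))
    return b, c


b, c = rom_checksum(bytes([0x01, 0x02]))
print(f"b={b:02x}h c={c:02x}h")  # b=03h c=41h
```

Working the example by hand gives the same result. After the first byte, c becomes rrc(01h) = 80h. After the second byte it becomes rrc(80h xor 02h) = rrc(82h) = 41h.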

BLOCK 2C: Next, hl (the pointer to the memory location) is incremented to point to the next location to examine in memory, and de (the count of the remaining bytes to process) is decremented. The counter test in this block is a common design pattern that you’ll see in Z80 assembler:

ld a, 0
dec de ; decrement de
cp e ; is e equal to 0 (the contents of a)?
jr nz, loop ; re-run the loop if it is not
cp d ; is d equal to 0 (the contents of a)?
jr nz, loop ; re-run the loop if it is not

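It’s a bit convoluted, but the idea is that the loop will continue until the register pair de reaches zero. The extra steps are needed because the Z80’s 16-bit dec de doesn’t update any flags, so the code has to test the two 8-bit halves, e and d, separately.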
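To check the exit logic, here is a small Python simulation of a loop that ends with the block 2C counter test. count_iterations is a hypothetical helper. It models dec de as a 16-bit decrement that wraps from 0000h to 0ffffh, which matches the Z80’s behaviour.

```python
def count_iterations(de: int) -> int:
    """Count passes through a loop that ends with block 2C's counter test."""
    iterations = 0
    while True:
        iterations += 1                  # the loop body (block 2B's math) runs
        de = (de - 1) & 0xFFFF           # dec de: 16-bit, wraps, sets no flags
        e, d = de & 0xFF, de >> 8
        if e != 0:                       # ld a, 0 / cp e / jr nz, loop
            continue
        if d != 0:                       # cp d / jr nz, loop
            continue
        return iterations                # both halves are zero: fall through


print(count_iterations(5))        # 5
print(count_iterations(0x0100))   # 256
print(count_iterations(0))        # 65536
```

Because the test comes after the decrement, a starting count of zero doesn’t skip the loop. Instead, de wraps to 0ffffh and the body runs 65,536 times.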

BLOCK 2D: When block 2C falls through to block 2D, registers b and c contain a cumulative calculation over all of the EPROM memory locations except the final two bytes.

If any of that data has changed from what was expected, it is likely that registers b and c will differ from the two pre-calculated values that are stored at ROM_CRC_start*¹.

In block 2D, we put a copy of the b register into a, and then compare this value with what’s stored in ROM_CRC_start. Assuming that these values match, a similar comparison is made between the c register and the byte following ROM_CRC_start. Assuming this comparison also succeeds, processing moves on to rom_check_pass.
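The pass/fail decision can be sketched in Python too, under three assumptions:

  • the two stored values sit in the last two bytes of the image, with b’s value first;
  • b and c start at zero;
  • the names rom_check_passes and append_check_bytes are hypothetical.

append_check_bytes is only a guess at how a build tool might have produced the check bytes.

```python
def rrc(value: int) -> int:
    """Rotate an 8-bit value right by one; bit 0 wraps into bit 7."""
    return ((value >> 1) | ((value & 0x01) << 7)) & 0xFF


def rom_checksum(data: bytes) -> tuple:
    b = c = 0                      # assumed starting values
    for byte in data:
        b ^= byte
        c = rrc(c ^ byte)
    return b, c


def rom_check_passes(rom: bytes) -> bool:
    """Block 2D: compare b, then c, with the two stored bytes at the end."""
    b, c = rom_checksum(rom[:-2])
    if b != rom[-2]:               # ld a, b / compare with ROM_CRC_start
        return False               # block 2E: lock up and sound the tone
    return c == rom[-1]            # same test for c and the following byte


def append_check_bytes(program: bytes) -> bytes:
    """Hypothetical build step: append the expected b and c values."""
    return program + bytes(rom_checksum(program))


image = append_check_bytes(bytes(range(256)) * 4)
print(rom_check_passes(image))             # True
corrupted = bytearray(image)
corrupted[100] ^= 0x10                     # flip one bit
print(rom_check_passes(bytes(corrupted)))  # False
```

Flipping any single bit changes the running xor in b, so a one-bit error like this one is always caught.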

BLOCK 2E: If the test does fail, then the machine will lock up and generate a continuous tone (the mechanics of this are described in the previous article).

BLOCK 3A: The previous tests passed, so we generate a tone to indicate success.

BLOCK 3B: It looks like we’re changing the system configuration to access video RAM.

BLOCK 3C: At the start of this block, hl is set to point to the bottom of video memory, and de is set up as a counter of how much memory is left to process.

At the end of this section, there’s a decrement and conditional jump like block 2C.

In between these two, the code is:

ld (hl), l
inc hl

So each location in the range is loaded with the present value of l, which is one half of the very address that hl points to.

But is l the low byte or the high byte? It’s actually really easy: h is high, l is low*².

So the first byte written in the loop will be 0, the next 1, and so on up to 255, after which those eight bits roll over to zero again.
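This quick Python sketch shows where the repeating 0 to 255 pattern comes from. Because hl is the pointer, l is always the low byte of the address being written. The 0f000h base address and the 4K length are assumptions. They are inferred from the stack being set to 0ffffh and from VIDEO_RAM_HI sitting at 0f800h later in the listing.

```python
VIDEO_RAM_BOTTOM = 0xF000   # assumption: inferred, not stated in the listing
VIDEO_RAM_LENGTH = 0x1000   # assumption: 4K, from 0f000h up to 0ffffh


def fill_pattern(start: int, length: int) -> list:
    """Return the value block 3C stores at each address: the current l."""
    pattern = []
    for hl in range(start, start + length):
        low_byte = hl & 0xFF        # this is l; h would be hl >> 8
        pattern.append(low_byte)    # ld (hl), l
    return pattern


p = fill_pattern(VIDEO_RAM_BOTTOM, VIDEO_RAM_LENGTH)
print(p[:3], p[254:258])    # [0, 1, 2] [254, 255, 0, 1]
```

If the block didn’t start on a 256-byte boundary, the first value written would be the low byte of the start address rather than 0.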

BLOCK 3D: This is similar to block 3C, except that we read back the value in each memory location and verify that it is as expected by comparing it with the value of l. l, of course, is incremented for each memory location just as it was before.

In the event that there is an error, the code will skip out to an error handler.

Once a byte is verified, the instruction cpl is executed. It sort of sounds like it’s comparing with the l register, but that’s not it at all. Actually, cpl performs a one’s complement of the accumulator by flipping all of its bits: ‘1’s become ‘0’s, and ‘0’s become ‘1’s.

This inverted value is then written back to the memory location before moving on to repeat the process for the next location.

BLOCK 3E: This is the same drill as for block 3D, except that each location in memory is checked to ensure that it contains the one’s complement of the counter.
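Blocks 3C, 3D and 3E together can be modelled in a few lines of Python. ram_test is a hypothetical helper. It treats a bytearray as the 64K address space and returns the first failing address (or None) instead of jumping to an error handler. The 0f000h base and 4K length used in the example are assumptions about the video RAM range.

```python
from typing import Optional


def ram_test(mem: bytearray, start: int, length: int) -> Optional[int]:
    """Model blocks 3C-3E; return the first failing address, or None if all pass."""
    end = start + length
    # Block 3C: store the low byte of each address (the value of l).
    for addr in range(start, end):
        mem[addr] = addr & 0xFF
    # Block 3D: check each byte, then write back its one's complement (cpl).
    for addr in range(start, end):
        if mem[addr] != addr & 0xFF:
            return addr
        mem[addr] = ~mem[addr] & 0xFF
    # Block 3E: check that every location now holds the complemented value.
    for addr in range(start, end):
        if mem[addr] != ~addr & 0xFF:
            return addr
    return None


ram = bytearray(0x10000)                   # stands in for the 64K address space
print(ram_test(ram, 0xF000, 0x1000))       # None: a perfect memory passes
print(hex(ram[0xF000]), hex(ram[0xF0FF]))  # 0xff 0x0: complements left behind
```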

Why is each location written and read back with both a given value and its inverse?

Memory is tricky. It’s a huge array of blocks of transistors (each block representing one bit) with tiny wires joining the blocks in a grid. If one of those blocks fails, it might get stuck so that it is always on or always off. Writing each value and then its complement means that every bit is exercised as both a ‘1’ and a ‘0’. A bit that is stuck either way therefore shows up as a mismatch in one of the two passes*³.
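To see why both passes matter, this Python sketch injects a single stuck bit into a simulated memory. StuckBitRAM, ram_test and its complement_pass switch are hypothetical. The fault model is the simple stuck-at case described above, where one bit of one address permanently reads as 0 or 1. It is not a full catalogue of memory faults.

```python
from typing import Optional


class StuckBitRAM:
    """Hypothetical 64K memory in which one bit at one address is stuck."""

    def __init__(self, bad_addr: int, bit: int, stuck_at: int) -> None:
        self.cells = bytearray(0x10000)
        self.bad_addr = bad_addr
        self.mask = 1 << bit
        self.stuck_at = stuck_at

    def __getitem__(self, addr: int) -> int:
        value = self.cells[addr]
        if addr == self.bad_addr:
            # The faulty bit reads back as stuck_at regardless of what was written.
            value = value | self.mask if self.stuck_at else value & ~self.mask
        return value

    def __setitem__(self, addr: int, value: int) -> None:
        self.cells[addr] = value


def ram_test(mem, start: int, length: int, complement_pass: bool = True) -> Optional[int]:
    """Write l, verify it, then (optionally) write and verify its complement."""
    end = start + length
    for addr in range(start, end):
        mem[addr] = addr & 0xFF
    for addr in range(start, end):
        if mem[addr] != addr & 0xFF:
            return addr
        if complement_pass:
            mem[addr] = ~mem[addr] & 0xFF    # cpl, then write back
    if complement_pass:
        for addr in range(start, end):
            if mem[addr] != ~addr & 0xFF:
                return addr
    return None


# Bit 7 stuck at 1 where the low byte of the address is 0ffh: the first pass
# writes 0ffh, so the stuck bit happens to match and the fault goes unnoticed.
print(ram_test(StuckBitRAM(0xF0FF, 7, 1), 0xF000, 0x1000, complement_pass=False))  # None
# The complement pass writes 00h there but reads back 80h.
print(hex(ram_test(StuckBitRAM(0xF0FF, 7, 1), 0xF000, 0x1000)))  # 0xf0ff
```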

BLOCK 4A: Now that there is confidence that the video RAM is functioning correctly, we can start to use it!

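The instruction ld sp, 0ffffh loads the stack pointer (a 16-bit register that holds the address of the top of the stack in memory) with the top of video RAM. The Z80’s stack grows downward, from higher addresses toward lower ones, so the top of memory is a natural place to start it. It’s not a great stack location, but it’ll do at a pinch until there’s something better.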
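This Python sketch models what push does to memory once sp holds 0ffffh. push and pop are hypothetical helpers that follow the documented behaviour of the instructions. sp is decremented before each byte is stored, and the high byte goes to the higher address.

```python
def push(mem: bytearray, sp: int, value: int) -> int:
    """Store a 16-bit value the way the Z80 `push` instruction does; return new sp."""
    sp = (sp - 1) & 0xFFFF
    mem[sp] = value >> 8       # high byte goes to the higher address
    sp = (sp - 1) & 0xFFFF
    mem[sp] = value & 0xFF     # low byte goes to the lower address
    return sp


def pop(mem: bytearray, sp: int) -> tuple:
    """Read a 16-bit value back the way `pop` does; return (value, new sp)."""
    value = mem[sp] | (mem[(sp + 1) & 0xFFFF] << 8)
    return value, (sp + 2) & 0xFFFF


mem = bytearray(0x10000)
sp = push(mem, 0xFFFF, 0x1234)                        # after ld sp, 0ffffh
print(hex(sp), hex(mem[0xFFFE]), hex(mem[0xFFFD]))    # 0xfffd 0x12 0x34
print(pop(mem, sp))                                   # (4660, 65535)
```

One side effect of starting at 0ffffh is that the first push lands in 0fffeh and 0fffdh. The byte at 0ffffh itself is never used by the stack.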

BLOCK 4B: Subroutines can be called now that there is a stack, since call pushes its return address there. This particular subroutine fills a block of memory with a value, pretty much the same as memset(). In this case, hl is used as a pointer to the start address, bc holds the count of bytes, and a holds the value to fill. This particular block fills the bottom video memory block of length 07d0h bytes with 020h (a space character).

07d0h is 2000 in decimal, which matches a display format of 25 rows of 80 characters (25 × 80 = 2000).

So this block appears to clear the display.
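This Python sketch shows the arithmetic and the fill behaviour. The fill function only mirrors what the subroutine is described as doing (hl = start, bc = count, a = value). It isn’t a translation of the actual routine, and the 0f000h base address for the bottom video block is an assumption.

```python
ROWS, COLS = 25, 80
SCREEN_BYTES = 0x07D0                 # byte count passed in bc
VIDEO_RAM_BOTTOM = 0xF000             # assumption: bottom of video RAM


def fill(mem: bytearray, hl: int, bc: int, a: int) -> None:
    """memset()-like fill: store `a` in `bc` bytes starting at address `hl`."""
    for offset in range(bc):
        mem[(hl + offset) & 0xFFFF] = a


print(SCREEN_BYTES, ROWS * COLS)      # 2000 2000
mem = bytearray(0x10000)
fill(mem, VIDEO_RAM_BOTTOM, SCREEN_BYTES, 0x20)   # 020h is an ASCII space
```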

BLOCK 4C: This is a similar operation to block 4B, except that it operates on the higher 2K block of memory in the video address space and loads the range with 00h. At this point, it’s still not clear what this block of memory does.

BLOCK 4D: contains yet another of those tone generation blocks.

BLOCK 5A: The new and interesting thing here is the ldir instruction. ldir is a Z80 block instruction*⁴ (load, increment, repeat) that keeps copying bytes until its byte counter reaches zero.

Here, hl points to the source address, de points to the destination address, and bc contains the number of bytes to copy.

This one copies the block of ROM containing the string “Terminal checkout running…” to the display memory. It’s a super-primitive print statement.

I’m still happy with the scaffold I wrote to create labels that address strings; just look at how much clearer it makes the intent of this code!
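A Python model of ldir makes the register roles explicit. The ldir function below is a hypothetical helper that follows the documented behaviour: copy, increment hl and de, decrement bc, and repeat until bc is zero. The ROM address of the message and the 0f000h destination are made-up values for illustration.

```python
def ldir(mem: bytearray, hl: int, de: int, bc: int) -> tuple:
    """Copy (hl) to (de), bump both pointers, and repeat until bc reaches zero.

    As on the Z80, bc is tested only after a byte has been copied, so a
    starting bc of 0 would copy 65536 bytes.
    """
    while True:
        mem[de] = mem[hl]
        hl = (hl + 1) & 0xFFFF
        de = (de + 1) & 0xFFFF
        bc = (bc - 1) & 0xFFFF
        if bc == 0:
            return hl, de, bc


MESSAGE_ADDR = 0x1000        # hypothetical ROM address of the string
VIDEO_RAM_BOTTOM = 0xF000    # assumed start of display memory
text = b"Terminal checkout running..."

mem = bytearray(0x10000)
mem[MESSAGE_ADDR:MESSAGE_ADDR + len(text)] = text
ldir(mem, MESSAGE_ADDR, VIDEO_RAM_BOTTOM, len(text))
print(mem[VIDEO_RAM_BOTTOM:VIDEO_RAM_BOTTOM + len(text)].decode())
```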

BLOCK 5B: Here’s another block instruction, though this one is a bit more complicated. First, the memory location VIDEO_RAM_HI (0f800h) is loaded with 080h, and then ldir propagates that value across the range. It’s another memset() operation, but it was probably written by someone else on the development team, since it looks different from the code in blocks 4B and 4C.

I’ll go into what the VIDEO_RAM_HI memory block does another day.
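What turns ldir into a fill rather than a copy is that it moves one byte at a time. The Python sketch below assumes the usual setup for this idiom: hl points at VIDEO_RAM_HI, de points one byte beyond it, and the length is 2K. The listing’s actual register values aren’t shown above, and ldir here is a hypothetical model of the instruction.

```python
def ldir(mem: bytearray, hl: int, de: int, bc: int) -> None:
    """Copy (hl) to (de) one byte at a time until bc reaches zero."""
    while True:
        mem[de] = mem[hl]
        hl, de = (hl + 1) & 0xFFFF, (de + 1) & 0xFFFF
        bc = (bc - 1) & 0xFFFF
        if bc == 0:
            return


VIDEO_RAM_HI = 0xF800
LENGTH = 0x800                     # assumption: the 2K block up to 0ffffh

mem = bytearray(0x10000)
mem[VIDEO_RAM_HI] = 0x80                                # seed the first byte
ldir(mem, VIDEO_RAM_HI, VIDEO_RAM_HI + 1, LENGTH - 1)   # de one byte ahead of hl
print(all(v == 0x80 for v in mem[VIDEO_RAM_HI:VIDEO_RAM_HI + LENGTH]))  # True

# A copy that reads its whole source before writing only shifts the data.
shifted = bytearray(8)
shifted[0] = 0x80
shifted[1:8] = shifted[0:7]
print(shifted.hex())               # 8080000000000000
```

Each pass of the loop reads the byte that was written on the previous pass, so 080h ripples through the whole range.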

BLOCK 5C/5D/6A: It took some research to figure out that addresses in the range 0b000h-0b57ch are referenced quite a bit in load and store operations. This space is in RAM, and I figured out that this memory block is used for variable storage.

I wrote some code that analyzes where variables are referenced, and whether it’s a single byte, a pair of bytes, or a memory block. You can see the generated labels over on the left.

Before variables can be stored, however, it’s essential to verify the integrity of that space. Each of these code blocks uses a pattern similar to the one used to verify video RAM. It writes a sequential range of values through the space, verifies that each byte reads back correctly while inverting it, and then verifies that each inverted byte was written correctly.
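The general idea behind the variable-reference analysis can be sketched in Python. This hypothetical version scans disassembly lines for direct-address loads and stores into 0b000h-0b57ch and classifies each access:

  • ld a,(nn) and ld (nn),a touch one byte;
  • the 16-bit register forms touch a pair;
  • an address loaded as an immediate into hl or de is treated, heuristically, as the start of a block.

The regular expressions, the classify and make_label names, and the label format are illustrative assumptions. This is not the code that generated the labels in the listing.

```python
import re
from collections import defaultdict

VAR_LOW, VAR_HIGH = 0xB000, 0xB57C   # the variable area identified above
ADDR = r"\((0[0-9a-f]+)h\)"          # a direct address such as (0b010h)
REG16 = r"(?:hl|bc|de|sp|ix|iy)"

PATTERNS = [
    ("byte", rf"ld\s+a,\s*{ADDR}"),                 # ld a,(nn)
    ("byte", rf"ld\s+{ADDR},\s*a"),                 # ld (nn),a
    ("word", rf"ld\s+{REG16},\s*{ADDR}"),           # ld hl,(nn) and friends
    ("word", rf"ld\s+{ADDR},\s*{REG16}"),           # ld (nn),hl and friends
    ("block", r"ld\s+(?:hl|de),\s*(0[0-9a-f]+)h"),  # pointer loaded as an immediate
]
PRIORITY = ("block", "word", "byte")   # widest access wins when labelling


def classify(lines):
    """Map each address in the variable area to the kinds of access seen."""
    kinds = defaultdict(set)
    for line in lines:
        text = line.split(";")[0].strip().lower()   # drop comments
        for kind, pattern in PATTERNS:
            match = re.fullmatch(pattern, text)
            if match:
                addr = int(match.group(1), 16)
                if VAR_LOW <= addr <= VAR_HIGH:
                    kinds[addr].add(kind)
    return kinds


def make_label(addr, kinds):
    """Build a label such as word_b012 from the widest access kind."""
    kind = next(k for k in PRIORITY if k in kinds)
    return f"{kind}_{addr:04x}"


listing = [
    "ld a,(0b010h)",
    "ld (0b010h),a",
    "ld hl,(0b012h)   ; 16-bit read",
    "ld (0b014h),de",
    "ld hl,0b100h",
    "ld a,(0f800h)",              # outside the variable area, so ignored
]
refs = classify(listing)
for addr in sorted(refs):
    print(make_label(addr, refs[addr]))   # byte_b010, word_b012, word_b014, block_b100
```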

BLOCK 5E/6B: Just another tone loop.

BLOCK 6C: Here’s a duplicate “Terminal checkout running…” text block. It’s not clear to me why this has been repeated.

BLOCK 6D: This sets the high video memory to 084h. Again, I’ll get to what that means later.

BLOCK 6E: Prints “Terminal is ill… Please call Field Service” to the display.

BLOCK 6F: Another tone loop, then repeat forever.

So… that’s it! By this point in program execution, we now have confidence in the display to the point where messages can be written, we know that the program EPROM isn’t corrupted, and we trust the way that variables will be stored in RAM.

Well. Either that or something went horribly wrong and a tone is being generated.

Judging by the label that comes next, we’ll be digging into the serial input-output controller (“SIO”) in the next part.

Hopefully you managed to follow along with this. If you have any questions or ideas on how I can explain this better, then please let me know!

*¹ I labeled these with “CRC” since it looked like a cyclic redundancy check at first glance, but there’s no polynomial arithmetic or lookup table involved. In fact, it’s not much more than a simple parity checker. Same difference, kinda: the point is that we are looking for errors in a block of information by comparing a computed value to some known quantity.
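A short Python experiment shows what the two check bytes can and can’t catch. It reuses the block 2B formulas as described in this article, with b and c assumed to start at zero. checksum and rrc are hypothetical helper names.

```python
def rrc(value: int) -> int:
    """Rotate an 8-bit value right by one; bit 0 wraps into bit 7."""
    return ((value >> 1) | ((value & 0x01) << 7)) & 0xFF


def checksum(data: bytes) -> tuple:
    b = c = 0                       # assumed starting values
    for byte in data:
        b ^= byte                   # plain parity of every bit column
        c = rrc(c ^ byte)           # parity again, but rotated by position
    return b, c


# Swapping two adjacent bytes leaves the parity byte b unchanged, but c differs.
print(checksum(bytes([0x01, 0x02])), checksum(bytes([0x02, 0x01])))  # (3, 65) (3, 0)

# Bytes eight positions apart end up rotated by the same amount, so swapping
# them slips past both checks.
first = bytes([0x01] + [0x00] * 7 + [0x02])
second = bytes([0x02] + [0x00] * 7 + [0x01])
print(checksum(first) == checksum(second))  # True
```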

*² Z80 processors are what’s called little-endian, meaning that if the content of register pair hl (for example) is written to memory, then l is stored at the lower address and h at the address that follows it.
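Python’s standard struct module can show the same byte order. Here the 16-bit value 1234h stands in for the contents of hl. The "<" and ">" format prefixes select little-endian and big-endian packing respectively.

```python
import struct

hl = 0x1234                              # pretend h = 12h and l = 34h
little = struct.pack("<H", hl)           # Z80 order: low byte first
big = struct.pack(">H", hl)              # big-endian order, for contrast
print(little.hex(), big.hex())           # 3412 1234
print(hex(little[0]), hex(little[1]))    # 0x34 0x12 (l, then h)
```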

The terms little-endian and big-endian were introduced to computing by Danny Cohen in Internet Engineering Note 137 in 1980, and are a nod to Jonathan Swift’s 1726 novel Gulliver’s Travels. It’s uncertain whether religious wars were fought over less.

*³ Over the years, we learned that memory devices can exhibit a whole bunch more failure modes. One example is a coupling fault, where one bit follows the state of an adjacent bit. Modern RAM has far more sophisticated designs and features, which have enabled exponential growth in storage density. But that’s a separate conversation…

*⁴ Block instructions are crazy! The idea of one opcode that can traverse a bunch of memory to perform searches, copy data, or perform I/O was a huge surprise to me. The looping is handled inside the processor itself, so you’d expect that it would eliminate the cycles needed to fetch an instruction from memory on each pass. It doesn’t: while the repeat condition holds, the processor steps the program counter back by two and fetches the block instruction from memory all over again. Even so, block instructions occupy less program space, something that was really important when address ranges were measured in kilobytes. They also still run faster than an equivalent loop built from individual instructions.