Saturday, February 8, 2020

ZX Spectrum: 1942 loader detokening and .TAP format assembly

.TAP format is much easier to work with than .TZX, which seems to mainly be for duplication.

A .TAP file is simply a series of header blocks and data blocks, one pair per file on the tape: header, data, header, data. A header block is always 19 bytes long (17 bytes of header data plus a flag byte and a checksum byte), while a data block can be up to 64K.

Visualized, it looks like this:
|----------------------|
|   TAP block header   |
|----------------------|
|                      |
|    Header data       |
|       (17 bytes)     |
|                      |
|---<Checksum byte>----|
|----------------------|
|   TAP block header   |
|----------------------|
|                      |
~     Data bytes       ~
|                      |
         ...
|                      |
|---<Checksum byte>----|
|----------------------|
         ...
for each file on a tape.

Each header and data block has its own 3-byte mini-header as specified by the .TAP format. It's very simple:

; 3 byte block header:
DW BlockSize
DB BlockType
; then data
;  (...) followed by 
DB ChecksumByte

If the BlockType is 0x00 (indicating a header block), then BlockSize will always be 19 (in sequence 13h 00h). Headers are 17 bytes long, and the BlockType and ChecksumByte are added to the BlockSize length to get 19. 
If BlockType is 0xFF (255 or -1, indicating a data block), then BlockSize is the size of the data block (plus two bytes for BlockType and ChecksumByte).
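
To make the block layout concrete, here is a minimal Python sketch that walks a .TAP image block by block and checks each ChecksumByte. The name iter_tap_blocks is my own invention for illustration, and the sketch assumes the whole file is already in memory as bytes.

```python
import struct

def iter_tap_blocks(tap: bytes):
    """Yield (block_type, payload, checksum_ok) for each block in a .TAP image."""
    pos = 0
    while pos + 2 <= len(tap):
        # DW BlockSize: little-endian, counts BlockType + payload + ChecksumByte
        (size,) = struct.unpack_from("<H", tap, pos)
        block = tap[pos + 2 : pos + 2 + size]
        if size < 2 or len(block) != size:
            raise ValueError(f"truncated or malformed block at offset {pos}")
        block_type, payload, checksum = block[0], block[1:-1], block[-1]
        # XOR of BlockType and every payload byte should equal ChecksumByte
        acc = block_type
        for b in payload:
            acc ^= b
        yield block_type, payload, acc == checksum
        pos += 2 + size
```

For a header block the payload should come out as 17 bytes; for a data block it is whatever raw data the file holds.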

Header blocks look like this:

DB FileType
DS "FILENAME  "
DW DataSize
DW Parameter1
DW Parameter2

FileType can be 0, 1, 2 or 3. BASIC data can be stored as types 1 or 2, but we are concerned with type 0 -- BASIC program -- and type 3 -- CODE (aka assembly).

The filename must be padded with 0x20 to 10 bytes.

DataSize is the length of the file's data in bytes. It is always 2 less than the BlockSize of the data block that follows, because BlockSize also counts the BlockType and ChecksumByte.

When FileType is BASIC program (0):
Parameter1 is the LINE parameter (the autostart line) when SAVEing the program. I actually have not gotten this to work, and since 1942 keeps it at 0, I do as well.
Parameter2 is the offset of the start of the working area of BASIC variables. A bootloader generally does not have variables, so in these cases, this value is actually the same as DataSize (or, BlockSize-2).

When FileType is CODE (3):
Parameter1 is the target memory address (e.g. the first parameter after CODE), and
Parameter2 is ALWAYS 8000h. Not explained why.
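
As a rough illustration of the header layout, the 17 bytes can be packed in Python with struct. The function name tap_header_data and the file name "HELLO" are hypothetical; the "LOADER" values are the ones from my own loader's header.

```python
import struct

def tap_header_data(file_type: int, name: str, data_size: int,
                    param1: int, param2: int) -> bytes:
    """Build the 17 bytes of header data (no BlockType or ChecksumByte)."""
    padded = name.encode("ascii").ljust(10, b" ")  # pad with 0x20
    if len(padded) != 10:
        raise ValueError("filename must be at most 10 characters")
    # "<B10sHHH": FileType, 10-byte name, then three little-endian words
    return struct.pack("<B10sHHH", file_type, padded, data_size, param1, param2)

# BASIC program (type 0): with no variables, Parameter2 equals DataSize
loader = tap_header_data(0, "LOADER", 44, 0, 44)
# CODE (type 3): Parameter1 is the load address, Parameter2 is 8000h
code = tap_header_data(3, "HELLO", 20, 0x8000, 0x8000)
```

This only builds the header data itself; the 3-byte block header and the checksum byte still have to be wrapped around it.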

And finally, the checksum byte. This isn't a checksum per se, so much as it is a running XOR of every byte in the block, excluding the two BlockSize bytes but including the BlockType flag byte. Start with the BlockType flag byte and xor it with each successive byte, then store the final result in ChecksumByte.

For the data block, this is calculated for me using a Python script post-assembly with the following code:

chk = 0xff          # start with the data block's flag byte
for b in inbytes:   # inbytes: the raw bytes of the assembled binary
    chk ^= b

And of course data blocks are simply raw data.

The trick was getting a BASIC stub to auto-run when you play the tape (harder than it seems when you're doing all the bytes by hand) and have that stub clear RAM and load/run the assembly program we want.

I couldn't figure out how to save a .TAP from the "speccy" emulator, so I had no choice but to open up a .TAP of 1942 and see what was up.

The first TAP header block in 1942 looks like this:
DW $0013          ; size in bytes
DB 0              ; type 0 = header
; then the header data:
DB 0              ; 0 = BASIC program
DS "1942      "   ; filename
DW 185            ; file size
DW 0              ; autostart line
DW 185            ; basic vars loc
DB $0f            ; checksum byte

Then, the data block. This is where I had to detokenize the program by hand, and figure some stuff out for myself.

The listing of the 1942 loader ended up looking like this:
10 BORDER 0:POKE 23624,0:POKE 23693,0:CLEAR 25592:POKE 23739,111 
20 LOAD "" SCREEN$
30 LOAD "" CODE
40 POKE 23739,244:RANDOMIZE USR 25593
50 REM etc

1942 loads itself into $63f9 - contended memory, but a good starting point all the same.
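As a point of interest, 23624 ($5c48) is BORDCR and 23693 ($5c8d) is ATTR_P: the border color mirror and the permanent attribute (color) byte. 23739 ($5cbb) is not a named system variable; on a stock 48K machine it falls in the channel information area, as the low byte of the screen channel's output routine address. Poking 111 there appears to silence the tape loading messages, and poking 244 puts the normal routine back.
This, along with the .TAP disassembly, was enough to get me started -- the basics are: use CLEAR n-1, LOAD "" CODE, and RANDOMIZE USR n.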

(Note that the best way to check the value of a token in any native BASIC version is to use PRINT CHR$(n). BASIC tokens don't overlap the standard ASCII byte space, so n is almost always > 127.)

First, explaining the Spectrum BASIC line format:
DW LineNo        ; Big-endian!
DW LineOffset    ; Little-endian; bytes until next LineNo (includes the $0d)
( ... )          ; (listing)
DB $0d           ; endline

The important thing here is that the single 0x0d byte represents endline in Spectrum BASIC. ZX80/81 use a different endline (0x76, maybe?). Don't look for 00 00 as endline or 00 00 00 for EOF like on other systems - afaict there is no concept of EOF in ZX BASIC.
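
A minimal sketch of that line format in Python, assuming the body has already been tokenized. The helper name basic_line is hypothetical, and the token bytes for LOAD (0xEF) and CODE (0xAF) are the ones that show up in my loader's hex listing.

```python
import struct

def basic_line(line_no: int, body: bytes) -> bytes:
    """Encode one tokenized line: big-endian number, little-endian length, body, 0x0d."""
    text = body + b"\x0d"  # the length word counts the endline byte too
    return struct.pack(">H", line_no) + struct.pack("<H", len(text)) + text

# 20 LOAD "" CODE: LOAD token, two quote characters, CODE token
line20 = basic_line(20, bytes([0xEF, 0x22, 0x22, 0xAF]))
```

Note the mixed byte order: the line number is the odd one out, everything else on the Spectrum is little-endian.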

A large difference between Sinclair and other BASICs is that Sinclair wastes a ton of space by storing every number twice (once as an ASCII string and once in binary), but condenses all spaces automatically. Here is the hex listing for my very short loader program:

13 00 00 00 4C 4F 41 44 45 52 20 20 20 20 2C 00
00 00 2C 00 11 2E 00 FF 00 0A 0D 00 FD 32 35 35
39 32 0E 00 00 F8 63 00 0D 00 14 05 00 EF 22 22
AF 0D 00 1E 0E 00 F9 C0 32 35 35 39 33 0E 00 00
F9 63 00 0D 70

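Note that this dump still carries the addresses borrowed from 1942 (25592 and 25593, stored as f8 63 and f9 63). For code assembled at $8000 they become 32767 (ff 7f) and 32768 (00 80), which gives the corresponding BASIC: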

10 CLEAR 32767
20 LOAD "" CODE
30 RANDOMIZE USR 32768

CLEAR 32767 - This sets BASIC's RAMTOP (its version of HIMEM) to 7fffh, so BASIC will not touch anything from 8000h upward. That reserves the space where the machine code is about to be loaded.
LOAD "" CODE - This is equivalent to "load the next file available from tape as raw bytes". With no address given, the bytes go to the address stored in Parameter1 of that file's header, which is 8000h here.
RANDOMIZE USR 32768 - This is the common way to start machine language routines on the speccy. USR calls the routine at the given address and returns a number, and RANDOMIZE is just a cheap statement that consumes it. The effect is close to "CALL $8000".

The tricky part here is that numerical constants (which are stored as ASCII strings) are immediately followed by the byte 0x0e (a number marker) and then the value itself, stored as:
DB 0
DB PolarityByte
DW IntValue
DB 0
Such that 25592 becomes 11 bytes(!!):
32 35 35 39 32 0e 00 00 f8 63 00
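
Here is a small Python sketch of that encoding. The name number_literal is made up for illustration, and it only covers the case I actually needed: non-negative integers up to 65535, where PolarityByte is 0.

```python
def number_literal(n: int) -> bytes:
    """ASCII digits followed by the hidden 0x0e marker and 5-byte integer form."""
    if not 0 <= n <= 0xFFFF:
        raise ValueError("this sketch only handles integers 0..65535")
    # 0x0e, DB 0, DB PolarityByte (0 = positive), DW IntValue (little-endian), DB 0
    hidden = bytes([0x0E, 0x00, 0x00, n & 0xFF, n >> 8, 0x00])
    return str(n).encode("ascii") + hidden
```

Floating point values use the same five bytes differently, so anything that isn't a small whole number is outside what this handles.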

Also, the final line, RANDOMIZE USR n, starts with the byte sequence f9 c0. Whenever you see this in Spectrum BASIC, it's a CALL/JP command.
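
Putting the pieces together, detokenizing by hand can be approximated with a short Python sketch. The function name detokenize is hypothetical, and the token table is deliberately partial: it only lists the five keyword bytes that appear in my loader dump. Unknown tokens are printed as hex in angle brackets.

```python
import struct

# Partial token table: only the keywords that appear in the loader dump
TOKENS = {0xAF: "CODE", 0xC0: "USR", 0xEF: "LOAD", 0xF9: "RANDOMIZE", 0xFD: "CLEAR"}

def detokenize(program: bytes) -> list:
    """Turn tokenized Spectrum BASIC bytes into a list of listing lines."""
    lines, pos = [], 0
    while pos + 4 <= len(program):
        (line_no,) = struct.unpack_from(">H", program, pos)     # big-endian
        (length,) = struct.unpack_from("<H", program, pos + 2)  # little-endian
        body = program[pos + 4 : pos + 4 + length]
        out, i = [], 0
        while i < len(body) and body[i] != 0x0D:
            b = body[i]
            if b == 0x0E:      # hidden number: skip the marker plus 5 bytes
                i += 6
                continue
            if b in TOKENS:
                out.append(f" {TOKENS[b]} ")
            elif b >= 0x80:
                out.append(f"<{b:02X}>")  # token missing from the table
            else:
                out.append(chr(b))
            i += 1
        # Collapse runs of spaces (this would also squash spaces inside strings)
        lines.append(f"{line_no} " + " ".join("".join(out).split()))
        pos += 4 + length
    return lines
```

Fed the 44 program bytes from the dump above (everything between the FF flag byte and the final checksum), this should give lines 10, 20 and 30 with 25592 and 25593 as the numbers.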

As mentioned above, I wrote a Python script to calculate the checksum bytes for me. Running it on the BASIC stub binary and on a compiled asm binary resulted in two .TAP files: one for the BASIC loader, and one for the hello world program.

.TAP is brilliant because you can $cat a.tap b.tap > ./c.tap and suddenly have a complete tape file. I tested b.tap with this binary code, assembling as-is with nothing else:
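
The same idea can be sketched in Python: build the header block and data block for a CODE file, then append them to the loader's bytes. This is not my actual script, just an outline of the approach; tap_block, code_file and the file name "HELLO" are hypothetical names.

```python
import struct
from functools import reduce
from operator import xor

def tap_block(block_type: int, payload: bytes) -> bytes:
    """Wrap payload as DW BlockSize, DB BlockType, payload, DB ChecksumByte."""
    checksum = reduce(xor, payload, block_type)  # XOR seeded with the flag byte
    body = bytes([block_type]) + payload + bytes([checksum])
    return struct.pack("<H", len(body)) + body

def code_file(name: str, address: int, code: bytes) -> bytes:
    """Header block plus data block for a CODE file loaded at `address`."""
    # FileType 3, name padded with spaces (struct truncates names over 10 bytes),
    # DataSize, Parameter1 = load address, Parameter2 = 8000h
    header = struct.pack("<B10sHHH", 3, name.encode("ascii").ljust(10),
                         len(code), address, 0x8000)
    return tap_block(0x00, header) + tap_block(0xFF, code)

# Equivalent of `cat a.tap b.tap > c.tap`, with a.tap being the BASIC loader:
#   tape = open("a.tap", "rb").read() + code_file("HELLO", 0x8000, binary)
```

Because blocks carry their own sizes and checksums, plain concatenation is all it takes; nothing in the file points at absolute offsets.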

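%org $8000

; ZX_IOPORT is not defined in this listing; presumably it is the ULA
; port $fe, whose low three bits set the border color.
Init:
    xor a
    ld [WorkRAM_a], a       ; counter = 0
Loop:
    ld a, [WorkRAM_a]
    inc a
    cp 8                    ; wrap back to 0 after 7
    jr nz, .ok
    xor a
    .ok:
    ld [WorkRAM_a], a
    out [ZX_IOPORT], a      ; write the counter to the port
    jp Loop


WorkRAM_a: rb 1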

And it worked! With a little bit more work and bash nonsense, I have a one-click script that will assemble a ZX Spectrum .TAP image (with loader!) to an address I specify from a single assembly listing.


Comment if you are interested in learning more.
