Thursday, September 12, 2024

PROPERLY reading keyboard input in DOS environments - IRQ Hook

 There are by my estimation THREE different ways of reading keyboard input in DOS, using C. 

The first is getch().

// getch
#include <iostream.h>
#include <conio.h>
/* this does not give keyup events */
int main() {
    int ch = 0;                 /* initialized before the first loop test */
    while (ch != 27) {          /* 27 = ASCII ESC; getch() returns ASCII, not scancodes */
        ch = getch();
        switch (ch)
        {
        case 0:
        case 0xE0: /* extended key: the next getch() returns its scancode */
            {
            int ext = getch();  /* kept separate so an extended code can't end the loop */
            switch (ext)
            {
            case 0x48: cout << "up arrow\n"; break;
            case 0x4B: cout << "left arrow\n"; break;
            case 0x4D: cout << "right arrow\n"; break;
            case 0x50: cout << "down arrow\n"; break;
            default: cout << "extended key " << ext << "\n";
            }
            }
            break;
        case 8: cout << "backspace\n"; break;
        case 0xd: cout << "return\n"; break;
        case ' ': cout << "space\n"; break;
        default: cout << "normal key " << (char)ch << "\n";
        }
    }
    return 0;
}


From what I can tell, most compilers implement this function with the BIOS INT 16h, function 00h call (get key from buffer), or with a DOS console-input call built on top of it. The problem with this is the usual: you have to deal with the system's manipulation of the keyboard buffer to get your keys, which is only one interpretation of the keyboard's state. You can modify the terminal / TTY settings of stdin/stdout, but in the end it is not appropriate for games. getch() does not return any keyup events, because the BIOS keyboard handler consumes them before anything reaches the buffer, so you will never get a break code (a scancode with bit 7 set) from getch().

The second is reading the keyboard's I/O ports directly:

// Be careful: without an interrupt, key events can be missed
unsigned char read_key();
// 8255 PPI
// After reading the scancode, acknowledge it by pulsing bit 7 of port 61h:
// in al, 0x61 ; or al, 0x80 ; out 0x61, al   (set bit 7)
// xor al, 0x80 ; out 0x61, al               (clear it again)
#pragma aux read_key = \
"in al, 0x60"\
"mov ah, al" \
"in al, 0x61"\
"or al, 0x80"\
"out 0x61, al"\
"xor al, 0x80"\
"out 0x61, al"\
value [ah] \
modify [ al ];

The 8255/8042 port interface is emulated on pretty much every PC nowadays, by the way! With USB legacy support, the BIOS presents a USB keyboard through the same ports a PS/2 keyboard uses, so it stays compatible with the IBM PC.

Anyway, this assembly works great! And it's super fast!

Unfortunately, as you can see by the first line comment, it is possible (and happens frequently) that if you are not constantly monitoring the keyboard, you will miss a keyup event. This is fine for probably 80% of purposes, but for action games in particular, a stuck key means death.

I tried a large variety of things to get around this limitation, but unfortunately, the only thing that works is clearing out every other "pressed" key state you have in memory when a new key is pressed. The code may be simple, but no two keys can be pressed at a time! Argh!!
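To make the workaround concrete, here is a minimal sketch in portable C (so it can be tried off-DOS). `poll_update()` and `held[]` are hypothetical names; the byte passed in stands in for what `read_key()` returns, and E0/E1 prefix bytes are not treated specially.

```c
#include <string.h>

/* 1 = key currently held, indexed by scancode (bits 0-6 of the port byte). */
static unsigned char held[128];

/* Hypothetical handler for one byte read from port 60h by a polling loop. */
void poll_update(unsigned char code)
{
    unsigned char key = code & 0x7F;      /* bits 0-6: scancode */

    if (code & 0x80) {                    /* bit 7 set: break code (key released) */
        held[key] = 0;
    } else {                              /* make code (key pressed) */
        memset(held, 0, sizeof held);     /* drop every other "pressed" key... */
        held[key] = 1;                    /* ...so a missed key-up can't leave it stuck */
    }
}
```

The last make code always wins, which is exactly why two keys can never be held at once.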

The third method is using the BIOS INT 16h functions 00h and 01h to check for and retrieve keys from the system's keyboard buffer. This isn't very difficult to implement, so I won't waste time on it here, but the effect is nearly identical to the default getch() and stdin: you don't get fast access to the PPI, and you still never see keyup events.

After a bunch of hemming and hawing, it was time to do more research.

So, operating systems don't "scan" a keyboard the way older micro-computers did. On virtually every 8-bit system I've programmed for, you can get the state of any key on the keyboard at any time - with exceptions, depending on how things are wired.

What is done nowadays is the keyboard is hooked into a hardware interrupt. And unfortunately, I seemed to be in a situation where that was the only option of getting rid of keyboard input bugs. 

So I did my research, but wouldn't you know it, there isn't a lot of information on that kind of thing. There are some StackOverflow posts here and there, but nothing particularly clear and definitive. There wasn't even a clear statement of where the keyboard's interrupt vector lives!

After some frustration and tweaking of my search keywords, I discovered this wonderful post, which had some information I couldn't find elsewhere:

    1. The DOS keyboard interrupt vector is 09h (hardware IRQ 1)
    2. Watcom C/C++ has _dos_getvect() and _dos_setvect() functions, which led me to find out that the real-mode interrupt vector table starts at 0000:0000 and holds 256 entries of two words each: the instruction pointer (code offset) first, then the segment address. This means 0024h (9 × 4) holds the keyboard vector!
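The table layout described above can be sketched as a few hypothetical helpers in plain C, assuming a buffer holding a copy of one 4-byte IVT entry: a vector's address is its number times 4, and each entry is the offset word followed by the segment word, both little-endian. The sample bytes used to check it are illustrative, not read from a real machine.

```c
#include <stdint.h>

/* Address of an entry in the real-mode IVT at segment 0000h. */
uint16_t ivt_entry_addr(uint8_t vector)
{
    return (uint16_t)(vector * 4u);          /* INT 09h -> 0024h */
}

/* Decode one 4-byte IVT entry: offset word first, then segment word. */
void ivt_decode(const uint8_t entry[4], uint16_t *seg, uint16_t *off)
{
    *off = (uint16_t)(entry[0] | (entry[1] << 8));
    *seg = (uint16_t)(entry[2] | (entry[3] << 8));
}

/* Linear (physical) address of a real-mode segment:offset pair. */
uint32_t linear_addr(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}
```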

Well, hold on. Writing the address of my replacement function to 0024h directly wasn't working, even with interrupts off. So, I did what every sane person does: disassembled a Watcom-produced .exe that uses _dos_setvect() to see what it does!

                                                           FUN_1000_0008:1000:0082(c)  
       1000:0174 52              PUSH       DX
       1000:0175 89 da           MOV        DX,BX
       1000:0177 1e              PUSH       DS
       1000:0178 8e d9           MOV        DS,CX
       1000:017a b4 25           MOV        AH,0x25
       1000:017c cd 21           INT        0x21
       1000:017e 1f              POP        DS
       1000:017f 5a              POP        DX
       1000:0180 c3              RET

Oh - they don't do anything clever at all. They call DOS (INT 21h, function 25h, Set Interrupt Vector) to do it for them! No wonder.

So, copying this method, we can PROPERLY replace the BIOS's IRQ service routine with our own, based on the post above, which will read every key's scancode into an array of 0s and 1s.

Marking it as __interrupt will (we hope!) make the compiler save and restore all registers and return with IRET. We only need it to flip a byte in the keys array, so I simplified it down a bit, removed almost all of the compiler-dependent code, and wrote ASM macros for Watcom.

In the end, the final, interrupt-based, flawless method looks like this: 

#include <iostream.h> // cout
#include <graph.h> // settextposition and clearscreen

typedef unsigned char u8;
typedef unsigned short u16; // bool is built into C++, so no typedef for it

// Entire keyboard scan is held here, including extended keys:
volatile u8 keys[192]; // volatile: written by the interrupt, read by main()

void far * old_kb;
static void interrupt kb_int();

void write_port(u8 p, u8 v);
#pragma aux write_port = \
"mov dh, 0"\
"out dx, al"\
parm [dl] [al]\
modify [dx];

u8 read_port(u8 p);
#pragma aux read_port = \
"mov dh, 0"\
"in al, dx"\
parm [dl]\
value [al]\
modify [dx];

void set_irq_vector(u8 iq, u16 segment, u16 offset);
#pragma aux set_irq_vector = \
"push ds" \
"mov ah, 0x25"\
"mov ds, bx"\
"int 0x21"\
"pop ds"\
parm [al] [bx] [dx]\
modify [ah];

void far * get_irq_vector(u8 iq);
#pragma aux get_irq_vector = \
"mov ah, 0x35"\
"int 0x21"\
parm [al] \
modify [ah] \
value [es bx];

#define segment(a) ((u16)((unsigned long)(void __far*)(a) >> 16))
#define offset(a) ((u16)(a)) // no need to & 0xffff: casting down to 16 bits keeps just the offset

int main() {

old_kb = get_irq_vector(9); // save old vector
set_irq_vector(9, segment(kb_int), offset(kb_int));
// Key display loop taken from sample code
_clearscreen(0);
while(!keys[1]) { // scancode 01h == ESC
for(int y = 1; y < 5; y++){
int i;
_settextposition(y, 1); // row, column (Watcom counts from 1)
for (i = 0; i < 0x30; i++) {
cout << (int)keys[i + ((y-1)*0x30)];
}
}
}
//
set_irq_vector(9, segment(old_kb), offset(old_kb));
return 0;
}

static void interrupt kb_int() {
static u8 buffer;
u8 inc = read_port(0x60); /* get byte from port 60h */
bool on_off = !(inc & 0x80); /* bit 7: 0 = on, 1 = off */
u8 scancode = inc & 0x7F; // bits 0-6 are the key

// First, we check our last buffer to see if it has an E0 byte...
if (buffer == 0xE0) { // was our last byte E0?
if (scancode < 0x60) // if so, extended key
keys[scancode + 0x60] = on_off;
buffer = 0;
} else if (buffer >= 0xE1 && buffer <= 0xE2) {
buffer = 0; /* ignore these cases... */
} else if (inc >= 0xE0 && inc <= 0xE2) {
buffer = inc; // store it in static var for next loop!
} else if (scancode < 0x60) {
keys[scancode] = on_off;
}

write_port(0x20, 0x20); // send EOI to the PIC, or no more keyboard IRQs arrive
}

Three full days of work! Phew! (And I didn't even do the hard part myself!)

Astute viewers may have noticed the static buffer char inside the static kb_int() function. It remembers the previous interrupt's E0, E1 or E2 byte, because an EXTENDED key arrives as a prefix byte followed by its scancode in a separate interrupt. All we need to do is check this buffer char *first* to see whether the current byte is an extended key's scancode. If the buffer HAS one of those Ex prefixes, we handle the current byte accordingly and then clear the buffer for the next key.

This looks longer than it should be because of the assembly macros, but it is quite simple if you take the time to look at it. 
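For anyone who wants to follow the prefix handling step by step, here is a minimal sketch of the same decision logic with the port I/O stripped out, so it can be fed bytes by hand. `kb_decode()` and `prefix` are hypothetical names; `prefix` plays the role of the static `buffer`.

```c
typedef unsigned char u8;

static u8 keys[192];   /* 0x00-0x5F: normal keys, 0x60-0xBF: E0-prefixed keys */
static u8 prefix;      /* last E0/E1/E2 byte seen, 0 if none (kb_int's "buffer") */

/* kb_int()'s decision logic without the port reads/writes. */
void kb_decode(u8 inc)
{
    u8 on_off = !(inc & 0x80);     /* bit 7: 0 = pressed, 1 = released */
    u8 scancode = inc & 0x7F;      /* bits 0-6: the key */

    if (prefix == 0xE0) {          /* this byte completes an extended key */
        if (scancode < 0x60)
            keys[scancode + 0x60] = on_off;
        prefix = 0;
    } else if (prefix == 0xE1 || prefix == 0xE2) {
        prefix = 0;                /* skip the byte that follows E1/E2 */
    } else if (inc >= 0xE0 && inc <= 0xE2) {
        prefix = inc;              /* remember the prefix for the next byte */
    } else if (scancode < 0x60) {
        keys[scancode] = on_off;
    }
}
```

Feeding it 48 then C8 presses and releases keys[0x48]; feeding E0 48 then E0 C8 does the same for keys[0xA8], the extended up arrow. One caveat: as far as I know, Pause sends E1 1D 45 E1 9D C5, and since only the byte right after each E1 is skipped, the 45/C5 bytes still set and then clear keys[0x45].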

Done! This will get the immediate state of every key on the keyboard, through a properly handled interrupt - perfect for game development!

If this helped you, or you have anything in reply, please leave a comment!




Friday, June 7, 2024

Coding in mixed Assembly and C for the PC-9801 in 2024

(Scroll down to the bottom to download the project)
Project contents:
    _LCC             : LSI-C makefile
    98header.s       : FDI header source
    checkexe.py      : Examine DOS-EXE and optionally relocate
    insert_to_fdi.py : Inserts a file at a given cylinder/head/sector address of an FDI file
    main.s           : ASM source for the IPL (initial program loader)
    Makefile         : GNU Make for this project
    test.c           : LSI-C source for TEST.EXE hello world


There aren’t a whole lot of options out there when it comes to compiling code for 16-bit real-mode apps, which is what you need to make DOS-mode games on the PC98!
It started with curiosity about how I would go about hacking Record of Lodoss War.
It began with this assembly hello world example I found online: https://qiita.com/TakedaHiromasa/items/371503c48ac33237a859 , and turned into a rabbit hole in every direction.

1. Researching Disk Formats and Boot Loading
2. Finding a C Compiler that works is hard
3. Research into Disk Coding
4. DOS-EXE Disassembly
5. …And Trimming Everything Into Place

1. Researching Disk Formats and Boot Loading


The assembly example included a functional disk header, defined as data bytes. This isn’t very helpful, however, since the format it uses (TFD) is not supported by any emulator I have. I COULD, though, compare it to the FDI header generated for a blank FDI disk by barbeque's tools: https://github.com/barbeque/pc98-disk-tools/ . After some stumbling around NASM syntax, I realized that generating an FDI header was pretty easy:

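        DD 0            ; reserved
        DD 144          ; drive type (0x90)
        DD 4096         ; header size
        DD 1261568      ; image data size (77 * 2 * 8 * 1024)
        DD 1024         ; bytes per sector
        DD 8            ; sectors per track
        DD 2            ; heads
        DD 77           ; cylinders

        times 0x1000 - ($-$$) db 0   ; pad the header out to 4096 bytes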


I assembled this, saved it as header.bin and put it aside.
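For reference, the same 4096-byte header can be built in C. This is a sketch, not part of the project; the field names in the comments are my reading of what each value means, while the values themselves are the ones from the listing. Each DD is a little-endian 32-bit word.

```c
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value little-endian, as NASM's DD does on x86. */
static void put_dd(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Eight dwords, then zero padding up to 0x1000, like the NASM listing. */
void make_fdi_header(uint8_t hdr[4096])
{
    const uint32_t cyl = 77, heads = 2, spt = 8, ssize = 1024;

    memset(hdr, 0, 4096);
    put_dd(hdr + 0,  0);                           /* reserved */
    put_dd(hdr + 4,  144);                         /* drive type (0x90) */
    put_dd(hdr + 8,  4096);                        /* header size */
    put_dd(hdr + 12, cyl * heads * spt * ssize);   /* data size: 1261568 */
    put_dd(hdr + 16, ssize);                       /* bytes per sector */
    put_dd(hdr + 20, spt);                         /* sectors per track */
    put_dd(hdr + 24, heads);                       /* heads */
    put_dd(hdr + 28, cyl);                         /* cylinders */
}
```

The data size works out to 77 * 2 * 8 * 1024 = 1261568 = 0x134000 bytes, the same size the `times 0x134000-($-$$)` line pads the program to.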

Despite a few small syntax errors in the hello world example, I managed to assemble it and run it as an FDI.
At the end of the assembly file, simply adding

        times 0x134000-($-$$) db 0xbe


will fill out the rest of the bytes. Then, running

        cat header.bin prog.bin > out.fdi

will create a bootable fdi image.

With relative ease, I was able to determine via the DOSBox-X debugger (command: “ev cs”) that the code segment after booting from a disk with anything on it (e.g. jmp $) was 1fc0h, i.e. linear address 1fc00h. With the “memdump 1fc0:0000 1000” command, I was able to confirm that 1024 bytes were copied from the disk into segment 1fc0 and run there. The bytes after the first 1024 were not 0xbe, telling me that nothing more was loaded from disk.

This is as expected. The PC-8801 also copies the first sector from disk into memory and runs it, but on the 88 that's only 256 bytes.

Great! On to the next thing - can I do C? And if so, can I interoperate with arbitrary asm?


2. Finding a C Compiler that works is hard


I tried to install gcc-ia16 (various platform failures), Borland for DOS (crashes on PC98), build OpenWatcom (failed for lack of lib support) AND run its DOS installer (hangs the PC98), all to no avail.

Feeling frustrated, I googled Cコンパイラ PC98 (“C compiler PC98”) and found this Qiita page: https://qiita.com/mikuta0407/items/e659b0d5101464aba071 where they linked to an archived version of the LSI-C86 compiler toolchain!

This is the business. I quickly downloaded it, tested it, and uploaded it to archive.org. It works well, EXCEPT that after much fumbling I was unable to get the linker (lld) to play nice with files created by its own assembler (r86) when exporting to COM. COM output would have simplified the interaction between ASM and C, but, perhaps due to a bug, the COM export failed.

I was getting EXEs though, and that’s great news! So maybe there is a way to run the executable file from assembly? Perhaps stripping the header and calling the code start point? How complicated could exe files be anyway? Famous last words.

I made a quick Hello World that writes the text VRAM directly and left the EXE for now.

Before diving into EXEs, I wanted to make sure that I would be able to load a file from an arbitrary location on disk into general RAM. For that, I would need some documentation on the floppies…


3. Research into Disk Coding


Lucky for me, in my ramblings in search of info (memory maps, etc) I stumbled upon mention of the PC-98 Programmer’s Bible: https://archive.org/details/PC9801Bible/. I decided to check it out, and wow! Glad I did.
This contains information on every BIOS call, including disk I/O, so despite some trouble converting the example C program back to ASM, and way more time than I should have spent narrowing down the formula for the location on disk, I had disk writing and reading routines done in just a couple of hours.

%macro DiskLoad 6
; Src C, Src H, Src S, Dst Seg, Dst Ofs, Byte Ct
    mov cx,3<<8 | %1 ; sector len, cylinder
    mov dx,%2<<8 | %3 ; head, start sector
    mov ax,%4 ; segment
    mov es,ax
    mov bp,%5 ; offs
    mov bx,%6 ; bytes
    mov ax,0x76<<8 | 0x90 ; load cmd
    int 1bh
%endmacro
%macro DiskWrite 6
; Dst C, Dst H, Dst S, Src Seg, Src Ofs, Byte Ct
    mov ax,%4 ; segment
    mov es,ax
    mov bp,%5 ; offs
    mov cx,3<<8 | %1 ; sector len, cylinder
    mov dx,%2<<8 | %3 ; head, start sector
    mov bx,%6 ; bytes
    mov ax,0x75<<8 | 0x90 ; write cmd
    int 1bh
%endmacro


*Of note: even if the byte count in BX is smaller than the sector size, the rest of the sector is filled with 00 when writing to disk. Partial-sector reads may be possible, but partial-sector writes are not.
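To make the register packing in `DiskLoad`/`DiskWrite` explicit, here is a hypothetical C helper that computes the same values the macros load before `int 1bh`. It does no I/O; reading AL = 90h as the drive/unit code is an assumption based on how the macros use it.

```c
#include <stdint.h>

/* The register values DiskLoad/DiskWrite set up before INT 1Bh. */
typedef struct { uint16_t ax, bx, cx, dx, es, bp; } disk_regs;

disk_regs disk_cmd(int is_write, uint8_t cyl, uint8_t head, uint8_t sector,
                   uint16_t seg, uint16_t ofs, uint16_t bytes)
{
    disk_regs r;
    r.ax = (uint16_t)(((is_write ? 0x75 : 0x76) << 8) | 0x90); /* AH = command, AL = drive (assumed) */
    r.bx = bytes;                                             /* byte count */
    r.cx = (uint16_t)((3 << 8) | cyl);                        /* CH = 3 (1024-byte sectors), CL = cylinder */
    r.dx = (uint16_t)((head << 8) | sector);                  /* DH = head, DL = sector (1-based) */
    r.es = seg;                                               /* ES:BP = buffer */
    r.bp = ofs;
    return r;
}
```

For `DiskLoad 0,0,2,0x2000,0,1024*6` this gives AX = 7690h, CX = 0300h, DX = 0002h, BX = 6144 and ES:BP = 2000:0000.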

While I was doing it, I wrote a python script (and sent it over to barbeque) that will add a file to a given cylinder, head and sector of a FDI image.

The formula, given a header of 0x1000 size and variables cy, he, se, is:

    loc = (int(cy) * 0x4000) + (int(he) * 0x2000) + ((int(se)-1) * 0x400) + 0x1000

The se (sector) variable has 1 subtracted because its range is 1 to 8 instead of 0 to 7.
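The same formula as a small C function. This is a sketch: `fdi_offset` is a hypothetical name, and it assumes the 2HD geometry and 0x1000-byte header used throughout (sectors numbered 1 to 8).

```c
#include <stdint.h>

/* Byte offset of (cylinder, head, sector) inside a 2HD FDI image. */
uint32_t fdi_offset(unsigned cy, unsigned he, unsigned se)
{
    return cy * 0x4000u          /* 2 heads * 8 sectors * 1024 bytes */
         + he * 0x2000u          /* 8 sectors * 1024 bytes */
         + (se - 1) * 0x400u     /* sectors are numbered 1-8 */
         + 0x1000u;              /* skip the FDI header */
}
```

So a file inserted at cylinder 0, head 0, sector 2 starts at file offset 0x1400, and the very last sector (76, 1, 8) starts at 0x134C00.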


Almost there: I needed to figure out how to load up these pesky DOS-EXE files.


4. DOS-EXE Disassembly 



It took me about 20 tries reading over http://justsolve.archiveteam.org/wiki/MS-DOS_EXE , http://www.textfiles.com/programming/FORMATS/exefs.pro , https://groups.google.com/g/microsoft.public.masm/c/T6mvLia40SE/m/50_TLyRgmyoJ and https://moddingwiki.shikadi.net/wiki/EXE_Format (all of which contain slightly varying and slightly confusing explanations of the same thing) before I understood what *precisely* the EXE file was doing.

It may help to restate:
The EXE file can be loaded at an arbitrary segment. Whatever that load segment is, it must be added to every location the relocation table points to. Each table entry is an offset word followed by a segment word, relative to the start of the code (just past the header), and the word stored at that spot is a segment value that gets the load segment added to it.

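SO. A relocation table of:

01 00 00 00,
34 00 00 00,
17 00 00 00

means that the words at (code_start + 01h), (code_start + 34h), and (code_start + 17h) are segment values which must each be incremented by LOAD_SEGMENT. Then you can safely jump to the entry point: the header's initial CS:IP, with CS also offset by LOAD_SEGMENT (for an entry point of 0000:0000, that is simply LOAD_SEGMENT:0000).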

That’s it! This also means if you know where you are going to execute the code from, given a static memory map, you can strip the header completely and adjust the relocation segment variables beforehand.
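A sketch of that relocation pass in C, assuming the header has already been stripped so that offset 0 of `image` is code_start. `apply_relocs()` is a hypothetical name, not what checkexe.py actually does internally; `table` holds the raw 4-byte offset/segment pairs from the EXE.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static void wr16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8); }

/* Add load_seg to every word the relocation table points at.
   Returns 0, or -1 if an entry points outside the image. */
int apply_relocs(uint8_t *image, size_t size,
                 const uint8_t *table, unsigned count, uint16_t load_seg)
{
    for (unsigned i = 0; i < count; i++) {
        uint16_t off = rd16(table + 4 * i);       /* offset word */
        uint16_t seg = rd16(table + 4 * i + 2);   /* segment word */
        size_t at = (size_t)seg * 16 + off;       /* relative to code_start */
        if (at + 2 > size)
            return -1;
        wr16(image + at, (uint16_t)(rd16(image + at) + load_seg));
    }
    return 0;
}
```

With the table above and a load segment of 2000h, the words at 01h, 34h and 17h each gain 2000h.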


5. …And Trimming Everything Into Place


Which is exactly what I did. Another hour or so and I had a “quick” python script to do the following:
- scan the header,
- print the data on the table,
- trim the header and
- adjust the relocation values in the file.
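The header scan can be sketched in C as well. The field offsets follow the standard MZ header layout (e_cblp at 2, e_cp at 4, e_crlc at 6, e_cparhdr at 8, initial IP/CS at 14h/16h, e_lfarlc at 18h); `mz_parse()` and `mz_info` are hypothetical names, and this is not the project's script.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

typedef struct {
    uint32_t header_bytes;   /* e_cparhdr * 16: what gets trimmed off */
    uint32_t module_bytes;   /* code/data that follows the header */
    uint16_t reloc_count;    /* e_crlc: number of 4-byte relocation entries */
    uint16_t reloc_offset;   /* e_lfarlc: file offset of the relocation table */
    uint16_t init_ip;        /* entry point offset */
    uint16_t init_cs;        /* entry point segment, relative to the load segment */
} mz_info;

/* Returns 0 on success, -1 if the buffer is not a plausible MZ file. */
int mz_parse(const uint8_t *exe, size_t size, mz_info *out)
{
    if (size < 0x1C || exe[0] != 'M' || exe[1] != 'Z')
        return -1;

    uint16_t last_page = le16(exe + 2);   /* e_cblp: bytes used in last 512-byte page, 0 = full */
    uint16_t pages     = le16(exe + 4);   /* e_cp: file size in 512-byte pages, rounded up */
    if (pages == 0 || last_page >= 512)
        return -1;

    uint32_t file_bytes = (uint32_t)pages * 512;
    if (last_page != 0)
        file_bytes -= 512u - last_page;

    out->header_bytes = (uint32_t)le16(exe + 8) * 16;   /* e_cparhdr: paragraphs */
    if (out->header_bytes > file_bytes || file_bytes > size)
        return -1;

    out->module_bytes = file_bytes - out->header_bytes;
    out->reloc_count  = le16(exe + 6);
    out->reloc_offset = le16(exe + 0x18);
    out->init_ip      = le16(exe + 0x14);
    out->init_cs      = le16(exe + 0x16);
    return 0;
}
```

Trimming is then just keeping `module_bytes` bytes starting at `header_bytes`, with the relocation entries read from `reloc_offset` in the original file.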

After that, everything was ready… I just had to perform the magic ASM:

    DiskLoad 0,0,2,0x2000,0,1024*6   ; C0 H0 S2 -> 2000:0000, 6 sectors
    call 2000h:0000h                 ; far call into the relocated EXE


and use my trusty Makefile to put it all together:

default:
	nasm 98header.s -o header.bin
	nasm main.s -o prog.bin
	cat header.bin prog.bin > app.fdi
	python3 checkexe.py TEST.EXE -r 0x2000
	python3 insert_to_fdi.py app.fdi TEST.EXE_ 0 0 2


… AND IT WORKS!

Something is actually as easy as it looks for once! What a surprise.

You can download the entire project here!


But you must have all the tools (LSI-C, NASM, GNU Make, Python3) yourself.

Saturday, April 8, 2023

Rolling D20s on the Z80; and other random thoughts about randomness

 



My current project uses simulated dice rolls with varying face counts for combat results. I originally was not planning to target 8-bit systems, but obviously I changed my mind. I went through a couple of phases finding the best implementation, and I thought it was interesting enough to write down!

To start: Generating a random 8-bit number through a linear feedback shift register on Z80 has been well documented.

I use the following method, which is found all over the internet:

z80rand:
;James Montelongo
;optimized by Spencer Putt
;uses: 8-byte seed at _LFSRSeed
;out:
; a = 8 bit random number
; clobbers ALL
ld hl,_LFSRSeed+4
ld e,(hl)
inc hl
ld d,(hl)
inc hl
ld c,(hl)
inc hl
ld a,(hl)
ld b,a
rl e \ rl d
rl c \ rla
rl e \ rl d
rl c \ rla
rl e \ rl d
rl c \ rla
ld h,a
rl e \ rl d
rl c \ rla
xor b
rl e \ rl d
xor h
xor c
xor d
ld hl,_LFSRSeed+6
ld de,_LFSRSeed+7
ld bc,7
lddr
ld (de),a
ret

0-255? Cake!

But what if we are trying to generate a percentage from 1-100? Or a d8? A d20? What about a complicated dice roll for our 8-bit OGL 2.0 games?? Well, that's where things get complicated.

If you are only given an input of 0-255, and you have to make that number fit WITH A SIMILAR PROBABILITY into, say, 20 values (for a 1d20), then suddenly you have some complications.

If the number generated by the LFSR is between 1 and 20, it's no problem. But that is only 20 out of 256 cases. If the number is, say, 149, what's the fairest way to squash that down into the range we ask for? What if it's 0?

A mathematician will tell you to cross-multiply: set the die result n out of 20 equal to the random value out of 256, and solve for n. e.g.

n/20 = 149/256
n = (149 × 20) / 256 = 2980 / 256 ≈ 11.64
n = 11 (rounded down)

149/256 is approximately 11/20, great. Wouldn't this suck to do in assembly, though? Before trying mul8 and div16 routines, which feel like a lot of wasted cycles, I thought about various shortcuts I could use.

My first attempt was to simply srl the value until it fits within the range. When I discussed this with my sibling, they correctly pointed out that this would, in effect, keep the value in the top half of the range; especially with higher 'die face' values, this would skew results far too high. It's super quick, but extremely dirty. Not good for a game.
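A quick C model of that first attempt shows the skew. `shift_to_fit()` is a hypothetical name; `range_max` is exclusive (21 gives results 0-20), and it must be at least 1 or the loop never ends.

```c
/* Shift right (srl) until the value is below range_max. */
unsigned char shift_to_fit(unsigned char v, unsigned char range_max)
{
    while (v >= range_max)
        v >>= 1;
    return v;
}
```

Every input of 21 or more ends up in 10-20, so 246 of the 256 possible inputs land in the top half of the range.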

My second attempt was a similar method of bitmasking:

MAXU8:
ld c,rangeMax 
ld b,0xff
_m:
cp c
ret c ; exit once A < rangeMax
srl b 
and b 
jr _m

This masks the input with B on each iteration, and this time B is shifted right each loop. Surprisingly, this provides pretty good results. With an input of 0...n and a max of 20, this is what I was getting:

00 01 02 03 04 05 06 07
08 09 0a 0b 0c 0d 0e 0f
10 11 12 13 14 05 06 07
08 09 0a 0b 0c 0d 0e 0f
00 01 02 03...

See the pattern? A similar pattern continues for all rangeMax values. The masking knocks out the high bits, folding the values above the range back onto its middle. So, for 1-20, 5-15 occur twice as often. For 0-100, it's 37-63, and so on.

This isn't the end of the world, but it makes for odd probabilities, and potentially a bad game feel. Rolling average isn't bad, but when it happens twice as often as really good or really bad rolls, the game is going to feel really "normalized".

At first, I thought I would lean into this. Astute observers will note that in the hex values above, I'm generating numbers of 0-20, which is actually 21 values. The period of these inputs is 32, so the actual probabilities of the values are:

1/32 : 0-4
2/32 : 5-15
1/32 : 16-20
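These figures are easy to double-check with a C model of MAXU8 that runs all 256 inputs through it. The names are hypothetical; `range_max` plays the role of `rangeMax`, so 21 gives results 0-20.

```c
/* C model of MAXU8: mask with 0x7F, 0x3F, 0x1F, ... until A < range_max. */
unsigned char mask_to_fit(unsigned char a, unsigned char range_max)
{
    unsigned char mask = 0xFF;          /* ld b,0xff */
    while (a >= range_max) {            /* cp c ; ret c */
        mask >>= 1;                     /* srl b */
        a &= mask;                      /* and b */
    }
    return a;
}

/* Count how often each result appears over all 256 inputs. */
void mask_histogram(unsigned char range_max, unsigned counts[256])
{
    for (int i = 0; i < 256; i++)
        counts[i] = 0;
    for (int v = 0; v < 256; v++)
        counts[mask_to_fit((unsigned char)v, range_max)]++;
}
```

Over all 256 inputs, 0-4 and 16-20 each come up 8 times and 5-15 come up 16 times, matching the 1/32 and 2/32 figures.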

Thinking to favor the player, I figured that instead of removing 0, I would just make results of 0 count as the highest die roll -- by having a 2/32 chance to roll the max value, perhaps it would offset the slight non-randomness of the other rolls.

And know what? It would be fine. This actually might be preferable! Average rolls are easier to get, and criticals are just a teensy bit more common (1 every 16 rolls instead of 1/20).

I thought I had it, and I was done. But then I started writing this post. I realized that a multiplication and division really wouldn't be TOO bad, and sketched it out:

DivHLbyBC:
; HL = HL / BC (by repeated subtraction)
; clobbers AF
push de 
ld de,0     ; DE = quotient
and 0       ; A = 0, carry cleared for the first sbc
_dsl:
sbc hl,bc   ; HL -= BC
jr c,_dds   ; stop once HL goes below zero
inc de 
jr _dsl     ; inc DE / LOOP 
_dds:
ld h,d 
ld l,e 
pop de 
ret

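MulHbyL:
; HL = L * (H + 1): HL starts out holding L,
; then L is added H more times (H must be nonzero)
push bc 
push de 
; smaller number should be in H (fewer loops)
ld d,h      ; D = loop counter
ld h,0      ; HL = L
ld b,0 
ld c,l      ; BC = L
_mhl:
add hl,bc   ; HL += L
dec d       ; H times 
jr nz,_mhl
pop de 
pop bc 
ret 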
It might be sloppy or slow, I am very tired, but it seemed simple enough, so I gave it a try. But guess what I noticed?


If you are dividing by 256 (the size of our input range)... isn't that just the high byte of the result? For instance, 0x0312 divided by 256 (0x100) is 3. 0x41aa divided by 256 is 0x41 (and yes, the low byte is just the remainder).

Wow. I'm glad I tried to write this blog post! I commented out the division, moved H into A, and ran a full series of 0-255 to see the spread.

Inputs 0 - 80:
00 00 00 00 00 00 00 00 
00 00 00 00 00 01 01 01
01 01 01 01 01 01 01 01
01 02 02 02 02 02 02 02
02 02 02 02 02 03 03 03
03 03 03 03 03 03 03 03
03 04 04 04 04 04 04 04
04 04 04 04 04 04 05 05
05 05 05 05 05 05 05 05
05 05 06 06 06 06 06 06 ...
... 200+:
10 10 10 10 10 10 10 10 
11 11 11 11 11 11 11 11
11 11 11 11 12 12 12 12
12 12 12 12 12 12 12 12
13 13 13 13 13 13 13 13
13 13 13 13 14 14 14 14 
14 14 14 14 14 14 14 14

About as even as you can ask for! How about that. So this is a handy little trick: multiply by the number you want and keep the high byte, and any number from 0-255 gets evenly rescaled into a smaller range.
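The same trick in C, as a sketch (`scale_u8` is a hypothetical name): multiply the random byte by the number of faces and keep the high byte, which is floor(r * sides / 256), a value from 0 to sides - 1.

```c
/* High byte of r * sides: evenly rescales 0-255 into 0..sides-1. */
unsigned char scale_u8(unsigned char r, unsigned char sides)
{
    return (unsigned char)(((unsigned)r * sides) >> 8);
}
```

For 20 sides, every face gets either 12 or 13 of the 256 possible inputs (256 / 20 = 12.8), and 149 maps to 11, the same answer as the worked example above.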

rollDie:
    call z80rand 
    ld l,a   ; random result into l 
    ld h,DIESIDES-1  ; MulHbyL returns L*(H+1), so HL = L*DIESIDES
    call MulHbyL
    ; ld bc,256 
    ; call DivHLbyBC
    ld a,h ; H = HL/256, so skip division! add one!
    inc a  ; 0..DIESIDES-1 -> 1..DIESIDES
    ret  

That's it! It's really fast, and has a nice, even spread - even for edge numbers. All that's left is calling it several times for multiple dice and adding or subtracting an arbitrary modifier.
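For several dice plus a modifier, a C sketch might look like this. `roll_dice()` is hypothetical, and `rand_bytes` stands in for one z80rand result per die.

```c
/* Roll `count` dice of `sides` faces and add `modifier`, one random byte per die. */
int roll_dice(unsigned count, unsigned char sides, int modifier,
              const unsigned char *rand_bytes)
{
    int total = modifier;
    for (unsigned i = 0; i < count; i++) {
        unsigned product = (unsigned)rand_bytes[i] * sides;  /* MulHbyL */
        total += (int)(product >> 8) + 1;                     /* high byte, then inc a */
    }
    return total;
}
```

Bytes of 0 give the minimum (count + modifier) and bytes of 255 give the maximum (count * sides + modifier).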

Hope this was interesting, or helps you in some way!

Sunday, August 8, 2021

Megadrive / Genesis assembly - Z80 sound driver example

[ Buildable example written in C: 
https://github.com/bferguson3/m68k-gcc-pi/tree/main/projects/z80test ]

 If you've poked around in the Megadrive/Genesis development community for any period of time, you've probably found the official documentation from SEGA, such as it is. It's very roughly translated, and there are a few errors (if you don't have the errata, uh-oh!).

Well, if you are like me and like doing everything the hard way, this is rough. There aren't any great resources out there for low-level Megadrive coding, aside from the occasional raw disassembly listing (like Sonic 1 and 2). Driving the sound is especially hard because you need to be at least roughly familiar with not just the 68000, but the Z80 as well.

I scoured around for what I could, but besides some very old code examples (some 'current' links are over 10 years old at this point) I remained somewhat baffled. The Genesis Sound Software Manual, in particular, includes a very cryptic "example program" - essentially a listing of the FM register values.

It goes like this (not quite verbatim):




This is all well and good, but there aren't any code examples given in the manual in _either_ 68000 or Z80 (since you can use either to drive the sound chip - Sonic is a famous case where the sequel changed from a 68k music driver to a z80 music driver). 

The advantage of launching the music player on the Z80 is that the 68000 can be free to do pure graphics processing and won't be caught up on certain hardware hangups, like waiting for the Z80 I/O bus to be free. 

The trick of course, is actually doing it on the hardware.
This stumped me for like two months! Ouch!

So let's dig in. First of all, the manual gives this short description of activating the Z80:

68K CONTROL OF Z-80
Start-Up Operation Sequence:

1. BUS REQ ON
2. BUS RESET OFF
3. 68k copies program into Z-80 S-RAM
4. BUS RESET ON 
5. BUS REQ OFF 
6. BUS RESET OFF 

BUS REQUEST ON
DATA 100H (WORD) -> $A11100

BUS REQ OFF 
DATA 0H (WORD) -> $A11100

RESET Z80 ON
DATA 0H (WORD) -> $A11200

RESET Z80 OFF
DATA 100H (WORD) -> $A11200
 * Requires 26ms

CONFIRM BUS STATUS
bit 0 of $a11100
1 - 68K can access, 0 - z80 is using

From this, we can attempt to write a 68000 program, following the only steps we are given. 

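PRGSIZE equ _endprg-Z80PRG
Z80_BUS equ $a11100
Z80_RESET equ $a11200

    org *

move.w #$100,Z80_BUS      ; 1. bus request on
move.w #$100,Z80_RESET    ; 2. reset off
movea.l #Z80PRG,a0        ; source: Z80 program
movea.l #$a00000,a1       ; destination: Z80 RAM
move.l #PRGSIZE,d1        ; byte count
.z80copyloop:
move.b (a0)+,d0           ; 3. copy program into Z80 RAM
move.b d0,(a1)+
subq.l #1,d1
bne .z80copyloop
move.w #0,Z80_RESET       ; 4. reset on
nop
nop
nop
nop
move.w #0,Z80_BUS         ; 5. bus request off
move.w #$100,Z80_RESET    ; 6. reset off

Z80PRG: dc.b etc etc      ; assembled Z80 program bytes
_endprg: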

If you noticed that the wait (done with four NOPs in the ROM disassemblies) actually comes after the RESET ON and not the RESET OFF write, you win the prize! This was taken from ROM disassemblies. It is unclear to me whether RESET ON and RESET OFF are confused in the text (since the values BUS ON/OFF uses are the opposite) or whether the 26ms footnote is just in the wrong place. Four NOPs also come to only about 2 microseconds on a 7.67 MHz 68000, nowhere near 26ms, so the footnote doesn't seem to be meant literally either. Either way, think about it logically for a moment.

After you reset a system, it needs a moment to catch up. Writing 0 to $a11200 holds the Z80 in reset (RESET ON, per the manual excerpt above) - I think of this as putting it into "reset mode". When you write $100 to the reset register, you release it so it can leave "reset mode" and start running the new program in memory - but only once it has had time to settle!
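Since the Genesis example linked at the top is written in C, here is the same six-step sequence as a C sketch. It is not taken from that project: `z80_ports` and `z80_upload()` are hypothetical, the register pointers are passed in (on hardware they would point at $A11100, $A11200 and $A00000) so the sequence can also be exercised against ordinary memory, and the short loop only stands in for the four NOPs without being cycle-matched to them. Like the assembly above, it does not poll the bus status bit.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    volatile uint16_t *busreq;   /* $A11100 on hardware */
    volatile uint16_t *reset;    /* $A11200 on hardware */
    volatile uint8_t  *ram;      /* $A00000 on hardware */
} z80_ports;

void z80_upload(const z80_ports *z, const uint8_t *prg, size_t len)
{
    *z->busreq = 0x100;                    /* 1. bus request on */
    *z->reset  = 0x100;                    /* 2. reset off */
    for (size_t i = 0; i < len; i++)       /* 3. copy program into Z80 RAM */
        z->ram[i] = prg[i];
    *z->reset  = 0;                        /* 4. reset on */
    for (volatile int i = 0; i < 4; i++) { /* brief pause in place of 4 NOPs */
    }
    *z->busreq = 0;                        /* 5. bus request off */
    *z->reset  = 0x100;                    /* 6. reset off: the Z80 starts running */
}
```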

Anyway, the above code will certainly get SOME bytes into the Z80's memory, but what do we write?

First, and MOST IMPORTANTLY, the biggest rule of embedded development is DO NOT TRUST RANDOM MEMORY. Why, you may ask? Because of cases like this. The stack pointer on the Z80 could be anywhere. The memory could be clean, or it could be full of random values that will immediately cause a stack corruption.

The emulator dgen, one of my favorites for Linux dev and disassembly, does not initialize the Z80's memory the same as actual hardware. This was a pain point for me, because I was getting sound in dgen, but not on my actual Genesis 2. After a while of poking at this and that, I came across these two lines of code in the Z80 init portion of the Sonic 2 disassembly:

    di 
    ld sp, $1b80



Suddenly, everything was clear. I wasn't disabling interrupts (not that I thought that was the issue) but I wasn't setting the stack - I had no idea where it even started, and that's when I realized I wasn't zeroing the Z80's SRAM, either. Doing those two things got me sound on my hardware and inspired this blog post.

SO - The very first thing we want to do, before even thinking about writing to the FM registers, is make sure our Z80 is "sane".

FMREG EQU $4000
FMDAT EQU $4001
DATSIZE EQU ENDFMDATA-FMDATA

org $0

; disable interrupts
di

; clear the stack
ld a, 0
ld de, $1b00
ld b, 0
CLRSTACK:
LD (DE),A
inc de
djnz CLRSTACK

; set the stack pointer
ld sp,$1b80

This does the trick. 1b00-1bff will be set to 0, and the stack will be set to 1b80. The choice of 1b80 was mostly arbitrary - this is what it is set to for Sonic, so why not?
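Why does `ld b,0` clear 256 bytes rather than none? djnz decrements B before testing it, so starting from 0 wraps to 255. A small C model of the clear loop (hypothetical name) shows the count:

```c
#include <stdint.h>

/* Model of CLRSTACK: B starts at 0 and djnz decrements before testing,
   so the first decrement wraps B to 255 and the loop runs 256 times. */
unsigned clear_with_djnz(uint8_t *ram, uint16_t de)
{
    uint8_t b = 0;              /* ld b,0 (A is already 0) */
    unsigned written = 0;
    do {
        ram[de++] = 0;          /* ld (de),a ; inc de */
        written++;
    } while (--b != 0);         /* djnz CLRSTACK */
    return written;
}
```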

Now we can finally start writing the registers. For the large batch of initialization data (listed in the full source below), it's easier to wrap it all in a loop. I chose to store the register byte in B and the data byte in C, and to call a separate function that writes the register:


LD HL,FMDATA ; START OF (REG, DATA) PAIRS
LD BC,DATSIZE ; LENGTH OF DATA IN BYTES
srl b
rr c ; divided by two!
FMINITLOOP:
PUSH BC ; ++
; Store REG# and DATA in B and C
LD B,(HL)
INC HL
LD C,(HL)
INC HL
; Write FM1, preserving HL
PUSH HL
CALL WRITEFMR
POP HL
POP BC ; --
DEC BC
LD A,C
OR B ; quick check for 16bit 0
JR NZ,FMINITLOOP

General Z80 bits:
- SRL B ; RR C is a quick way of dividing 16-bit BC by two. It is a logical shift pair: SRL shifts B's low bit into carry, and RR C rotates it into C's high bit. We divide by two because we are writing two-byte pairs, so we want half the size of the total data block.
- LD A,C ; OR B is a fast 16-bit zero check. A ends up holding C OR B, so the Z flag is only set when both registers are zero. (DEC BC on its own doesn't touch the flags, which is why the check is needed.)
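Here is a C model of FMINITLOOP that mirrors those two tricks. It's a sketch for illustration: `fm_init()` is a hypothetical name, and `regs` stands in for the YM2612's registers where the Z80 would call WRITEFMR.

```c
#include <stdint.h>

/* Model of FMINITLOOP: size / 2 (reg, value) pairs, each "written" to regs. */
unsigned fm_init(const uint8_t *data, uint16_t size, uint8_t regs[256])
{
    uint16_t bc = size >> 1;             /* srl b ; rr c */
    const uint8_t *hl = data;            /* ld hl,FMDATA */
    unsigned writes = 0;

    if (bc == 0)                         /* the Z80 loop has no such guard: with */
        return 0;                        /* BC = 0 it would run 65536 times */
    do {
        uint8_t reg = *hl++;             /* ld b,(hl) ; inc hl */
        uint8_t val = *hl++;             /* ld c,(hl) ; inc hl */
        regs[reg] = val;                 /* call WRITEFMR */
        writes++;
        bc--;                            /* dec bc: sets no flags on the Z80 */
    } while (((bc & 0xFF) | (bc >> 8)) != 0);   /* ld a,c ; or b ; jr nz */
    return writes;
}
```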

Here is the fairly easy WRITEFMR function:

;;;;;;;;;;
WRITEFMR:
;;;;;;;;;;;;;;;;;;;;;;;;;;;
; WRITE FM REGISTER
; * A, B, C
; INPUT:
; B = REG TO WRITE
; C = VALUE TO WRITE
call ZWAIT
; REG select
ld a,b
ld ($4000),a
call ZWAIT
; Write DATA
ld a,c
ld ($4001),a
RET

Before writing to the FM in any fashion, whether to the register select port or the data port, we have to wait until the chip is ready. This particular method (read the status byte from $4000 and add it to itself, which shifts the busy flag in bit 7 into carry, looping while carry is set) was taken from disassembly, but presumably BIT 7,A works just as well.


;;;;;;;;;;
ZWAIT:
;;;;;;;;;;;;;;;;;;;;;;;;;;
; Waits until fm bus is ready.
; * A
LD A,($4000)
add a, a
JR c,ZWAIT
ret
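A quick C check that the add trick and BIT 7 test the same thing (hypothetical helper names; `status` is the byte read from $4000):

```c
#include <stdint.h>

/* ZWAIT's test: add a,a doubles A, and the carry out is bit 7 (busy). */
int busy_via_add(uint8_t status)
{
    unsigned sum = (unsigned)status + status;   /* add a,a */
    return sum > 0xFF;                          /* carry set -> jr c,ZWAIT */
}

/* The alternative: bit 7,a then jr nz. */
int busy_via_bit7(uint8_t status)
{
    return (status & 0x80) != 0;
}
```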


Regardless of the method, I advise writing to $4000/$4001 directly, and not via indirect addressing JUST IN CASE - for no other reason than that's what SEGA does.

Great - now Channel 1 is set up and we can actually play a sound!


START:
LD BC,$B032 ; feedback/alg
CALL WRITEFMR
LD BC,$B4C0 ; speakers on
CALL WRITEFMR
LD BC,$2800 ; KEY OFF
CALL WRITEFMR
LD BC,$A422
CALL WRITEFMR ; SET FREQ
LD BC,$A069
CALL WRITEFMR

LD BC,$28F0
CALL WRITEFMR ; KEY ON

Using the WRITEFMR method the way I implemented it, the rest of the program is easy to write. The exception comes when we need to wait before turning the key off again. I wrote a basic cycle wait, but you can do whatever you want. 
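The frequency and key-on bytes above can also be built from their parts. A C sketch with hypothetical names: $A4 holds the block (octave) in bits 5-3 and the top 3 bits of the 11-bit F-number, and is written before $A0, which holds the low 8 bits; $28 takes the operator mask in the high nibble and the channel in the low bits.

```c
#include <stdint.h>

/* Split block + 11-bit F-number into the $A4 (written first) and $A0 bytes. */
void fm_freq_bytes(uint8_t block, uint16_t fnum, uint8_t *a4, uint8_t *a0)
{
    *a4 = (uint8_t)(((block & 7) << 3) | ((fnum >> 8) & 7));
    *a0 = (uint8_t)(fnum & 0xFF);
}

/* Value for register $28: operator mask in bits 7-4, channel in bits 2-0
   (0-2 = channels 1-3, 4-6 = channels 4-6). */
uint8_t fm_key_byte(uint8_t operators, uint8_t channel)
{
    return (uint8_t)(((operators & 0x0F) << 4) | (channel & 7));
}
```

Block 4 with F-number 269h gives the $22/$69 pair used above, and all four operators on the first channel give the $F0 key-on byte.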

And that's it!

The main things to watch out for are:

- Wait (4 NOPs, as the ROM disassemblies do) between writing 0 to z80_reset and $100 to z80_reset

- Clear memory! Watch for interrupts! Set your stack pointer!!

- Wait for the busy flag before _every_single_ write to the FM, and address the ports directly


Full Z80 source:
FMREG EQU $4000
FMDAT EQU $4001
DATSIZE EQU ENDFMDATA-FMDATA

org $0

; disable interrupts
di

; clear the stack
ld a, 0
ld de, $1b00
ld b, 0
CLRSTACK:
LD (DE),A
inc de
djnz CLRSTACK

; set the stack pointer
ld sp,$1b80

LD HL,FMDATA ; START OF (REG, DATA) PAIRS
LD BC,DATSIZE ; LENGTH OF DATA IN BYTES
srl b
rr c ; divided by two!
FMINITLOOP:
PUSH BC ; ++
; Store REG# and DATA in B and C
LD B,(HL)
INC HL
LD C,(HL)
INC HL
; Write FM1, preserving HL
PUSH HL
CALL WRITEFMR
POP HL
POP BC ; --
DEC BC
LD A,C
OR B ; quick check for 16bit 0
JR NZ,FMINITLOOP

START:
LD BC,$B032 ; feedback/alg
CALL WRITEFMR
LD BC,$B4C0 ; speakers on
CALL WRITEFMR
LD BC,$2800 ; KEY OFF
CALL WRITEFMR
LD BC,$A422
CALL WRITEFMR ; SET FREQ
LD BC,$A069
CALL WRITEFMR

LD BC,$28F0
CALL WRITEFMR ; KEY ON

LD C, 5
CALL WAIT ; SIMPLE WAIT

LD BC,$2800
CALL WRITEFMR ; KEY OFF

LOOP:
JP LOOP ; Done!

;;;;;;;;;;
WRITEFMR:
;;;;;;;;;;;;;;;;;;;;;;;;;;;
; WRITE FM REGISTER
; * A, B, C
; INPUT:
; B = REG TO WRITE
; C = VALUE TO WRITE
call ZWAIT
; REG select
ld a,b
ld ($4000),a
call ZWAIT
; Write DAT
ld a,c
ld ($4001),a
RET

;;;;;;;;;;
ZWAIT:
;;;;;;;;;;;;;;;;;;;;;;;;;;
; Waits until fm bus is ready.
; * A
LD A,($4000)
add a, a
JR c,ZWAIT
ret

;;;;;;;;;
WAIT:
;;;;;;;;;;;;;;;;;;;;;;;;;;;
; WAIT FOR $FFFF * C LOOP ITERATIONS
; * A, C, H, L
; INPUT:
; C = NUM LOOPS

BIGLOOP:
LD HL,$ffff ; 64K loops
WAITLOOP:
DEC HL
LD A,L
OR H
JR NZ, WAITLOOP

DEC C
XOR A
OR C
JR NZ,BIGLOOP

RET
;;;;;;;;;;;;;;;;;;;;;;;;;;;

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
FMDATA:
defb $22,0 ; lfo and dac
defb $27,0
defb $28,0
defb $2B,0
; 3x channel 0
defb $30,$71 ; dt1/mul
defb $34,$0D
defb $38,$33
defb $3c,$01
; 4x channel 0
defb $40,$23 ; total level
defb $44,$2d
defb $48,$26
defb $4c,$00
; 5x channel 0
defb $50,$5f ; rs/ar
defb $54,$99
defb $58,$5f
defb $5c,$94
; 6x channel 0
defb $60,5 ; am/d1r
defb $64,5
defb $68,5
defb $6c,7
; 7x channel 0
defb $70,2 ; d2r
defb $74,2
defb $78,2
defb $7c,2
; 8x channel 0
defb $80,$11 ; d1l/rr
defb $84,$11
defb $88,$11
defb $8c,$a6
; 9x channel 0 (SSG-EG)
defb $90,0
defb $94,0
defb $98,0
defb $9c,0
ENDFMDATA: