Wednesday, February 3, 2021

[PC88] C framework for the NEC PC8801 - Part 2, Simple graphics

The following project (which is very similar to what is found in the repository) demonstrates the basics of drawing on the PC-88. 

#include "pc88-c.h"
#include "img_b.h"
#include "img_r.h"
#include "img_g.h"

PlanarBitmap layeredImage = { 
    img_r, img_g, img_b, 248/8, 100
};

void main()
{
    IRQ_OFF

    // Test yellow ALU draw (V2)
    ExpandedGVRAM_On();     
    EnableALU();       
    SetIOReg(EXPANDED_ALU_CTRL, CLR_YELLOW);

    vu8* vp = (vu8*)0xc100;
    *vp = 0xff;

    ExpandedGVRAM_Off();

    // Toggle, then test blue ALU draw (v2)
    ExpandedGVRAM_On();
    SetIOReg(EXPANDED_ALU_CTRL, CLR_BLUE);
    vp += 0x100;
    *vp = 0xff;
    
    // GVRAM copy mode on
    SetIOReg(EXPANDED_GVRAM_CTRL, (u8)(bit(7)|bit(4)) );
    __asm
      ld a,(0xc200)
      ld (0xc201),a
    __endasm;
    // (copies blue byte)
    ExpandedGVRAM_Off();

    // Planar bitmap (V1) draw and individual pixels
    DisableALU(); // ALU off
    PlanarBitmap* pb = &layeredImage;    
    DrawPlanarBitmap(pb, 20, 10);
    SetPixel(360, 180, CLR_BLUE);
    SetPixel(361, 181, CLR_CYAN);
    SetPixel(362, 182, CLR_GREEN);
    SETBANK_MAINRAM() // must reset after draw!

    IRQ_ON

    while(1)
    {
        Wait_VBLANK();
        if(GetKeyDown(KB_RETURN)) print("HI!\x00");
    }
}

(Updated 2/5/2021)

At the top, img_b, img_g, and img_r are the const char arrays produced by png288.py (here, from ishino.png).

These are referenced in the definition of layeredImage, along with its width (divided by 8, for byte width) and height in pixels. This is drawn in V1 mode, described below.

LINE_POINTER and SCREEN_POINTER are initialized to SCREEN_TXT_BASE and 0, respectively. This is done in __init(), within pc88-c.h. 

When drawing, we want interrupts disabled, so we call IRQ_OFF.

First, we'll try drawing a line of 8 pixels (in VRAM, this equals 0xff) in V2 mode, using the ALU.

!!! IMPORTANT !!!
Before calling EnableALU(), you must first call ExpandedGVRAM_On(). If you enable ALU expanded mode without swapping to expanded GVRAM, the system will think you want to write to Main RAM instead - because expanded mode is off, it defaults to independent mode. 

Once that's done, you load your pen color into EXPANDED_ALU_CTRL*. Then any write to 0xC000+ will write that color to the screen, in a linear order of 1bpp pixels - for instance, 0xFF is a string of 8 pixels, 0b10101010 is every other pixel, etc. 0xC000 is the top left single pixel of the screen. (Remember that PC-88 pixels are double-high!)
*See the "deeper explanation" below
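To make the linear layout concrete, here is a small sketch (my own, not part of PC88-C) of how a pixel coordinate maps to a GVRAM byte and bit, assuming the standard 640x200 layout of 80 bytes per row with the leftmost pixel in the high bit:

```python
# Hypothetical helpers, not part of PC88-C: map an (x, y) pixel to GVRAM.
# Assumes 640x200 resolution, 80 bytes per scanline, MSB = leftmost pixel.
GVRAM_BASE = 0xC000
ROW_BYTES = 640 // 8  # 80 bytes per row

def vram_addr(x, y):
    """Address of the GVRAM byte holding pixel (x, y)."""
    return GVRAM_BASE + y * ROW_BYTES + (x // 8)

def vram_mask(x):
    """Bit mask for pixel x within its byte."""
    return 0x80 >> (x % 8)
```

Under these assumptions, writing 0xFF to vram_addr(0, 0) through the ALU would light the first 8 pixels of the top row.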

When done writing to GVRAM, you need to disable it with ExpandedGVRAM_Off() or SETBANK_MAINRAM(). Expanded mode (aka ALU) writes require the former, independent mode writes require the latter. You won't be able to access the Main RAM page, and overall program performance will decrease, until Main RAM is banked back in.

The "simple" explanation of Expanded mode/ALU writing:

1. Disable IRQ / wait for VBlank
2. Enable Expanded GVRAM access mode (set the high bit of I/O register 0x35)
3. Enable the ALU (set bit 6 of I/O register 0x32, leaving the rest unaffected)
4. Set I/O register 0x34 with the palette number to write
5. Write pixel data to VRAM (0xC000~).
6. Disable Expanded GVRAM access mode
7. Re-enable IRQ

In PC88-C this looks like:

IRQ_OFF
ExpandedGVRAM_On();
EnableALU();
SetIOReg(EXPANDED_ALU_CTRL, CLR_YELLOW);
vu8* vp = (vu8*)0xc100;   // initialize vram pointer somewhere
*vp = 0b10101010;         // write every other pixel
ExpandedGVRAM_Off();
IRQ_ON


EXPANDED_ALU_CTRL has a deeper purpose than simply setting the active palette, however. The method used when writing to GVRAM through the ALU depends on the value in I/O register 0x35 (so named EXPANDED_GVRAM_CTRL). 

The actual bit definitions are:
EXPANDED_ALU_CTRL (34h)
bit    7    6    5    4    3    2    1    0
       -  GV2H GV1H GV0H   -  GV2L GV1L GV0L

Where GV0, GV1 and GV2 represent each respective graphic VRAM plane, and H and L represent the high and low bits of the following functions for each plane:
 H L
 0 0 - Bit reset
 0 1 - Bit set (OR)
 1 0 - Bit invert (XOR)
 1 1 - Ignore / noop

Take a moment to understand what this means in practice. If all three of the lower bits are set, that means any bit written to VRAM through the ALU will set the bits on all 3 graphics planes. A white pixel will be written. Hence, loading EXPANDED_ALU_CTRL's lower bits with the palette value has the same effect as changing the active pen color. 

Loading only the upper bits of the register with a palette value will flip the bits on that plane. SetIOReg(EXPANDED_ALU_CTRL, CLR_YELLOW << 4), for instance, will change black pixels to yellow, and yellow pixels to black (values 000 ^ 110). To determine what other colors would change to, you would have to XOR the present color on that plane. This has limited application - one, to erase data you know is already there, and two, to perform quick palette swaps. 

Loading both the upper and lower bits prevents the ALU from writing to that plane. If writing to GVRAM is not working when ALU is enabled, ensure I/O register 0x34 is set to the proper value.
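The bit table above can be turned into a tiny encoder. This is a sketch of my own for illustration; alu_ctrl and the mode names are hypothetical, not PC88-C identifiers:

```python
# Hypothetical helper: build an EXPANDED_ALU_CTRL (0x34) value from one
# operation per plane, following the H/L bit table above.
RESET, SET, XOR, NOOP = 0b00, 0b01, 0b10, 0b11

def alu_ctrl(gv0, gv1, gv2):
    """gv0/gv1/gv2 = operation for planes B, R, G (default palette order)."""
    val = 0
    for plane, op in enumerate((gv0, gv1, gv2)):
        val |= ((op >> 1) & 1) << (4 + plane)  # H bit (bits 6-4)
        val |= (op & 1) << plane               # L bit (bits 2-0)
    return val
```

With it, alu_ctrl(SET, SET, SET) gives 0b111 (a white pen), alu_ctrl(RESET, XOR, XOR) gives 0x60, the same value as CLR_YELLOW << 4 in the palette-swap example, and alu_ctrl(NOOP, NOOP, NOOP) gives 0x77, which leaves every plane untouched.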

EXPANDED_GVRAM_CTRL (35h)
bit   7    6   5-4   3   2   1   0
     GAM   -   GDM   -   GC  RC  BC

GAM : 0 - enable Main RAM, 1 - enable GVRAM
GDM: Graphic Data Multiplexer control
   0 0 - ALU write via port 34h
   0 1 - Write all planes via VRAM copy
   1 0 - Transfer plane R to plane B*
   1 1 - Transfer plane B to plane R*
GC, RC, BC - Bit comp data for each of the 3 GVRAM planes**

*Used in hi-res B&W mode
**Used in multiplexer modes 00 and 01


Generally, this register will be at 0x80 when writing to GVRAM and 0 during normal program execution. When bit 7 is set, data loaded into the accumulator from GVRAM is not the actual data but, as I understand it, a VRAM buffer. What is written is likewise not actual data; it instead moves that buffer to the desired location. (A detailed explanation of this point is difficult to find.)

The bottom 3 bits of this register act as a mask when loading ALU data from VRAM. Mind that the value loaded into the accumulator is _not_ what is written to VRAM, regardless of the mask. This is for arithmetic operations only, i.e. determining the color that needs to be written to the ALU to change the pixel to a specific color. (The mechanism to do this at the moment is beyond me)

V1 Mode - Pixel and PlanarBitmap draw:
You can mix V1 and V2/ALU draw modes, as shown in the main.c example. To write V1 mode graphics (independent GVRAM plane access):

    DisableALU();
Turn off the ALU, if it is on.

    PlanarBitmap* pb = &layeredImage;    
    DrawPlanarBitmap(pb, 20, 10);
If you are drawing a bitmap, initialize a pointer to the defined struct. 
This is a macro that (over)writes the bitmap on all 3 VRAM planes.

    SetPixel(360, 180, CLR_BLUE);
Paints an individual pixel.

    SETBANK_MAINRAM()
DrawPlanarBitmap() and SetPixel() both change the active page of memory bank 0xC000~0xFFFF to the corresponding RGB plane. After calling them, you must call SETBANK_MAINRAM() before returning to normal program execution.

And that's all there is to it! V1 mode is simple, but it is inefficient and slow - V2 is more complex, depending on your needs, but can cut render time by more than half.

The example code above paints a bitmap in V1 mode and demonstrates several methods of plotting pixels in V2 mode.

Part 3 will detail V1 mode's layered palette technique, software sprites, and more.


[PC-88] C framework for the NEC PC8801 - Beginnings

So, you wanna make a game for an old Japanese computer virtually nobody in the west cares about? 

Welcome!

I built this from my own knowledge of Python, C, and z80 assembly. (Thanks to the SDCC core!) I had to start from scratch, but a very large portion of the knowledge I gained came from these two Japanese-only sites on PC88 assembly:

Bookworm's Library

Maroon Youkan

I will attempt to translate and explain what I've learned as best I can. If you have questions and are serious about PC-88 C development, consider joining the @RetroDevDiscord and helping the few of us there are. 

I use the Windows 10/x64 distribution of SDCC, version 4.0.0. Details on the build process further down.

PC88-C on Github

PC88-C base file list:

src\main.c
src\pc88-c.h
tools\hex2bin.py
tools\hexer.py
tools\maked88.py
tools\png288.py
ipl.bin
*makepc88.bat

(*makepc88.bat is the primary build script.)

MAIN.C
As with most C projects, main.c contains the project code. SDCC generally only likes one primary C file at a time, so try to put all your code in main.c or in files included by main.c.

Due to my unfamiliarity with the inner workings of SDCC, void main() must be the first actual code entry in the built file. The crt0-equivalent, IPL.BIN, points directly to code start at $1000, which is where the autoloader targets (further information below).

PC88-C.H
Most of the command documentation, explanation on registers, etc. is in this file. There is still a lot that needs to be documented, but a lot of information is here. Overview:


These are fairly self-explanatory. String should allow you to define character arrays as you are accustomed. bit() allows you to get bit values without doing the hex in your head.



PlanarBitmap is a bitmap definition required for DrawPlanarBitmap(). 'r', 'g' and 'b' are pointers to raw image data for each of the color planes, and 'w' and 'h' are width (in tiles, or in pixels divided by 8) and height (in pixels).

The PC-88's video memory is divided into three one-bit RGB color planes. The original PC-88 models, whose display mode is referred to as 'V1', had to access the RGB planes independently, one at a time. The resulting 3-bit color is displayed on the screen. By default, plane 0 is blue, plane 1 is red, and plane 2 is green. So, to get a "cyan" color, you write a 1 to the blue and green planes, and a 0 to the red plane.
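As a quick sketch of that mapping (my own illustration, using the default plane order):

```python
# Hypothetical sketch: how a 3-bit color decomposes onto the planes.
# Default order: plane 0 = blue, plane 1 = red, plane 2 = green.
def plane_bits(color):
    """Return the (blue, red, green) plane bits for a 3-bit color."""
    return tuple((color >> p) & 1 for p in range(3))

CYAN = 0b101  # green + blue
```

plane_bits(CYAN) yields (1, 0, 1): a 1 on the blue and green planes and a 0 on red, exactly as described above.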

You can change the default palette values for certain effects, such as simulated foreground- and background-layers by limiting overall palette color. This technique is explained in part 3, and functions well with 1-bit software sprites.

However, given the 4MHz CPU speed of the V1 models and the relatively slow bus speed, this is not effective for high-fidelity games. V2 mode, along with the ALU expansion module, came not much later.

The ALU (Arithmetic Logic Unit, thanks rikkles) is the key to improving game performance. With the control of three extra registers, you can write to all 3 VRAM planes simultaneously. The explanation given of utilizing the ALU was extremely complex and took me several days to fully understand, so don't feel bad if it doesn't click right away.

Before getting into that, here are the function outlines:

static inline void putchr(u8 c) (and putchr40)
Puts a character on the screen at location global SCREEN_POINTER. SCREEN_POINTER must be initialized in main() at SCREEN_TXT_OFFSET (or your desired location) before using this function or print().

void print(String str) (and print40)
Prints a string to SCREEN_POINTER

u8 ReadIOReg(u8 r)
Returns the value in a given I/O register, if that register can be read. See the H file for detailed definitions.

void SetIOReg(u8 r, u8 v)
Sets the value of I/O register 'r' to single byte value 'v'.

void SetTextAttribute(u8 x, u8 y, u8 attr)
Adds an attribute byte-pair to row 'y', beginning at position 'x'. The text attribute macros are explained below.

void ClearAttributeRam()
Resets the entire screen's text attributes to their defaults (80-column/Color mode).

void SetCursorPos(u8 x, u8 y) (and SetCursorPos40)
Moves SCREEN_POINTER to 'x', 'y' on-screen, where x=(0, 79) and y=(0, 25)

void Wait_VBLANK()
Proper VBlank ASM routine. Waits for the VBL signal from the CRTC, then waits until it's clear before returning, to ensure we are *inside* vertical blank.

void DrawPlaneBMP(const u8* img, u8 plane, u16 x, u16 y, u8 w, u8 h)
Draws raw, single-plane image data 'img' to color plane 'plane', with byte width 'w' (pixels/8) and height 'h', at pixel offset 'x', 'y'. Warning: not pixel perfect - the x offset is tied to an 8-pixel boundary. This is macroed three times by DrawPlanarBitmap, defined below. This will toggle the GVRAM planes for you, but SETBANK_MAINRAM() must be called afterwards to re-enable the Main RAM page.

void SetPixel(u16 x, u8 y, u8 c)
Pixel-perfect plot a single pixel at 'x', 'y' of color 'c', where bits 0-2 of 'c' represent the VRAM color planes. Default colors are macroed with the prefix CLR_.
SetPixel() will toggle GVRAM dependent on the color, but SETBANK_MAINRAM() must be called afterwards to re-enable the Main RAM page. Note that to access individual GVRAM pages, expanded GVRAM must be off.

bool GetKeyDown(u8 SCANCODE)
Returns true if the macro 'SCANCODE', prefixed by KB_, is presently down; else returns false.

static inline void EnableALU()
static inline void DisableALU()
Sets I/O register 0x32 to 0xC9 if enabled and 0x89 (IPL defaults) if disabled. Must be called after calling ExpandedGVRAM_On(). (V2 only)

static inline void ExpandedGVRAM_On()
static inline void ExpandedGVRAM_Off()

Sets I/O register 0x35 to 0x80 if enabled and 0 if disabled. This is required before calling EnableALU(), otherwise writing through V2 mode (via the ALU) will not work. V2 only. Note that on boot, Expanded GVRAM is off.

void DiskLoad(u8* dest, u8 srcTrack, u8 srcSector, u8 numSecs, u8 drive) __naked 
Same assembly routine as is in IPL.BIN. e.g.
DiskLoad((u8*)0x4000, 1, 7, 40, 0);
Loads 40*256 bytes from track 1, sector 7, to RAM at 0x4000. 

void __init()
Sets up the screen pointer and calls main(). Should generally be left alone. :)

Macro definitions:

#define SetBGColor(c) SetIOReg(0x52, c << 4);
#define SetBorderColor(c) SetIOReg(0x52, c); // PC88mk2 and prior

Sets the background color for color text mode. The border color function was removed from most later models.

#define SETBANK_BLUE() SetIOReg(0x5c, 0xff);
#define SETBANK_RED() SetIOReg(0x5d, 0xff);
#define SETBANK_GREEN() SetIOReg(0x5e, 0xff);
#define SETBANK_MAINRAM() SetIOReg(0x5f, 0xff);
Toggles GVRAM banks over 0xC000 ~ 0xFFFF. SETBANK_MAINRAM() must be active during normal program execution - having any VRAM bank active can slow programs down.

#define DrawPlanarBitmap(pb, x, y) 
Macro for drawing a PlanarBitmap struct (V1 mode).

#define COLORMODE_SET(color, semigraphic) 
Defines a SET type attribute in Color Text mode, where 'color' is 0-7 and 'semigraphic' is 0 or 1.

#define COLORMODE_ATTR(underline, upperline, reverse, blink, hidden) 
Defines an ATTR type attribute in Color Text mode, where all parameters are either 0 or 1.

#define BWMODE_ATTR(underline, upperline, reverse, blink, hidden) 
#define ATTR_BW_SEMIGRAPHIC 0b10011000
Defines an attribute for B&W Text mode. For B&W mode, use the ATTR_BW_SEMIGRAPHIC macro to enable semigraphic mode.

#define IRQ_OFF __asm di __endasm;
#define IRQ_ON __asm ei __endasm;
#define HALT __asm halt __endasm;
#define BREAKPOINT HALT 
Convenience macros. When swapping RAM/GVRAM pages and reading/writing IO ports, remember to disable IRQs. HALT and BREAKPOINT are simply easy ways to aid in debugging without a full debugger.

HEX2BIN.PY
Takes the place of hex2bin.exe. This is taken from Intel's official open source library. Requires Python 3 and python module "intelhex", obtainable via 'pip install intelhex'. This is already integrated in makepc88.bat.

HEXER.PY
Simple tool of convenience - overwrites 1 byte in the given file with the given value, e.g.
 python3 tools/hexer.py ipl.bin 0x2f 0x50
Will change the number of sectors loaded by the autoloader in IPL.BIN (the byte located at 0x2f) to 0x50. This tool is not utilized in the chain, but is there for ease of use.
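The whole tool amounts to a seek-and-write. A minimal equivalent (my own sketch, not the actual hexer.py source) looks like:

```python
# Minimal sketch of a one-byte patcher like hexer.py (not its actual source).
def patch_byte(path, offset, value):
    """Overwrite the single byte at `offset` in the file at `path`."""
    with open(path, 'r+b') as f:   # r+b: modify in place, no truncation
        f.seek(offset)
        f.write(bytes([value & 0xFF]))
```

Opening with 'r+b' is the important detail: the file is modified in place rather than truncated and rewritten.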

MAKED88.PY
Replaces D88SAVER.EXE. D88SAVER was taken from the above websites and was useful for generating a blank d88 file and injecting files into it (including IPL.BIN). This serves the same purpose and is used in the same way, with the added feature that it will create the disk file passed as an argument if it does not already exist.
(Note this only supports 2D, or 375kB disks for now.)
This is integrated into makepc88.bat. 

PNG288.PY
Converts a standard indexed PNG to its corresponding R, G and B bitplanes, then writes them to C-style header files in const char array format. The header file byte data can then be drawn directly by DrawPlaneBMP(). Requires the modules 'Pillow' and 'numpy'.

IPL.BIN
This file is assembled using ASW assembler from ipl.z80 and disk.z80. It contains a short routine that sets up the screen and stack pointer, and has a minimal disk access routine for loading from floppy. 

Boot process on the PC-88 is roughly:
- Is the 'boot from floppy' dipswitch on?
-- If YES, copy the 256 bytes from cylinder/head/record 0/0/1 from the inserted disk into RAM at 0xC000. Then, jp $C000.
--If NO, check the TERMINAL and BASIC dipswitches and boot to ROM.

The 256 bytes within IPL.BIN are therefore executed from RAM at org $C000. This is clearly not enough to run an entire game, so the routine then copies N sectors, where N is the byte located at offset [0x2F] in the IPL, to the address stored at offset [0x38-0x39].
By default, [0x2F] = 0x4F and [0x38-0x39] = 0x00 0x10, meaning 79 sectors (0x4F = 79; 79 * 256 bytes, or ~20kB) are copied from disk to RAM starting at little-endian address $1000. Feel free to change these using hexer.py if you don't have ASW to re-assemble.
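Sizing a build against that sector count can be sketched as follows (a hypothetical helper of my own, not part of the PC88-C toolchain):

```python
# Hypothetical helper: how many 256-byte sectors the autoloader must copy
# to cover a binary of a given size (the byte patched at IPL offset 0x2F).
SECTOR_SIZE = 256

def sectors_needed(binary_size):
    """Ceiling division: number of whole sectors covering the binary."""
    return (binary_size + SECTOR_SIZE - 1) // SECTOR_SIZE
```

The default 0x4F (79) sectors therefore cover 79 * 256 = 20224 bytes, just under 20kB of program.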

MAKEPC88.BAT
Performs the following:
- Deletes app.d88
- Creates a blank 2D d88 with maked88.py
- sdcc -mz80 --code-loc 0x1000 --stack-loc 0x0080 --data-loc 0x0100 --fomit-frame-pointer --no-std-crt0 src/main.c
- Converts the resulting IHX to BIN format using hex2bin.py
- Inserts IPL.BIN and the resultant MAIN.BIN into app.d88 at sectors 0 0 1 and 0 0 2
- Prints a (rough) outline of the memory map
- If "main.bin" (or the resultant filename) is passed as an argument, it will inform you if the file is larger than the default number of sectors copied in by the autoloader. (change line 6 in the bat, set usedsec=nn if you change this value in the IPL). 
- Launches the emulator (you must change this to your own emulator path)

This finalizes the explanation of the files included in the repository.

Part 2 will cover basic text, pixel and bitmap drawing in both V1 and V2 modes!

Friday, December 18, 2020

Loading RLE-compressed CSV tile maps into RAM on Genesis: SGDK, Tiled, and Python

If you haven't used Tiled, I strongly recommend it. It's a platform-agnostic, tile-based mapping tool with a number of nice features. A particularly nice one is CSV export - a format that can be very quickly converted into raw bytes or something else.

I won't cover how to make or organize a tile map here - that is something covered in detail in a number of places. Skipping ahead, let's assume you've made a simple map something like this:



In our case, we want it as an .h file we can include in our SGDK project. After exporting the map as a CSV, we get the following data, or something like it:


The entirety of the python script I wrote to convert the CSV to a .h file is here (explanation below):
# csv2c.py
import sys
if(len(sys.argv) != 2): # 1 arg only
    print('error')
    sys.exit()
f = open(sys.argv[1], 'r')
csvtxt = f.read()
f.close()
lines = csvtxt.split('\n')
i = 0
csvtxt = '' # split at line break bytes
while i < len(lines)-1:
    csvtxt = csvtxt + lines[i] + ',\n' 
    i += 1
i = 0
chars = [] # make output array
while i < len(lines):
    b = lines[i].split(',')
    j = 0
    while j < len(b):
        if (b[j]) != '':
            chars.append(b[j])
        j += 1
    i += 1
wid = int(len(chars)**0.5)
i = 0
outchars=[] 
outchars.append(wid) # compress w RLE
while i < len(chars):
    if (i < len(chars)-2):
        if (chars[i] == chars[i+1]) and (chars[i] == chars[i+2]):
            mc = chars[i]
            outchars.append('254') #0xfe
            count = 0
            while (mc == chars[i]) and (i < len(chars)-2):
                count += 1
                i += 1
            i = i - 1
            outchars.append(str(mc)) #char
            outchars.append(str(count)) #times
        else:
            outchars.append(chars[i])
    else:
        outchars.append(chars[i])
    i += 1
outchars.append(len(outchars)-1) # end compression scheme
convt = '\t'
fn = sys.argv[1].split('/')
i = 0
while i < len(outchars): # tabulate str
    convt = convt + str(outchars[i]) + ','
    if ((i+1)%16==0):
        convt = convt + '\n\t'
    i += 1
outstr = "#ifndef HEADERSET\n#include <genesis.h>\n#endif\n//map[0] = mapWidth, map[..n] = size of array -2\nconst u8 " + fn[len(fn)-1][:-4] + "["+str(len(outchars))+"] = {\n" + convt + "\n};"
f = open(sys.argv[1][:-4]+'.h', 'w')
f.write(outstr)
f.close()
print(str(len(chars)) + ' written')

The string manipulation is fairly standard, so I'll explain the compression scheme:

wid = int(len(chars)**0.5)
i = 0
outchars=[] 
outchars.append(wid) # compress w RLE

This first bit takes the square root of the number of tiles (256 for a 16 by 16 map) and adds that number (i.e. sqrt(256) = 16) as the first entry in the array. This is optional; it's so my game can know how big the map is for other code.

while i < len(chars):
    if (i < len(chars)-2):
        if (chars[i] == chars[i+1]) and (chars[i] == chars[i+2]):

This iteration looks confusing. My RLE compression looks like this: 
1. If the byte is NOT 0xFE, copy through
2. If the byte IS 0xFE:
    - The next byte is the value to copy 
    - The following byte is how many times to copy it
    - Move to next byte, back to 1.

That means the following are true:
"1 1 1 1 2" = 0xFE 0x01 0x04 0x02
"2 2 2 1 1" = 0xFE 0x02 0x03 0x01 0x01

So when compressing our raw map, we need to look ahead a minimum of two bytes to see if a 0xFE scheme is needed.
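A minimal standalone version of the scheme (my own sketch, simpler than the script above, which additionally prepends the map width and appends the length) makes the round trip easy to verify:

```python
# Standalone sketch of the same RLE scheme: 0xFE marks a run, followed by
# the value and the repeat count. Runs shorter than 3 are copied through.
# Assumes tile values never equal 0xFE itself.
def rle_encode(data):
    out, i = [], 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1                      # scan ahead to the end of the run
        run = j - i
        if run >= 3:
            out += [0xFE, data[i], run]  # marker, value, count
        else:
            out += data[i:j]             # too short to be worth encoding
        i = j
    return out

def rle_decode(data):
    out, i = [], 0
    while i < len(data):
        if data[i] == 0xFE:
            out += [data[i + 1]] * data[i + 2]
            i += 3
        else:
            out.append(data[i])
            i += 1
    return out
```

This reproduces the two examples above: [1, 1, 1, 1, 2] encodes to [0xFE, 0x01, 0x04, 0x02], and [2, 2, 2, 1, 1] to [0xFE, 0x02, 0x03, 0x01, 0x01].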

            mc = chars[i]
            outchars.append('254') #0xfe
            count = 0
            while (mc == chars[i]) and (i < len(chars)-2):
                count += 1
                i += 1
            i = i - 1

Next we search ahead, increasing our 'i' iterator as we do, and count the number of times we see the same number. The tricky part is the 'i = i - 1' at the end due to the post-increment. Python will still fall through to the final 'i = i + 1', and if we don't decrement i by one, we'll accidentally skip the next byte in the map. 

            outchars.append(str(mc)) #char
            outchars.append(str(count)) #times
        else:
            outchars.append(chars[i])
    else:
        outchars.append(chars[i])
    i += 1
outchars.append(len(outchars)-1) # end compression scheme

We need else clauses at both levels of the loop to ensure all characters are written. It looks awkward, but it is correct - though there might be a prettier way of doing it!
Finally, we append the number of COMPRESSED characters to the end of the array - this is so my game has the length of the decompression routine ready to go. 

The script outputs a .h file that looks like this:

HEADERSET is simply what I call my project's shared header set. Without <genesis.h> it will complain about u8, but it's not a big deal. 

Note that the array is a const. SGDK/GCC should put this in ROM without prodding. Finally, the C:

u8 map[256];
u8 mapWidth;

void LoadDungeonMap(const u8* dungeon){
    u16 mapctr = 0;
    mapWidth = *(dungeon + 0);
    // sizeof() on a pointer yields the pointer's size, not the array length,
    // so instead stop once the decompressed map is full
    u16 total = (u16)mapWidth * mapWidth;
    for(u16 i = 1; mapctr < total; i++)
    {
        u8 nb = *(dungeon + i);
        if(nb == 0xfe){
            u8 ct = *(dungeon + i + 2);
            nb = *(dungeon + i + 1);
            for(u8 j = 0; j < ct; j++){
                map[mapctr++] = nb;
            }
            i = i + 2;
        }
        else {
            map[mapctr++] = nb;   
        }
    }
}

To call it:

int main(u16 hard){
    (...)
    LoadDungeonMap(&maptest2);
}

Note that when calling, we use & to pass the address of the first entry of the const array. When defining the function, we use * to indicate that the parameter we pass is a pointer, which contains an address.

Then, to access the value at the pointer's address, we use the * operator on dungeon, offset by + i, to get each byte. Then we perform our RLE decompression, writing bytes to map[] (declared globally so it isn't optimized away).

There you have it. No memcpy or extra allocation, just nice, straightforward RLE decompression for loading maps!

Friday, August 14, 2020

Super fast Raycast 3D engine in LOVE 2D

 (Update: I mistakenly thought during my experimentation that clearing the buffer is necessary each frame. It's not. The article has been updated below)

LOVE is great. Underneath the hood, it's just a Lua interface to render OpenGL textures to a window - which is simple and perfect for any 2D sort of game. 

Well, almost. Older game hardware let you access VRAM (and thus individual pixels) directly and more quickly than modern hardware does. We have shaders, but they only allow GPU-side manipulation of graphics data we've already pushed to the video buffer - they don't let us construct data from a tile map, for example. The amount of memory available to the pixel shader is also fairly limited, so you can't use it to extrapolate entire images, for instance.

This means that making retro-style games in LOVE, which often requires manipulating an image as if it were VRAM or pixel data, can actually be unintuitive, because you're not just drawing layers of static images on top of each other. Every time you change the image, you have to recreate it before you redraw it. LOVE warns against this explicitly, because you can very quickly overflow memory and crash.

A perfect example of pixel-based rendering is a 'raycasting' engine, like Wolfenstein 3D, or other simulated 3D games like the original Elite which plotted pixels directly to video to draw wireframe models. These games were developed before 3D accelerator GPUs were a thing, so all rendering is done CPU side. 

Writing directly to memory is super fast, so Wolfenstein 3D could run pretty well on a 386. Unfortunately, in frameworks like LOVE which are built on top of Lua, on top of C, it's not so easy to write directly to memory. Lua tables are extrapolated, inferred, blown apart etc., and the performance hit can be obvious when you're polling several values from tables of tables a couple thousand times per frame, sixty times per second!

There is an absolutely amazing raycasting engine tutorial by lodev for C++ available on his website. If you know C++, you can likely recreate Wolfenstein 3D from scratch in just a couple days! I personally have never written a raycasting engine, and wanted to see if it was possible in LOVE. I went through it, and guess what? It works really well! Unfortunately, I hated the performance (60% or higher CPU usage), thanks to using so many tables.

To draw the image, LOVE has ImageData objects, where you can use the :setPixel() method, but this is slow. About as slow as using Lua tables, in fact, so we want to avoid using this. What we can do instead is treat the ImageData as a vram buffer - we 'draw' everything internally, then push it all at once using an appropriate data structure. In this case, we can use the :replacePixels() method once per frame instead - but we still suffer from the data format limitation.

So how do we work around that? Turns out it's more simple than you might think.

LuaJIT, the ultra-speedy single-pass, "just-in-time" compiler for Lua that LOVE uses, offers a library called 'ffi'. The ffi library lets you declare and use C types directly from Lua. I won't talk about how great this is, but instead I offer this tiny piece of code that fixed all my problems:

local ffi = require 'ffi'
ffi.cdef[[
typedef struct { uint8_t r, g, b, a; } pixel;
]]

This defines for us a new usable C struct named pixel that has four components, r, g, b, a, which are each 8-bit integers. What may not be obvious is that once we initialize a variable of type 'pixel', it will be a byte-perfect representation (i.e. 4 sequential bytes) within our Lua program.

So this is how it's used:

screenBuffer = ffi.new("pixel[?]", screenWidth*screenHeight)
bufSize = ffi.sizeof(screenBuffer)
drawData = love.image.newImageData(screenWidth, screenHeight, "rgba8",
    ffi.string(screenBuffer, bufSize))
drawBuffer = love.graphics.newImage(drawData)
This code initializes a struct array of type "pixel" of ? elements, where ? is the number you pass as the second argument. ffi.sizeof() grabs the size in bytes of the object. We need this for the next line, which creates a new LOVE ImageData object in the correct format (rgba8, an 8-bit series of red, green, blue, and alpha for each element). ffi.string() will coerce the parameter passed, screenBuffer, to a char * of size bufSize. Finally, the drawBuffer image (what is actually put on the screen) is initialized from this ImageData.

Don't actually do this:

Don't forget to clear the screenBuffer every frame:
for i=0,(screenWidth*screenHeight)-1 do -- -1: ffi arrays are 0-indexed, unchecked
    screenBuffer[i].r = 0
    screenBuffer[i].g = 0
    screenBuffer[i].b = 0
    screenBuffer[i].a = 0
end
You don't need to clear the screen buffer if you're tracing the entire ceiling and floor every frame. Omitting this will cut out a lot of cycles!

Then, when writing pixels to the screen, you do this instead:

local c = textureData[texNum][(ty*textureSize)+tx]
local r, g, b = c.r, c.g, c.b
local px = math.floor((y*screenWidth)+x)
screenBuffer[px].r = r
screenBuffer[px].g = g
screenBuffer[px].b = b
screenBuffer[px].a = 255

Where 'c' is 'color' in the original source linked above, and you write to the screenBuffer indexed linearly instead of as a two-dimensional array. Setting the color and alpha to 0 at the beginning of each frame is the equivalent of clearing out the graphics buffer.

When converting the math from C++, be veeeery careful of order of operations (the example source is very lazy) and of data types. Also be aware that LuaJIT, as mentioned above, is one-pass: what this means in this case is that inline expressions are evaluated when they are encountered, and not before. So, for expressions that are repeated, factor them out of loops as much as possible.

The other big thing to watch out for is expressions inside of array indexes (these MUST be cast to integers with math.floor() or the math won't work) and variables that are declared as integers. As long as these are truncated with math.floor() or some similar operation, you'll be fine. But if you don't, you'll see results like this and pull your hair out trying to figure out why:


Once you've done your draw code, you gotta create a new ImageData from the screen buffer byte string. This is where it gets a little tricky:

drawData = love.image.newImageData(screenWidth, screenHeight, "rgba8",
    ffi.string(screenBuffer, bufSize))
if drawData then
    drawBuffer:replacePixels(drawData)
end
drawData = nil

Remember drawBuffer is the LOVE type image. We only need drawBuffer at the end of our processing, meaning drawData can be nil'ed out and garbage collected.

The purpose of doing this may be unclear, so I'll try to explain. LuaJIT's garbage collection doesn't run super often, and doesn't go super deep. This is to prevent impacting performance. Unfortunately, in the use case of LOVE where you're generating possibly upwards of 100MB of graphic data per frame, this WILL cause your game to overflow and crash (at approx. 2GB).

Nil'ed references should get cleaned up during garbage collection and free up that RAM, but we need collection to run more often than it does. Otherwise this small app will eventually take several hundred megabytes of memory, and we can certainly trim that down. The solution is remarkably simple:

function love.update(dT)
    secondCtr = secondCtr + dT
    if secondCtr > 1.0 then
        secondCtr = secondCtr - 1.0
        collectgarbage()
    end
    ...

collectgarbage() is a native Lua function that, when called with no arguments, performs a full garbage-collection
cycle. If you call it as collectgarbage('count'), you'll get Lua's memory consumption in kilobytes (not counting
the 50-60 MB or so of RAM that LOVE itself takes up). If you like, you can fine-tune this for minimal impact.

At the end of the day you should end up with a result like this that stays around 10-15% CPU and under
100 MB of RAM, depending on your graphics (I even added a sprite):



Tuesday, July 28, 2020

Non-VR (desktop 3D) game template for LOVR

main.lua:

camera = nil
cameraPosition = { x = 0.0, y = 0.0, z = 0.0 }
cameraTarget = { x = 0.0, y = 0.0, z = -1.0 }

function lovr.update()
    camera = lovr.math.newMat4():lookAt(
        vec3(cameraPosition.x, cameraPosition.y, cameraPosition.z), 
        vec3(cameraTarget.x, cameraTarget.y, cameraTarget.z))
    view = lovr.math.newMat4(camera):invert()
end

function lovr.mirror()
    lovr.graphics.clear()
    lovr.graphics.origin()
    lovr.graphics.transform(view)
    renderScene()
end

function lovr.draw()
    -- Do nothing
end

renderScene = function()
    lovr.graphics.print('hello world', 0, 0, -3)
end

conf.lua:

function lovr.conf(t)
    t.identity = 'LOVR Non-VR Boilerplate'
    t.modules.headset = false
end
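As one optional way to exercise the template (the orbit radius and speed here are arbitrary), lovr.update can animate cameraPosition each frame while the lookAt target stays put:

```lua
local t = 0

function lovr.update(dt)
    t = t + dt
    -- Orbit the camera around the origin at radius 3.
    cameraPosition.x = math.sin(t) * 3.0
    cameraPosition.z = math.cos(t) * 3.0

    camera = lovr.math.newMat4():lookAt(
        vec3(cameraPosition.x, cameraPosition.y, cameraPosition.z),
        vec3(cameraTarget.x, cameraTarget.y, cameraTarget.z))
    view = lovr.math.newMat4(camera):invert()
end
```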

Thursday, July 23, 2020

Simple lighting for LÖVR (Phong model in GLSL)

[ Complete source linked at end ]

LÖVR is amazing. You should be using it. (This assumes basic knowledge of it, Lua, or at least Love2D/Pico-8 and related project structure).

However, lighting is tricky for the uninitiated. There are no lighting prefabs or constructors -- you must do it all by hand. Luckily, it's not that hard! I shall attempt to explain what I've learned in the last few days.

We've been spoiled by applications that create "lights" for us, so we think of them as objects that cast light within the rendering space. This is not how lighting is done in most video games - simulating cast light in a physically realistic way is extremely GPU-intensive.

What many 3D games do is they will process the color of each pixel on the screen (called a 'fragment' in shader language) based on the angle, distance, and color of the rays of light hitting it, and what color the texture is (if any).

This is done in three phases, in a very common lighting model called the Phong model.

(This tutorial was adapted from the very well-written LearnOpenGL tutorial in C++, found here).

Assuming you already have a project set up and are loading and displaying a model, let's try initializing a custom shader first. To do that, we write a slightly modified OpenGL .vs (vertex shader) which we store as a multi-line string in Lua:

customVertex = [[
    vec4 position(mat4 projection, mat4 transform, vec4 vertex)
    {
        return projection * transform * vertex;
    }
]]

Note for now this is just the default LÖVR vertex shader as listed in the 0.13 documentation.

Now, we define a new shader with customVertex:

shader = lovr.graphics.newShader(customVertex, nil, {})

Note that for the newShader method, passing nil will use the default. Now, to enable the shader, we add to lovr.draw():

lovr.graphics.setShader(shader)

(You may have to setShader() to reset the shader at the end of draw() if you have any issues).
If you run this as-is, it should perform exactly as if you had the default shader. Let's do the same thing for the fragment shader:

customFragment = [[
    vec4 color(vec4 graphicsColor, sampler2D image, vec2 uv) 
    {
        return graphicsColor * lovrDiffuseColor * vertexColor * texture(image, uv);
    }
]]

Changing nil in the newShader line to customFragment should again run with no issues.

Now let's get to ambient lighting!

Phase One

Step one of the Phong model is ambient lighting. Light bounces around everywhere, especially in the daytime, and even rooms without lights can be well-lit. You will likely change your ambient level frequently during the game, so being familiar with its effect on your scene is important.

The default LÖVR shader is "unlit", which means effectively your ambient lighting is at 100% all the time - all angles of all polygons are always fully bright. This is fine for certain things, but for rendering a 3d model in a virtual space, shading is pretty important. For our purposes, we are implementing ambient lighting by "turning down" this unlit effect to about 20% - a good value for rooms in the daytime, but you may find 10% or 30% more to your liking.

Here's the new fragment shader:

customFragment = [[
    uniform vec4 ambience;

    vec4 color(vec4 graphicsColor, sampler2D image, vec2 uv) 
    {
        //object color
        vec4 baseColor = graphicsColor * texture(image, uv);

        return baseColor * ambience;
    }
]]
shader:send('ambience', { 0.2, 0.2, 0.2, 1.0 })

We changed a bit here. First, we added a new 'uniform' variable to represent the ambient light color. uniform is a keyword that exposes a shader value through the LÖVR / Lua interface so we can change it freely, which we do with the shader's :send method. Assigning the value through :send is the 'safe' approach - if you try to assign a value to a uniform variable within its declaration in the shader, the game will crash and complain on Android. I set this value to a dark grey. The values correspond to R, G, B, A - though in this case you generally want the alpha value to be 1.0, otherwise anything drawn with this shader will be rendered as transparent.
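Because ambience is a uniform, it can be retuned at any time from Lua. For example (the values are arbitrary), dimming the scene for a night-time room is a single :send call:

```lua
-- Cooler, darker ambient light; alpha stays at 1.0 so nothing goes transparent.
shader:send('ambience', { 0.05, 0.05, 0.1, 1.0 })
```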

Second, we changed the value being returned.

The original code multiplies graphicsColor (the value of lovr.graphics.setColor()) by lovrDiffuseColor - a constant { 1.0, 1.0, 1.0, 1.0 } stored in a hidden shader header. For simplicity's sake, I figured we'd skip that value and use our own.

We also don't need vertexColor. This is another value that defaults to 1 and is separate from our draw color, the texture color, and our new ambience color.

This should be a wee bit faster than it was, one would hope, by omitting a few unneeded variables. If you run your game, everything should look -considerably darker- than before. This is good! Now we layer on the diffuse lighting!

Phase Two

A group of vertices forms a polygon, and a ray emanating perpendicular from that polygon is its 'normal'. Depending on the angle between the light's position and the normals of your in-game models, each polygon receives a percentage of the light cast. This makes sense and is easily observed in the real world - the side of a box facing a light is brighter than the adjacent sides, which are brighter than the side facing away, etc.

Diffuse lighting simulates some of the bounce effect that ambient lighting does, with added bias on polygons perpendicular to the light source. 

To do this properly, we need to get the position of and normal of the vertex from within the vertex shader -- this means taking a 3d vector that comes "out of" the polygon -- and passing it to the fragment (pixel or color) shader so we know how "bright" to render that spot on the screen. 

The math for all of this is much better explained and proofed elsewhere, including the LearnOpenGL link above, but rest assured it has been done and triple checked a million times by a million people. What we need to know is how to do it in LÖVR!
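Before touching GLSL, the core of the diffuse math can be sketched in plain Lua (the vector helpers here are hand-rolled for illustration):

```lua
local function dot(a, b) return a.x * b.x + a.y * b.y + a.z * b.z end

local function normalize(v)
    local len = math.sqrt(dot(v, v))
    return { x = v.x / len, y = v.y / len, z = v.z / len }
end

local normal   = normalize({ x = 0, y = 1, z = 0 })  -- polygon faces straight up
local lightDir = normalize({ x = 1, y = 1, z = 0 })  -- light up and to the side

-- Clamped dot product: 1.0 when facing the light, 0.0 at 90 degrees or beyond.
local diff = math.max(dot(normal, lightDir), 0.0)    -- here, about 0.707
```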

Luckily, LÖVR loves you, and makes this very easy. Here's the new vertex shader:

defaultVertex = [[
    out vec3 FragmentPos;
    out vec3 Normal;

    vec4 position(mat4 projection, mat4 transform, vec4 vertex) 
    { 
        Normal = lovrNormal * lovrNormalMatrix;
        FragmentPos = vec3(lovrModel * vertex);
        
        return projection * transform * vertex; 
    }
]]

out is a keyword that simply passes the variable along to the fragment shader when the vertex shader is done. Doing this allows us to use the fragment position in world space and the vertex's normal to calculate our lighting changes. 

[ Special note: Casting and converting vec3 and vec4 can be annoying. Luckily, GLSL makes this easy with 'swizzling' - a .xyz suffix on a vec4 extracts its first three components as a vec3, e.g. we could have written FragmentPos = (lovrModel * vertex).xyz instead and it would perform the same. ]

In LÖVR, lovrNormal is defined as the vertex's normal, if one exists. Easy - already calculated for us! The reason why we multiply it by lovrNormalMatrix is so that we can get the normals applied to the model's transform - i.e. the position and rotation of the model as well. 

FragmentPos is less self-explanatory, but what we need to know is that this represents the xyz component of the current vertex of the currently being rendered model (of type lovrModel). In other words, a single visible point on the model. 

Now the important part - using that data in our fragment shader:
defaultFragment = [[
    uniform vec4 ambience;
    
    uniform vec4 liteColor;
    uniform vec3 lightPos;

    in vec3 Normal;
    in vec3 FragmentPos;
    
    vec4 color(vec4 graphicsColor, sampler2D image, vec2 uv) 
    {    
        //diffuse
        vec3 norm = normalize(Normal);
        vec3 lightDir = normalize(lightPos - FragmentPos);
        float diff = max(dot(norm, lightDir), 0.0);
        vec4 diffuse = diff * liteColor;
                        
        vec4 baseColor = graphicsColor * texture(image, uv);            
        return baseColor * (ambience + diffuse);
    }
]]
shader:send('liteColor', {1.0, 1.0, 1.0, 1.0})
shader:send('lightPos', {2.0, 5.0, 0.0})

The math and reasoning for this is explained in the LearnOpenGL tutorial, so here's the important bits for LÖVR:

- liteColor is a new uniform vec4, of values RGBA, that represents the individual light's emissive color
- lightPos is the position in world space the individual light emits light from 
- in is used here to indicate the variables we want from the vertex shader
- normalize() is an OpenGL function to make operations like this easier
- we are now returning the baseColor of the fragment times ambience PLUS diffuse - be sure these are added, not multiplied together

If you compile and run now, you should notice a bright light illuminating your scene. Experiment with variables and using the 'send' method (shader:send('liteColor', <new color table>) or shader:send('lightPos', <new position>)) in your draw() loops.
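For example (a sketch - the orbit radius and height are arbitrary), the light can be swung around the scene from lovr.update:

```lua
local lightTimer = 0

function lovr.update(dt)
    lightTimer = lightTimer + dt
    -- Circle the light around the scene at radius 4, fixed height 5.
    shader:send('lightPos',
        { math.sin(lightTimer) * 4.0, 5.0, math.cos(lightTimer) * 4.0 })
end
```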

Almost there!!

Phase Three

Specular lighting makes the smallest changes to individual pixels, but adds the most detail. For this implementation, we will be using view space - i.e. an x, y, z of 0, 0, 0 - for ease of calculation. If you read the accompanying tutorial, you know that performing these calculations in world space is more realistic. I'm sure you can think of games in which the specular light reflections sort of followed your eyes as you moved - those used view-space calculations. Now you know why!

We don't need to make any changes to the vertex shader, so here's the final fragment shader:

defaultFragment = [[
    uniform vec4 ambience;

    uniform vec4 liteColor;
    uniform vec3 lightPos;
    in vec3 Normal;
    in vec3 FragmentPos;

    uniform vec3 viewPos;
    uniform float specularStrength;
    uniform int metallic;
        
    vec4 color(vec4 graphicsColor, sampler2D image, vec2 uv) 
    {    
        //diffuse
        vec3 norm = normalize(Normal);
        vec3 lightDir = normalize(lightPos - FragmentPos);
        float diff = max(dot(norm, lightDir), 0.0);
        vec4 diffuse = diff * liteColor;
            
        //specular
        vec3 viewDir = normalize(viewPos - FragmentPos);
        vec3 reflectDir = reflect(-lightDir, norm);
        float spec = pow(max(dot(viewDir, reflectDir), 0.0), metallic);
        vec4 specular = specularStrength * spec * liteColor;
            
        vec4 baseColor = graphicsColor * texture(image, uv);            
        return baseColor * (ambience + diffuse + specular);
    }
]]
shader:send('liteColor', {1.0, 1.0, 1.0, 1.0})
shader:send('lightPos', {2.0, 5.0, 0.0})
shader:send('ambience', {0.1, 0.1, 0.1, 1.0})
shader:send('specularStrength', 0.5)
shader:send('metallic', 32.0)
shader:send('viewPos', {0.0, 0.0, 0.0})

viewPos at (0, 0, 0) is fine for a static camera, but we're doing VR, after all! If you have a headset connected, feel free to add this in lovr.update:

function lovr.update(dT)
    if lovr.headset then 
        hx, hy, hz = lovr.headset.getPosition()
        shader:send('viewPos', { hx, hy, hz } )
    end
end

[ Special Note 2: The viewing position (not as much angle) is very important for the effectiveness of specular light. If you move the camera with the WASD keys in the desktop version of lovr (as in, you are running without a headset) then the lighting effect won't look very good. For testing without a headset, in this example, it's best to keep the camera in one position, and rotate it. ]

specularStrength is the 'harshness' of the light. This generally amounts to how sharp or bright the light's reflection can look.

metallic is the metallic exponent as shown in the LearnOpenGL tutorial. This value should probably range from 4-256, but 32 is fine for most things. 
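A quick plain-Lua illustration of why the exponent 'tightens' the highlight: take a view/reflection alignment slightly off dead-center and raise it to different powers.

```lua
local alignment = 0.9    -- dot(viewDir, reflectDir), a bit off-center

print(alignment ^ 4)     -- ~0.66: broad, soft highlight
print(alignment ^ 32)    -- ~0.034: tight highlight
print(alignment ^ 256)   -- ~1.9e-12: effectively a pinpoint
```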

The rest of the math hasn't changed - we're just adding the specular value to the final fragment color. 

And that's it! With any luck, you'll have a properly-lit model like so (lightPos at 2.0, 5.0, 0.0):


There's lots of playing around you can do - experiment with multiple lights, new shaders that are variants on the theme, and explore GLSL. 

[ Special Note 3: For organizational purposes, you can keep the vertex and fragment shader code in separate files (the conventional extensions are .vs and .fs). You can use the lovr.filesystem.read() command to load them in as strings just like above. The advantage of this is getting syntax highlighting and linting while coding your shaders, e.g. in VS Code. ]
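A sketch of that approach (the file names here are hypothetical):

```lua
-- lovr.filesystem.read returns the file's contents as a string.
customVertex   = lovr.filesystem.read('lighting.vs')
customFragment = lovr.filesystem.read('lighting.fs')
shader = lovr.graphics.newShader(customVertex, customFragment, {})
```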

COMPLETE SOURCE HERE

This will work on your Quest or Go as well if you follow the instructions on the LÖVR website for deploying to Android. I added a moving, unlit sphere in the example to represent the light source to better visualize it.

[ Final Note: If you are having issues with some faces on your models not being lit properly, there are a few things to check on your model.
-First, make sure it is built with a uniform scale. This can easily be done in Blender by selecting a properly scaled piece, then A to select the entire model, then Cmd+A (Apply) -> Scale. There is also the uniformScale shader flag, which gives a small speed boost - you should be developing everything in uniform scale in VR anyway!
-Second, all model faces need to be facing the correct way to generate their normal properly for lighting. If you notice some parts of your model are shading in the opposite direction, you can flip the face direction in Blender by selecting it all in edit mode, then Opt+N > Recalculate Normals or Flip Normals. 
These two tips should fix 90% of any issues! ]

Have fun with LÖVR!