Friday, May 15, 2020

Love2D - Simple event stacks with Lua

Love2D and its 3D/VR companion LOVR are great. I won't blab about how awesome they are - though having an entirely open framework means certain things must be built from scratch. One such thing is an event handling system.

Engines like Unity use class inheritance to handle this. Every object in a scene is a GameObject, and its scripts inherit an update method that runs every frame.

It's possible to do this without much trouble by architecting all of your game entities in a similarly OOP way, but this isn't always intuitive, and can cause unnecessary headache and overhead if your game isn't overly complex, or you want more manual control over your event stacks.

Here's a super simple event stack example using anonymous functions and an event stack table (named 'queue', since events run in the order they were added):

table.insert(queue, function() <code> end)

The most frequent use would likely be to add a global wait in between code blocks:

table.insert(queue, function() wait = 1 end)

This also makes calling functions with parameters and so forth very simple:

table.insert(queue, function() 
        sfx:play()
        ComplexFunction(a, 'b', { c = 0 }) 
    end)

Then in update:

queue = {}  -- event stack
wait = 0    -- seconds left to pause

function love.update(TimeDelta)
    if wait > 0 then
        wait = wait - TimeDelta
        return -- Don't process stack; love.draw() is still called every frame
    end
    if #queue > 0 then
        local f = table.remove(queue, 1) -- take the oldest event
        if type(f)=='function' then
            f()
        end
    end
end

This code is the basis for most of the animation in my game, and for anything that needs a timed wait. For example, to suspend input (tracked by a variable named inputEnabled) for one second:

function q(o) table.insert(queue, o) end
function setinput(tf) inputEnabled = tf end
q(function() setinput(false) end)
q(function() wait = 1 end)
q(function() setinput(true) end)

Lua allows lots of room for freedom in styling your code however you wish.

Sunday, March 8, 2020

Multi-cart data storage on Pico-8

If you've played with Lexaloffle's Pico-8 for a little while, the limitations of the cart storage - not for graphics or sound, but for code and raw data (esp. tokens) - become a bottleneck very quickly.

Multiple cart support has been added to emulate a form of bank-switching, but it is implemented in a way that purposefully blocks your ability to write more code. The cartridge locations 0x4300 to around 0x6000 (where the code lives) cannot be READ or WRITTEN - this is fairly illogical, because memory locations that cannot be either read or written can't really exist.

You can, however, repurpose cartridge data to store byte data you create - you just have to know how to store it. The data in the cartridge is effectively hex strings in a specific order. Knowing this, we can write a quick tool to convert data we want to store into Pico-8's cartridge text format. 

We can then read it into the fairly large "user data" area of RAM at 0x4300 (in cartridge, this contains our code) and use it as we will. Loading takes a second, so you probably want to load in as much data as you can at once (i.e. entire towns, etc).

You can programmatically store all sorts of data, and use your original cart as a sort of kernel. It will certainly be tricky, and games still won't be EXTREMELY complicated (as is the point of the engine), but having more storage is KEY to making complete games!

As a test, I wrote a text file (i.e. ascii-encoded string bytes) and, using a quick Python script, I converted it to a Pico-8 cart.

Pico-8 Cartridge Text Format:

pico-8 cartridge // http://www.pico-8.com
version 18
__lua__
--Data stored here is inaccessible from the main cart.
--Use this area to describe the stored data instead.
__gfx__
--Data stored here begins at 0x0000 and goes to 0x1fff. 
--It is stored in .p8 as a BACKWARDS hex string, 128 chars by 128 rows.
--e.g. HELLO = 8454c4c4f4 
__gff__
--Data stored here is from 0x3000 to 0x30ff.
--Its format is the same as the gfx section.
__map__
--Data here is 0x2000 to 0x2fff
--It is stored as a normal hex string, 256 chars by 32 rows.
--e.g. HELLO = 48454c4c4f

The three sections above will give you 12,544 bytes of storage per cart (8,192 + 4,096 + 256), less if you use them for actual graphics and maps. Multiplying that by 15 possible storage banks gives you 188,160 bytes, or roughly 184 KB, of non-standard storage, and that doesn't include sfx and music!
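As a quick sanity check on those numbers, here is the arithmetic as a small Python sketch. The address ranges are the ones given in the cart comments above; the names SECTIONS and section_size are just for illustration.

```python
# Byte ranges of the three reusable sections, from the cart comments above.
SECTIONS = {
    "gfx": (0x0000, 0x1fff),
    "map": (0x2000, 0x2fff),
    "gff": (0x3000, 0x30ff),
}

def section_size(name):
    """Number of bytes in a section, counting both end addresses."""
    start, end = SECTIONS[name]
    return end - start + 1

per_cart = sum(section_size(name) for name in SECTIONS)
total = per_cart * 15   # 15 storage carts, as assumed in the text

print(per_cart)   # bytes per cart
print(total)      # bytes across all 15 carts
```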

As a note:
The __sfx__ and music blocks are less easy to make use of. A typical sfx test string looks like this within a .p8 file:
000201003f0503f0503f0503f0503f050...
But when you peek the first 10 bytes of SFX ROM @ 0x3200, the values returned are:
63 10 63 10 63 10 63 10 63 10
3f corresponds to 63, then there are 3 characters in between (050) that equal 10 in decimal. Storing and retrieving data from a format like this may be too inefficient or impractical.

In Python, converting byte data to a hex string is fairly easy:

with open("input.bin", 'rb') as file:  # Data to convert
    by = file.read()                   # Read all at once
bstr = format(by[0], '02x')            # First byte to 2-digit hex string
byh = bstr[0]                          # High nibble
byl = bstr[1]                          # Low nibble
outbyte = byl + byh                    # Rearrange the characters
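Iterating that over a whole file might look something like the sketch below. It is not my actual tool, just a minimal version under the assumptions already stated: __gfx__ wants nibble-swapped hex in 128 rows of 128 characters, and __map__ wants plain hex in 32 rows of 256 characters. The function names are made up for this example.

```python
def to_gfx_hex(data):
    """Nibble-swapped hex, as used by __gfx__."""
    return "".join(format(b, "02x")[::-1] for b in data)

def to_map_hex(data):
    """Plain hex, as used by __map__."""
    return data.hex()

def to_rows(hex_string, width, rows):
    """Pad with '0' and split into fixed-width rows."""
    size = width * rows
    if len(hex_string) > size:
        raise ValueError("input data is too large for this section")
    padded = hex_string.ljust(size, "0")
    return [padded[i:i + width] for i in range(0, size, width)]

gfx_rows = to_rows(to_gfx_hex(b"HELLO"), 128, 128)
map_rows = to_rows(to_map_hex(b"HELLO"), 256, 32)

print(gfx_rows[0][:10])   # 8454c4c4f4
print(map_rows[0][:10])   # 48454c4c4f
```

Writing each list out line by line under its section header gives the body of the cart file.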

Iterate the above and paste it into a cart file - then by reading location 0x0000 of the new file (if located under __gfx__), you can convert to string data and print it:


The base cart just does this:

reload(0x4300,0,250,"test.p8")
ts=""
for i=0,249 do
 c=chr(peek(0x4300+i))
 if c=='\\' then
  ts=ts..'\n'
 elseif c~=nil then
  ts=ts..c
 end
end
cls()
print(ts)

(chr() function is defined in the link above). The if block converts any backslash found in the data to a newline character. 

The peek and poke in the screenshot show that the string is actually living in user RAM.

My Python tool is very messy (as mine always are!) but it will generate a full cartridge file, warn you if your input data is too large, and fill out all rows to the proper length. You can check out the source here.

Tuesday, February 11, 2020

ZX Spectrum: Detecting in assembly 48k or 128k model

Detecting machine capabilities is just a matter of course in the MSX world. However, in Spectrum land, there wasn't much crossover between 48k and 128k games. Many just came on separate tapes (or separate sides of the tape) and did not share code.

Some cleverly programmed ones, like Avenger, could detect and run the proper loader.

I tried to disassemble Avenger, but either the dump was bad (it wouldn't load the 128k version) or it uses some trickery I couldn't read. Either way I gave up and searched for my own way.

I couldn't find any discussion on this topic on the net, so I was left to my own devices. I came to realize one clear benchmark for 128 machines is the AY chip. As far as I can tell, no 48k machines had one, and every 128k machine did. Perfect!

Well, I tried a routine that polled the AY I/O port, but it didn't work reliably. What I did not know is that unbound I/O ports return floating values - about half the time the read returns the value you're checking it against. This makes for very unreliable testing.

The other option is memory paging. I THINK this is what Avenger does - it definitely changes the ROM page to the 48k ROM. I did the following instead:


1. Switch the ROM to page 0 - this is never the 48K ROM on any system, and this code will do nothing on a 48K.

2. Read a byte that I know is only in the 48K ROM - the character "1" from the string "(C) 1982 ..." should work. There is only one version of the 48K ROM, so unless there's something wrong with the system or emulator, this location in ROM (0x153b) should ONLY return '1' on a 48K system.

3. Compare against 0x31 ("1"), and if it differs, we must be on a NON-48K system. In other words, a 128K system (or a 16K, but hopefully nobody will try to run a 48/128 game on a 16K system).

The code looks like this:


As a side note, a secondary check if you REALLY want to make sure you're not on a 16K should be fairly trivial - just find a string byte that is only in that ROM.

Since I can't find any info on this subject, anyone more knowledgeable is welcome to provide alternate solutions - but for now I like this one.

Side note, the gorgeous color scheme is Cobalt in gedit plus the z80 highlight scheme I found on ticalc.org. (install it to a -3.0 folder, not 2.0 like the Readme says).

Saturday, February 8, 2020

The super annoying Speccy VRAM map and pattern printing

The common way to explain the layout of the ZX Spectrum's pixel orientation on its bitmapped VRAM is often quite convoluted and is oriented towards the values of each bit of the VRAM address - useful for plotting single pixels, but not for batch operations.

The Speccy VRAM can be visualized in a few ways to help understand how it's laid out:

1) Similar to an MSX, the ZX has 3 sets of 256x8x8 blocks arranged in a 32x24 grid. From $4000-$47ff is the first set, $4800-$4fff is the second, and $5000-$57ff is the third.

2) Pixel data is oriented in VRAM as if it were a 2048x24 bitmap (with each byte representing 8 pixels for 256x24 bytes), then the 8x8 tiles were scrunched into 256x192.

ONE PIXEL DOWN:
Add 1 to H, every 8 add 32 to L and reset H.
  (if L rolls over, add 8 to H.)
EIGHT PIXELS RIGHT:
Add 1 to L.
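
The rules above can be cross-checked against the bit layout of the address. This is a minimal Python sketch, assuming the usual 256x192 bitmap at $4000; the function names are made up for illustration.

```python
def pixel_address(x, y):
    """VRAM address of the byte holding pixel (x, y)."""
    return (0x4000
            | ((y & 0xC0) << 5)   # which third of the screen (0-2)
            | ((y & 0x07) << 8)   # pixel row inside the 8x8 tile
            | ((y & 0x38) << 2)   # tile row inside the third
            | (x >> 3))           # byte column (0-31)

def one_pixel_down(addr):
    """Apply the ONE PIXEL DOWN rule to an address."""
    h, l = addr >> 8, addr & 0xFF
    h += 1
    if h & 0x07 == 0:      # every 8 rows: reset H, add 32 to L
        h -= 8
        l += 32
        if l > 0xFF:       # L rolled over: add 8 to H
            l &= 0xFF
            h += 8
    return (h << 8) | l

print(hex(pixel_address(0, 1)))    # 0x4100
print(hex(pixel_address(0, 8)))    # 0x4020
print(hex(pixel_address(0, 64)))   # 0x4800
```

Stepping down with one_pixel_down() from any address on row y lands on the same address pixel_address() gives for row y+1.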

This layout can do a couple things with the target VRAM address:

1. inc l will increase the pixel X position across 8 rows (256 bytes per page / 32 columns = 8 rows)
2. inc h will increase the pixel Y position within the first 8 rows, plus the row offset from the l register.
3. Flooding VRAM with patterns is really easy and fast:

    ld hl, $4000    ; VRAM base
    ld b, 12        ; 2 pages (256 bytes each) per loop * 12 = 24 pages

.printloop:

    ld a, %01010101  ; pixel pattern row 1
  .loop_a:    
    ld [hl], a       
    inc l             
    jr nz, .loop_a   

    inc h            
    
    ld a, %10101010  ; pixel pattern row 2
  .loop_b:    
    ld [hl], a
    inc l 
    jr nz, .loop_b

    inc h           

    dec b
    jr nz, .printloop



ZX Spectrum: 1942 loader detokening and .TAP format assembly

.TAP format is much easier to work with than .TZX, which seems to mainly be for duplication.

.TAP structure is simply a series of headers and file data to create a file listing. Header-data, header-data, header-data. A header is always 19 bytes long, but the length of the data block can be up to 64k.

Visualized, it looks like this:
|----------------------|
|   TAP block header   |
|----------------------|
|                      |
|    Header data       |
|       (19 bytes)     |
|                      |
|---<Checksum byte>----|
|----------------------|
|   TAP block header   |
|----------------------|
|                      |
~     Data bytes       ~
|                      |
         ...
|                      |
|---<Checksum byte>----|
|----------------------|
         ...
for each file on a tape.

Each header and data block has its own 3-byte mini-header as specified by the .TAP format. It's very simple:

; 3 byte block header:
DW BlockSize
DB BlockType
; then data
;  (...) followed by 
DB ChecksumByte

If the BlockType is 0x00 (indicating a header block), then BlockSize will always be 19 (in sequence 13h 00h). Headers are 17 bytes long, and the BlockType and ChecksumByte are added to the BlockSize length to get 19. 
If BlockType is 0xFF (255 or -1, indicating a data block), then BlockSize is the size of the data block (plus two bytes for BlockType and ChecksumByte).

Header blocks look like this:

DB FileType
DS "FILENAME  "
DW DataSize
DW Parameter1
DW Parameter2

FileType can be 0, 1, 2 or 3. BASIC data can be stored as types 1 or 2, but we are concerned with type 0 -- BASIC program -- and type 3 -- CODE (aka assembly).

The filename must be padded with 0x20 to 10 bytes.

DataSize is the size of the data to load. This value is 2 less than the BlockSize in the following data block's mini-header, because BlockSize also counts the BlockType and ChecksumByte.

When FileType is BASIC program (0):
Parameter1 is the LINE parameter when SAVEing the program. I actually have not gotten this to work, and since 1942 keeps it at 0, I do as well.
Parameter2 is the location of the start of the working area of BASIC variables, counted from the start of the program. A bootloader generally does not have variables, so in these cases, this value is the same as DataSize (or, the data block's BlockSize-2); in the 1942 header below, both are 185.

When FileType is CODE (3):
Parameter1 is the target memory address (e.g. the first parameter after CODE), and
Parameter2 is ALWAYS 8000h. Not explained why.

And finally, the checksum byte. This isn't a checksum per se, so much as it is a bit toggling of all the bytes in the block after the 2-byte BlockSize. Start with the BlockType flag byte and xor it with each successive byte, then store the final result in ChecksumByte.

For the data block, this is calculated for me using a Python script post-assembly with the following code:

chk = 0xff          # start with the data block's flag byte
for b in inbytes:   # inbytes = the assembled binary, as bytes
    chk ^= b        # xor in each byte in turn

And of course data blocks are simply raw data.
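
Putting the mini-header, flag byte and checksum together, a block builder can be sketched as below. This is not my actual script, just a minimal version of the layout described above; tap_block and tap_header are names invented for this example, and the LOADER values are taken from the hex listing of my loader later in this post.

```python
def tap_block(flag, payload):
    """Wrap payload in a .TAP block: size word, flag byte, data, checksum."""
    chk = flag
    for b in payload:
        chk ^= b                      # xor the flag byte with every data byte
    body = bytes([flag]) + bytes(payload) + bytes([chk])
    return len(body).to_bytes(2, "little") + body

def tap_header(file_type, name, data_size, param1, param2):
    """Build the 17-byte header and wrap it as a flag 0x00 block."""
    header = bytes([file_type])
    header += name.encode("ascii").ljust(10, b" ")[:10]   # pad with 0x20
    for word in (data_size, param1, param2):
        header += word.to_bytes(2, "little")
    return tap_block(0x00, header)

loader_header = tap_header(0, "LOADER", 44, 0, 44)
print(loader_header.hex())
```

A data block is then just tap_block(0xFF, data), and the two results can be written to the file one after the other.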

The trick was getting a BASIC stub to auto-run when you play the tape (harder than it seems when you're doing all the bytes by hand) and have that stub clear RAM and load/run the assembly program we want.

I couldn't figure out how to save a .TAP from the "speccy" emulator, so I had no choice but to open up a .TAP of 1942 and see what was up.

The first TAP header block in 1942 looks like this:
DW $0013          ; size in bytes
DB 0              ; type 0 = header
; then the header data:
DB 0              ; 0 = BASIC program
DS "1942      "   ; filename
DW 185            ; file size
DW 0              ; autostart line
DW 185            ; basic vars loc
DB $0f            ; checksum byte

Then, the data block. This is where I had to detokenize the program by hand, and figure some stuff out for myself.

The listing of the 1942 loader ended up looking like this:
10 BORDER 0:POKE 23624,0:POKE 23693,0:CLEAR 25592:POKE 23739,111 
20 LOAD "" SCREEN
30 LOAD "" CODE
40 POKE 23739,244:RANDOMIZE USR 25593
50 REM etc

1942 loads itself into $63f9 - contended memory, but a good starting point all the same.
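As a point of interest, 23624 ($5c48) is BORDCR and 23693 ($5c8d) is ATTR_P, while 23739 ($5cbb) sits just past the system variables, in the channel information area, where it appears to be the low byte of the screen channel's output routine address. These correspond to a border color mirror, an attribute byte I need to investigate, and the routine used to print to the screen (the loader changes it to 111 during loading and restores it to 244 afterwards).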
This, along with the .TAP disassembly, was enough to get me started -- the basics are use CLEAR n-1, LOAD "" CODE, and RANDOMIZE USR n.

(Note that the best way to check the value of a token in any native BASIC version is to use PRINT CHR$(n). BASIC tokens don't overlap the standard ASCII byte space, so n is almost always > 127.)

First, explaining the Spectrum BASIC line format:
DW LineNo        ; Big-endian!
DW LineOffset    ; Bytes until next LineNo
( ... )          ; (listing)
DB $0d           ; endline

The important thing here is that the single 0x0d byte represents endline in Spectrum BASIC. ZX80/81 use a different endline (0x76, maybe?). Don't look for 00 00 as endline or 00 00 00 for EOF like on other systems - afaict there is no concept of EOF in ZX BASIC.

A large difference between Sinclair and other BASICs is that Sinclair wastes a ton of space on storing numbers as strings, but condenses all spaces automatically. Here is the hex listing for my very short loader program:

13 00 00 00 4C 4F 41 44 45 52 20 20 20 20 2C 00
00 00 2C 00 11 2E 00 FF 00 0A 0D 00 FD 32 35 35
39 32 0E 00 00 F8 63 00 0D 00 14 05 00 EF 22 22
AF 0D 00 1E 0E 00 F9 C0 32 35 35 39 33 0E 00 00
F9 63 00 0D 70

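And the corresponding BASIC (shown with the 8000h addresses used for the rest of this post; the hex listing above still encodes 25592 and 25593):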

10 CLEAR 32767
20 LOAD "" CODE
30 RANDOMIZE USR 32768

CLEAR 32767: This sets BASIC's RAMTOP (HIMEM on other systems) to 7fffh. Doing this tells the ZX that BASIC must not use anything from the next byte (8000h) upward, so our code is safe there.
LOAD "" CODE: This is equivalent to "Load the next file available from tape as an assembly program". This will load the next chunk of data pointed to by a .TAP header in the .TAP file as a binary, to the address stored in that header (Parameter1 above), here 8000h.
RANDOMIZE USR 32768: This is, for some reason, the common way to start machine language routines on the speccy. USR calls the routine at that address and RANDOMIZE simply swallows the value it returns, so this is roughly equivalent to "CALL $8000".

The tricky part here is that immediately after string numerical constants (which are stored as ASCII), they are followed with byte 0x0e (integral modifier byte) and then stored as:
DB 0
DB PolarityByte
DW IntValue
DB 0
Such that 25592 becomes 11 bytes(!!):
32 35 35 39 32 0e 00 00 f8 63 00

Also, the final line, RANDOMIZE USR n, is the sequence of bytes f9 c0. Whenever you see this in Spectrum BASIC it's a CALL/JP command.
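
To avoid doing all of those bytes by hand, the line and number formats can be sketched in Python. This is a minimal example, not a full tokenizer: the token values are the ones read off the hex listing above, it only handles whole numbers from 0 to 65535, and the names TOKENS, number and line are invented for illustration.

```python
TOKENS = {"CLEAR": 0xFD, "LOAD": 0xEF, "CODE": 0xAF,
          "RANDOMIZE": 0xF9, "USR": 0xC0}

def number(n):
    """ASCII digits, then 0x0e and the 5-byte integer form."""
    if not 0 <= n <= 0xFFFF:
        raise ValueError("only whole numbers 0-65535 are handled here")
    hidden = bytes([0x0E, 0x00, 0x00]) + n.to_bytes(2, "little") + b"\x00"
    return str(n).encode("ascii") + hidden

def line(line_no, body):
    """Line number (big-endian), length (little-endian), body, 0x0d."""
    body = bytes(body) + b"\x0d"
    return line_no.to_bytes(2, "big") + len(body).to_bytes(2, "little") + body

program = (
    line(10, bytes([TOKENS["CLEAR"]]) + number(25592))
    + line(20, bytes([TOKENS["LOAD"]]) + b'""' + bytes([TOKENS["CODE"]]))
    + line(30, bytes([TOKENS["RANDOMIZE"], TOKENS["USR"]]) + number(25593))
)

print(number(25592).hex())
print(len(program))   # 44
```

With 25592 and 25593 this reproduces the 44 data bytes in the hex listing; swapping in 32767 and 32768 gives the 8000h version.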

As mentioned above, I wrote a Python script to calculate the checksum bytes for me. Running it on the BASIC stub binary and a compiled asm binary resulted in two .TAP files: one for the BASIC loader, and one for the hello world program.

.TAP is brilliant because you can $cat a.tap b.tap > ./c.tap and suddenly have a complete tape file. I tested b.tap with this binary code, assembling as-is with nothing else:

%org $8000

Init:
    xor a 
    ld [WorkRAM_a], a
Loop:
    ld a, [WorkRAM_a]
    inc a 
    cp 8
    jr nz, .ok
     xor a
    .ok:
    ld [WorkRAM_a], a
    out [ZX_IOPORT], a  ; ZX_IOPORT = $fe, the ULA port (low 3 bits set the border colour)
    jp Loop


WorkRAM_a: rb 1

And it worked! With a little bit more work and bash nonsense, I have a one-click script that will assemble a ZX Spectrum .TAP image (with loader!) to an address I specify from a single assembly listing.


Comment if you are interested in learning more.

Friday, January 31, 2020

Wait for VSYNC in DOS (inline assembly)

#define VID_STATUS 03dah      
#define VSYNC_MASK 00001000b  

int main()
{
    while(1)         // forever loop
    {
        DrawWait:
        asm
        {
            mov dx, VID_STATUS
            in al, dx
            test al, VSYNC_MASK 
            jz DrawWait     
        }
        draw(); 
    }
    return 0;
}

This method is mostly elegant, but depending on your game and environment, you may need a second loop that waits for the vsync bit to clear again (z flag set), to ensure the draw routine only runs once per retrace.

There IS another method (thanks: michaelangel007), perhaps not as great for games, but could be used for a variety of other purposes, and that is to reprogram the timer for a value other than 18.2/s. 
Reference link

Thursday, January 30, 2020

Reading keyboard input in DOS w/ Turbo C++ and assembly (Jill of the Jungle, Wolf 3D)

It's fairly easy to find a working copy of Turbo C++ 3.0, and it compiles and links .EXEs without issue in DOSBOX. However a couple problems are immediately apparent.

The CLK_TCK constant is set at install time, and is set to 18.2 on my copy. CLK_TCK being accurate is necessary for clock() to get an appropriate reading -- which it cannot. Since it's impossible to determine a safe value, there needs to be a better way, across all CPU speeds, to measure time at a higher resolution.

Obviously there has to be at the machine level, but there are no higher resolution time-monitor-thingies in TC++3, so that leaves a raw assembly block. HOORAY!! This *IS* actually good news. DOS machines have a standardized BIOS, and DOS runs the x86 instruction set. Any DOS machine anywhere will be able to run our machine language routines.

This is way easier than writing assembly routines that work across all Windows and UNIX systems. Blegh.

***PSEUDOMATH INCOMING***

Cycle rate in DOSBOX is how many instructions to attempt to perform *every millisecond*. This is not exactly analogous to cycles/second. If you are targeting a certain CPU, like I am (a 16MHz 386 sounds appropriate), then you should look up how many instructions per second that CPU can perform and divide it by 1000 (so 2 MIPS = 2,000,000 per second = 2000 per millisecond). For my purposes, 2000 cycles (or approximately 2 MIPS) is roughly the 16MHz 386DX target -- the top CPU of 1985. In my DOSBOX config, I set CPU type to 386 and cycles to fixed 2000.

Now that my CPU speed is set properly, let's spit out a quick program that will hook into the keyboard interrupt -- note this is THE ONLY ACCEPTABLE WAY to read keyboard input!!

#include <iostream.h>
#include <dos.h>

unsigned char KEYSCAN;

void interrupt ( *oldkb)(...);

void interrupt newKb(...)
{
    asm
    {

    }
}

int main()
{
    while(1)
    {
        cout << (int)KEYSCAN << ' ';  // print the scan code as a number, not a character
        if (KEYSCAN == 0x01) { return 0; }
    }
    return 0;
}

Now, what we want to do is detect whenever a key is pressed or released. The keyboard controller will send an interrupt (in DOS this IRQ is vectored at 09h), and we are expecting a value of 0-255 (the keyboard scan code sent by the peripheral control chip), which we will output using cout as a test. 

DOS standard uses (apparently) either the 8255 PPI, or in the case of PS/2, the Intel 8042. 
I don't know when or what software would actually address the 8042 natively - all I can find regarding DOS keyboard input doesn't seem to ever use 64h (the PS/2 port), only 60/61h (for the 8255). 

Desiring more information, I began to disassemble Jill of the Jungle to figure out how EPIC did keyboard input back in the day. They do indeed use an IRQ hook - they check if the scancode is between 01h and e0h, if the high bit is set (ie a keyup scan code), convert it to a word and store it in memory.

The routine looks really nasty, and it's almost certainly generated assembly (here's a third of it, by hand so it isn't perfect):
;~~SNIP~~
    mov ds, bp      ; 8e dd
    sti             ; fb
    xor ax, ax      ; 33 c0=xor Gv Ev ax ax

;5050h:  e4 60 3c e0 75 09 c7 06 60 3a 00 01 eb 24 90 a8
    in al, 60h      ; e4=in al,[*]
    cmp al, 0e0h    ; 3c=cmp al,[*]
    jnz +09h        ; 75=jnz [r*]
                    ; c7=mov Ev,Iv
    mov word ptr [3a60h], 0100h ; 06=00|000|110 [imm16], then 60 3a and 00 01
    jmp +24h        ; eb=jmp [r*]
    nop             ; 90=nop (cbw would be 98)
    test al, 80h    ; a8=test al,[*]


The bytes E4 60 (in al, 60h) are the direct way to read in from port 60h, and that's how I tracked this in the exe. E464 (in the case of PS/2) doesn't exist, nor does E661 or E664 - meaning that, as far as I can tell, nothing is ever written to the PPI, only read. Indeed, at 5091h there's
    cli
    mov al, 20h
    out 20h, al 
    sti 
which is the IRQ acknowledgement, followed by 8 POPs (haha, generated code).

While this is still confusing, it's clear that the preferred way to read keyboard input for gaming (care of Epic Megagames, 1992) is to write your own interrupt and track the keyboard state on your own.

So how do we do this? Well, C++'s DOS.H includes exactly what we need. The small program above is the minimum to hook into the keyboard, but we need to be careful with our new assembly routine. With a little inspiration from the Wolfenstein 3D source, we can now write what we need. This goes in the asm {} block:

    in al, 60h
    mov KEYSCAN, al    // Read the KB and store it in KEYSCAN

    in al, 61h
    or al, 80h
    out 61h, al        // flip the top bit of port 61h
    xor al, 80h        //  on and off. This is the keyboard
    out 61h, al        //  "acknowledge" signal

    mov al, 20h
    out 20h, al        // Write 20h to port 20h (IRQ acknowledge)

(asm { } is all that's needed for inline assembly. It just works.) 

Now, in main(), we need to re-orient the BIOS's read key function to our new one. These two lines at the beginning will do what we need:

    oldkb = getvect(0x09);
    setvect(0x09, newKb);

And that's it! 09h points to the address of the DOS BIOS routine we want to overwrite, so we can save that vector for later if we need it with getvect(). setvect() will change that same vector to the interrupt-type method we pass to it, which we've named newKb.

That's it!!! Perfect frame-independent keyboard input obtained.

Now all you have to do is check whether the KEYSCAN is a PRESS event (0-7f) or RELEASE (80-ff) and store it in your input methods. 
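
The bookkeeping for that is the same in any language, so here it is sketched in Python rather than Turbo C++. It is only a sketch: keys_down and handle_scancode are names made up for this example, and extended keys (the e0h prefix byte) are not handled.

```python
keys_down = [False] * 128   # one flag per key, indexed by the low 7 bits

def handle_scancode(code):
    """Update keys_down from one raw scan code byte (0-255)."""
    key = code & 0x7F               # low 7 bits identify the key
    keys_down[key] = code < 0x80    # 00-7f = press, 80-ff = release
    return key

handle_scancode(0x01)   # Esc pressed
handle_scancode(0x1E)   # A pressed
handle_scancode(0x81)   # Esc released

print(keys_down[0x01], keys_down[0x1E])   # False True
```

The game loop then only ever looks at the table, never at the raw KEYSCAN value.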

Full .cpp file here.

Other sources:
MS-DOS kb scan codes
8086 Opcode chart
MS-DOS EXE file format
8086 MOD-R/M byte information / [2] / byte prefix info