Wednesday, February 3, 2021

[PC88] C framework for the NEC PC8801 - Part 2, Simple graphics

The following project (which is very similar to what is found in the repository) demonstrates the basics of drawing on the PC-88. 

#include "pc88-c.h"
#include "img_b.h"
#include "img_r.h"
#include "img_g.h"

PlanarBitmap layeredImage = { 
    img_r, img_g, img_b, 248/8, 100
};

void main()
{
    IRQ_OFF

    // Test yellow ALU draw (V2)
    ExpandedGVRAM_On();     
    EnableALU();       
    SetIOReg(EXPANDED_ALU_CTRL, CLR_YELLOW);

    vu8* vp = (vu8*)0xc100;
    *vp = 0xff;

    ExpandedGVRAM_Off();

    // Toggle, then test blue ALU draw (v2)
    ExpandedGVRAM_On();
    SetIOReg(EXPANDED_ALU_CTRL, CLR_BLUE);
    vp += 0x100;
    *vp = 0xff;
    
    // GVRAM copy mode on
    SetIOReg(EXPANDED_GVRAM_CTRL, (u8)(bit(7)|bit(4)) );
    __asm
      ld a,(0xc200)
      ld (0xc201),a
    __endasm;
    // (copies blue byte)
    ExpandedGVRAM_Off();

    // Planar bitmap (V1) draw and individual pixels
    DisableALU(); // ALU off
    PlanarBitmap* pb = &layeredImage;    
    DrawPlanarBitmap(pb, 20, 10);
    SetPixel(360, 180, CLR_BLUE);
    SetPixel(361, 181, CLR_CYAN);
    SetPixel(362, 182, CLR_GREEN);
    SETBANK_MAINRAM() // must reset after draw!

    IRQ_ON

    while(1)
    {
        Wait_VBLANK();
        if(GetKeyDown(KB_RETURN)) print("HI!\x00");
    }
}

(Updated 2/5/2021)

At the top, img_b, img_g, and img_r are the const char arrays produced by png288.py (here, from ishino.png).

These are referenced in the definition of layeredImage, along with its width (divided by 8, for byte width) and height in pixels. This is drawn in V1 mode, described below.

LINE_POINTER and SCREEN_POINTER are initialized to SCREEN_TXT_BASE and 0, respectively. This is done in __init(), within pc88-c.h. 

When drawing, we want interrupts disabled, so we call IRQ_OFF.

First, we'll try drawing a line of 8 pixels (in VRAM, this equals 0xff) in V2 mode, using the ALU.

!!! IMPORTANT !!!
Before calling EnableALU(), you must first call ExpandedGVRAM_On(). If you enable ALU expanded mode without swapping to expanded GVRAM, the system will think you want to write to Main RAM instead - because expanded mode is off, it defaults to independent mode. 

Once that's done, you load your pen color into EXPANDED_ALU_CTRL*. Then any write to 0xC000+ will write that color to the screen, in a linear order of 1bpp pixels - for instance, 0xFF is a string of 8 pixels, 0b10101010 is every other pixel, etc. 0xC000 is the top left single pixel of the screen. (Remember that PC-88 pixels are double-high!)
*See the "deeper explanation" below
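To make the linear layout concrete, here is a small sketch (my own, not part of PC88-C) of how a pixel coordinate maps to a GVRAM byte and bit, assuming the standard 640x200 layout of 80 bytes per row with the leftmost pixel in the high bit:

```python
# Hypothetical helpers, not part of PC88-C: map an (x, y) pixel to GVRAM.
# Assumes 640x200 resolution, 80 bytes per scanline, MSB = leftmost pixel.
GVRAM_BASE = 0xC000
ROW_BYTES = 640 // 8  # 80 bytes per row

def vram_addr(x, y):
    """Address of the GVRAM byte holding pixel (x, y)."""
    return GVRAM_BASE + y * ROW_BYTES + (x // 8)

def vram_mask(x):
    """Bit mask for pixel x within its byte."""
    return 0x80 >> (x % 8)
```

Under these assumptions, writing 0xFF to vram_addr(0, 0) through the ALU would light the first 8 pixels of the top row.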

When done writing to GVRAM, you need to disable it with ExpandedGVRAM_Off() or SETBANK_MAINRAM(). Expanded mode (aka ALU) writes require the former, independent mode writes require the latter. You won't be able to access the Main RAM page, and overall program performance will decrease, until Main RAM is banked back in.

The "simple" explanation of Expanded mode/ALU writing:

1. Disable IRQ / wait for VBlank
2. Enable Expanded GVRAM access mode (set the high bit of I/O register 0x35)
3. Enable the ALU (set bit 6 of I/O register 0x32, leaving the rest unaffected)
4. Set I/O register 0x34 with the palette number to write
5. Write pixel data to VRAM (0xC000~).
6. Disable Expanded GVRAM access mode
7. Re-enable IRQ

In PC88-C this looks like:

IRQ_OFF
ExpandedGVRAM_On();
EnableALU();
SetIOReg(EXPANDED_ALU_CTRL, CLR_YELLOW);
vu8* vp = (vu8*)0xc100;   // initialize vram pointer somewhere
*vp = 0b10101010;         // write every other pixel
ExpandedGVRAM_Off();
IRQ_ON


EXPANDED_ALU_CTRL has a deeper purpose than simply setting the active palette, however. The method used when writing to GVRAM through the ALU depends on the value in I/O register 0x35 (so named EXPANDED_GVRAM_CTRL). 

The actual bit definitions are:
EXPANDED_ALU_CTRL (34h)
bit    7    6    5    4    3    2    1    0
       -  GV2H GV1H GV0H   -  GV2L GV1L GV0L

Where GV0, GV1 and GV2 represent each respective graphic VRAM plane, and H and L represent the high and low bits of the following functions for each plane:
 H L
 0 0 - Bit reset
 0 1 - Bit set (OR)
 1 0 - Bit invert (XOR)
 1 1 - Ignore / noop

Take a moment to understand what this means in practice. If all three of the lower bits are set, that means any bit written to VRAM through the ALU will set the bits on all 3 graphics planes. A white pixel will be written. Hence, loading EXPANDED_ALU_CTRL's lower bits with the palette value has the same effect as changing the active pen color. 

Loading only the upper bits of the register with a palette value will flip the bits on that plane. SetIOReg(EXPANDED_ALU_CTRL, CLR_YELLOW << 4), for instance, will change black pixels to yellow, and yellow pixels to black (values 000 ^ 110). To determine what other colors would change to, you would have to XOR the present color on that plane. This has limited application - one, to erase data you know is already there, and two, to perform quick palette swaps. 

Loading both the upper and lower bits prevents the ALU from writing to that plane. If writing to GVRAM is not working when ALU is enabled, ensure I/O register 0x34 is set to the proper value.
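The bit table above can be turned into a tiny encoder. This is a sketch of my own for illustration; alu_ctrl and the mode names are hypothetical, not PC88-C identifiers:

```python
# Hypothetical helper: build an EXPANDED_ALU_CTRL (0x34) value from one
# operation per plane, following the H/L bit table above.
RESET, SET, XOR, NOOP = 0b00, 0b01, 0b10, 0b11

def alu_ctrl(gv0, gv1, gv2):
    """gv0/gv1/gv2 = operation for planes B, R, G (default palette order)."""
    val = 0
    for plane, op in enumerate((gv0, gv1, gv2)):
        val |= ((op >> 1) & 1) << (4 + plane)  # H bit (bits 6-4)
        val |= (op & 1) << plane               # L bit (bits 2-0)
    return val
```

With it, alu_ctrl(SET, SET, SET) gives 0b111 (a white pen), alu_ctrl(RESET, XOR, XOR) gives 0x60, the same value as CLR_YELLOW << 4 in the palette-swap example, and alu_ctrl(NOOP, NOOP, NOOP) gives 0x77, which leaves every plane untouched.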

EXPANDED_GVRAM_CTRL (35h)
bit   7    6   5-4   3   2   1   0
     GAM   -   GDM   -   GC  RC  BC

GAM : 0 - enable Main RAM, 1 - enable GVRAM
GDM: Graphic Data Multiplexer control
   0 0 - ALU write via port 34h
   0 1 - Write all planes via VRAM copy
   1 0 - Transfer plane R to plane B*
   1 1 - Transfer plane B to plane R*
GC, RC, BC - Bit comp data for each of the 3 GVRAM planes**

*Used in hi-res B&W mode
**Used in multiplexer modes 00 and 01


Generally, this register will be at 0x80 when writing to GVRAM and 0 during normal program execution. When bit 7 is set, data loaded into the accumulator from GVRAM is not the actual data but, as I understand it, a VRAM buffer. What is written is likewise not actual data; it instead moves that buffer to the desired location. (A detailed explanation of this point is difficult to find.)

The bottom 3 bits of this register act as a mask when loading ALU data from VRAM. Mind that the value loaded into the accumulator is _not_ what is written to VRAM, regardless of the mask. This is for arithmetic operations only, i.e. determining the color that needs to be written to the ALU to change the pixel to a specific color. (The mechanism to do this at the moment is beyond me)

V1 Mode - Pixel and PlanarBitmap draw:
You can mix V1 and V2/ALU draw modes, as shown in the main.c example. To write V1 mode graphics (independent GVRAM plane access):

    DisableALU();
Turn off the ALU, if it is on.

    PlanarBitmap* pb = &layeredImage;    
    DrawPlanarBitmap(pb, 20, 10);
If you are drawing a bitmap, initialize a pointer to the defined struct. 
This is a macro that (over)writes the bitmap on all 3 VRAM planes.

    SetPixel(360, 180, CLR_BLUE);
Paints an individual pixel.

    SETBANK_MAINRAM()
DrawPlanarBitmap() and SetPixel() both change the active page of memory bank 0xC000~0xFFFF to the corresponding RGB plane. After calling them, you must call SETBANK_MAINRAM() before returning to normal program execution.

And that's all there is to it! V1 mode is simple, but it is inefficient and slow - V2 is more complex, depending on your needs, but can cut render time by more than half.

The example code above paints a bitmap in V1 mode and demonstrates several methods of plotting pixels in V2 mode.

Part 3 will detail V1 mode's layered palette technique, software sprites, and more.


[PC-88] C framework for the NEC PC8801 - Beginnings

So, you wanna make a game for an old Japanese computer virtually nobody in the west cares about? 

Welcome!

I built this from my own knowledge of Python, C, and z80 assembly. (Thanks to the SDCC core!) I had to start from scratch, but a very large portion of the knowledge I gained came from these two Japanese-only sites on PC88 assembly:

Bookworm's Library

Maroon Youkan

I will attempt to translate and explain what I've learned as best I can. If you have questions and are serious about PC-88 C development, consider joining the @RetroDevDiscord and helping the few of us there are. 

I use the Windows 10/x64 distribution of SDCC, version 4.0.0. Details on the build process further down.

PC88-C on Github

PC88-C base file list:

src\main.c
src\pc88-c.h
tools\hex2bin.py
tools\hexer.py
tools\maked88.py
tools\png288.py
ipl.bin
*makepc88.bat

(*makepc88.bat is the primary build script.)

MAIN.C
As with most C projects, main.c contains the project code. SDCC generally only likes one primary C file at a time, so try to put all your code in main.c or in files included by main.c.

Due to my unfamiliarity with the inner workings of SDCC, void main() must be the first actual code entry in the built file. The crt0-equivalent, IPL.BIN, points directly to code start at $1000, which is where the autoloader targets (further information below).

PC88-C.H
Most of the command documentation, explanation on registers, etc. is in this file. There is still a lot that needs to be documented, but a lot of information is here. Overview:


These are fairly self-explanatory. String should allow you to define character arrays as you are accustomed. bit() allows you to get bit values without doing the hex in your head.



PlanarBitmap is a bitmap definition required for DrawPlanarBitmap(). 'r', 'g' and 'b' are pointers to raw image data for each of the color planes, and 'w' and 'h' are width (in tiles, or in pixels divided by 8) and height (in pixels).

The PC-88's video memory is divided into three one-bit RGB color planes. The original PC-88 models, whose display mode is referred to as 'V1', had to access the RGB planes independently, one at a time. The resulting 3-bit color is displayed on the screen. By default, plane 0 is blue, plane 1 is red, and plane 2 is green. So, to get a "cyan" color, you write a 1 to the blue and green planes, and a 0 to the red plane.
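As a quick sketch of that mapping (my own illustration, using the default plane order):

```python
# Hypothetical sketch: how a 3-bit color decomposes onto the planes.
# Default order: plane 0 = blue, plane 1 = red, plane 2 = green.
def plane_bits(color):
    """Return the (blue, red, green) plane bits for a 3-bit color."""
    return tuple((color >> p) & 1 for p in range(3))

CYAN = 0b101  # green + blue
```

plane_bits(CYAN) yields (1, 0, 1): a 1 on the blue and green planes and a 0 on red, exactly as described above.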

You can change the default palette values for certain effects, such as simulated foreground- and background-layers by limiting overall palette color. This technique is explained in part 3, and functions well with 1-bit software sprites.

However, given the 4MHz CPU speed of the V1 models and the relatively slow bus speed, this is not effective for high-fidelity games. V2 mode, along with the ALU expansion module, came not much later.

The ALU (Arithmetic Logic Unit, thanks rikkles) is the key to improving game performance. With the control of three extra registers, you can write to all 3 VRAM planes simultaneously. The explanation given of utilizing the ALU was extremely complex and took me several days to fully understand, so don't feel bad if it doesn't click right away.

Before getting into that, here are the function outlines:

static inline void putchr(u8 c) (and putchr40)
Puts a character on the screen at location global SCREEN_POINTER. SCREEN_POINTER must be initialized in main() at SCREEN_TXT_OFFSET (or your desired location) before using this function or print().

void print(String str) (and print40)
Prints a string to SCREEN_POINTER

u8 ReadIOReg(u8 r)
Returns the value in a given I/O register, if that register can be read. See the H file for detailed definitions.

void SetIOReg(u8 r, u8 v)
Sets the value of I/O register 'r' to single byte value 'v'.

void SetTextAttribute(u8 x, u8 y, u8 attr)
Adds an attribute byte-pair to row 'y', beginning at position 'x'. The text attribute macros are explained below.

void ClearAttributeRam()
Resets the entire screen's text attributes to their defaults (80-column/Color mode).

void SetCursorPos(u8 x, u8 y) (and SetCursorPos40)
Moves SCREEN_POINTER to 'x', 'y' on-screen, where x=(0, 79) and y=(0, 25)

void Wait_VBLANK()
Proper VBlank ASM routine. Waits for the VBL signal from the CRTC, then waits until it's clear before returning, to ensure we are *inside* vertical blank.

void DrawPlaneBMP(const u8* img, u8 plane, u16 x, u16 y, u8 w, u8 h)
Draws raw, single-plane image data 'img' to color plane 'plane', with byte width 'w' (pixels/8) and height 'h', at pixel offset 'x', 'y'. Warning: not pixel perfect - the x offset is tied to an 8-pixel boundary. This is macroed three times by DrawPlanarBitmap, defined below. This will toggle the GVRAM planes for you, but SETBANK_MAINRAM() must be called afterwards to re-enable the Main RAM page.

void SetPixel(u16 x, u8 y, u8 c)
Pixel-perfect plot a single pixel at 'x', 'y' of color 'c', where bits 0-2 of 'c' represent the VRAM color planes. Default colors are macroed with the prefix CLR_.
SetPixel() will toggle GVRAM dependent on the color, but SETBANK_MAINRAM() must be called afterwards to re-enable the Main RAM page. Note that to access individual GVRAM pages, expanded GVRAM must be off.

bool GetKeyDown(u8 SCANCODE)
Returns true if the macro 'SCANCODE', prefixed by KB_, is presently down; else returns false.

static inline void EnableALU()
static inline void DisableALU()
Sets I/O register 0x32 to 0xC9 if enabled and 0x89 (IPL defaults) if disabled. Must be called after calling ExpandedGVRAM_On(). (V2 only)

static inline void ExpandedGVRAM_On()
static inline void ExpandedGVRAM_Off()

Sets I/O register 0x35 to 0x80 if enabled and 0 if disabled. This is required before calling EnableALU(), otherwise writing through V2 mode (via the ALU) will not work. V2 only. Note that on boot, Expanded GVRAM is off.

void DiskLoad(u8* dest, u8 srcTrack, u8 srcSector, u8 numSecs, u8 drive) __naked 
Same assembly routine as is in IPL.BIN. e.g.
DiskLoad((u8*)0x4000, 1, 7, 40, 0);
Loads 40*256 bytes from track 1, sector 7, to RAM at 0x4000. 

void __init()
Sets up the screen pointer and calls main(). Should generally be left alone. :)

Macro definitions:

#define SetBGColor(c) SetIOReg(0x52, c << 4);
#define SetBorderColor(c) SetIOReg(0x52, c); // PC88mk2 and prior

Sets the background color for color text mode. The border color function was removed from most later models.

#define SETBANK_BLUE() SetIOReg(0x5c, 0xff);
#define SETBANK_RED() SetIOReg(0x5d, 0xff);
#define SETBANK_GREEN() SetIOReg(0x5e, 0xff);
#define SETBANK_MAINRAM() SetIOReg(0x5f, 0xff);
Toggles GVRAM banks over 0xC000 ~ 0xFFFF. SETBANK_MAINRAM() must be active during normal program execution - having any VRAM bank active can slow programs down.

#define DrawPlanarBitmap(pb, x, y) 
Macro for drawing a PlanarBitmap struct (V1 mode).

#define COLORMODE_SET(color, semigraphic) 
Defines a SET type attribute in Color Text mode, where 'color' is 0-7 and 'semigraphic' is 0 or 1.

#define COLORMODE_ATTR(underline, upperline, reverse, blink, hidden) 
Defines an ATTR type attribute in Color Text mode, where all parameters are either 0 or 1.

#define BWMODE_ATTR(underline, upperline, reverse, blink, hidden) 
#define ATTR_BW_SEMIGRAPHIC 0b10011000
Defines an attribute for B&W Text mode. For B&W mode, use the ATTR_BW_SEMIGRAPHIC macro to enable semigraphic mode.

#define IRQ_OFF __asm di __endasm;
#define IRQ_ON __asm ei __endasm;
#define HALT __asm halt __endasm;
#define BREAKPOINT HALT 
Convenience macros. When swapping RAM/GVRAM pages and reading/writing IO ports, remember to disable IRQs. HALT and BREAKPOINT are simply easy ways to aid in debugging without a full debugger.

HEX2BIN.PY
Takes the place of hex2bin.exe. This is taken from Intel's official open source library. Requires Python 3 and python module "intelhex", obtainable via 'pip install intelhex'. This is already integrated in makepc88.bat.

HEXER.PY
Simple tool of convenience - overwrites 1 byte in the given file with the given value, e.g.
 python3 tools/hexer.py ipl.bin 0x2f 0x50
Will change the number of sectors loaded by the autoloader in IPL.BIN (the byte located at 0x2f) to 0x50. This tool is not utilized in the chain, but is there for ease of use.
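The whole tool amounts to a seek-and-write. A minimal equivalent (my own sketch, not the actual hexer.py source) looks like:

```python
# Minimal sketch of a one-byte patcher like hexer.py (not its actual source).
def patch_byte(path, offset, value):
    """Overwrite the single byte at `offset` in the file at `path`."""
    with open(path, 'r+b') as f:   # r+b: modify in place, no truncation
        f.seek(offset)
        f.write(bytes([value & 0xFF]))
```

Opening with 'r+b' is the important detail: the file is modified in place rather than truncated and rewritten.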

MAKED88.PY
Replaces D88SAVER.EXE. D88SAVER was taken from the above websites and was useful for generating a blank d88 file and injecting files into it (including IPL.BIN). This serves the same purpose and is used in the same way, with the added feature that it will create the disk file passed as an argument if it does not already exist.
(Note this only supports 2D, or 375kB disks for now.)
This is integrated into makepc88.bat. 

PNG288.PY
Converts a standard indexed PNG to its corresponding R, G and B bitplanes, then writes them to C-style header files in const char array format. The header file byte data can then be drawn directly by DrawPlaneBMP(). Requires the modules 'Pillow' and 'numpy'.

IPL.BIN
This file is assembled using ASW assembler from ipl.z80 and disk.z80. It contains a short routine that sets up the screen and stack pointer, and has a minimal disk access routine for loading from floppy. 

Boot process on the PC-88 is roughly:
- Is the 'boot from floppy' dipswitch on?
-- If YES, copy the 256 bytes from cylinder/head/record 0/0/1 from the inserted disk into RAM at 0xC000. Then, jp $C000.
--If NO, check the TERMINAL and BASIC dipswitches and boot to ROM.

The 256 bytes within IPL.BIN are therefore executed from RAM at org $C000. This is clearly not enough to run an entire game, so the routine then copies N sectors, where N is the byte located at offset [0x2F] in the IPL, to the address stored at offset [0x38-0x39].
By default, [0x2F] = 0x4F and [0x38-0x39] = 0x00 0x10, meaning 79 sectors (0x4F = 79; 79 * 256 bytes, or ~20kB) are copied from disk to RAM starting at little-endian address $1000. Feel free to change these using hexer.py if you don't have ASW to re-assemble.
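Sizing a build against that sector count can be sketched as follows (a hypothetical helper of my own, not part of the PC88-C toolchain):

```python
# Hypothetical helper: how many 256-byte sectors the autoloader must copy
# to cover a binary of a given size (the byte patched at IPL offset 0x2F).
SECTOR_SIZE = 256

def sectors_needed(binary_size):
    """Ceiling division: number of whole sectors covering the binary."""
    return (binary_size + SECTOR_SIZE - 1) // SECTOR_SIZE
```

The default 0x4F (79) sectors therefore cover 79 * 256 = 20224 bytes, just under 20kB of program.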

MAKEPC88.BAT
Performs the following:
- Deletes app.d88
- Creates a blank 2D d88 with maked88.py
- sdcc -mz80 --code-loc 0x1000 --stack-loc 0x0080 --data-loc 0x0100 --fomit-frame-pointer --no-std-crt0 src/main.c
- Converts the resulting IHX to BIN format using hex2bin.py
- Inserts IPL.BIN and the resultant MAIN.BIN into app.d88 at sectors 0 0 1 and 0 0 2
- Prints a (rough) outline of the memory map
- If "main.bin" (or the resultant filename) is passed as an argument, it will inform you if the file is larger than the default number of sectors copied in by the autoloader. (change line 6 in the bat, set usedsec=nn if you change this value in the IPL). 
- Launches the emulator (you must change this to your own emulator path)

This finalizes the explanation of the files included in the repository.

Part 2 will cover basic text, pixel and bitmap drawing in both V1 and V2 modes!

Friday, December 18, 2020

Loading RLE-compressed CSV tile maps into RAM on Genesis: SGDK, Tiled, and Python

If you haven't used Tiled, I strongly recommend it. It's a platform-agnostic, tile-based mapping tool with a number of nice features. A particularly nice one is CSV export - a format that can be very quickly converted into raw bytes or something else.

I won't cover how to make or organize a tile map here - that is something covered in detail in a number of places. Skipping ahead, let's assume you've made a simple map something like this:



In our case, we want it as an .h file we can include in our SGDK project. After exporting the map as a CSV, we get the following data, or something like it:


The entirety of the python script I wrote to convert the CSV to a .h file is here (explanation below):
# csv2c.py
import sys
if(len(sys.argv) != 2): # 1 arg only
    print('error')
    sys.exit()
f = open(sys.argv[1], 'r')
csvtxt = f.read()
f.close()
lines = csvtxt.split('\n')
i = 0
csvtxt = '' # split at line break bytes
while i < len(lines)-1:
    csvtxt = csvtxt + lines[i] + ',\n' 
    i += 1
i = 0
chars = [] # make output array
while i < len(lines):
    b = lines[i].split(',')
    j = 0
    while j < len(b):
        if (b[j]) != '':
            chars.append(b[j])
        j += 1
    i += 1
wid = int(len(chars)**0.5)
i = 0
outchars=[] 
outchars.append(wid) # compress w RLE
while i < len(chars):
    if (i < len(chars)-2):
        if (chars[i] == chars[i+1]) and (chars[i] == chars[i+2]):
            mc = chars[i]
            outchars.append('254') #0xfe
            count = 0
            while (mc == chars[i]) and (i < len(chars)-2):
                count += 1
                i += 1
            i = i - 1
            outchars.append(str(mc)) #char
            outchars.append(str(count)) #times
        else:
            outchars.append(chars[i])
    else:
        outchars.append(chars[i])
    i += 1
outchars.append(len(outchars)-1) # end compression scheme
convt = '\t'
fn = sys.argv[1].split('/')
i = 0
while i < len(outchars): # tabulate str
    convt = convt + str(outchars[i]) + ','
    if ((i+1)%16==0):
        convt = convt + '\n\t'
    i += 1
outstr = "#ifndef HEADERSET\n#include <genesis.h>\n#endif\n//map[0] = mapWidth, map[..n] = size of array -2\nconst u8 " + fn[len(fn)-1][:-4] + "["+str(len(outchars))+"] = {\n" + convt + "\n};"
f = open(sys.argv[1][:-4]+'.h', 'w')
f.write(outstr)
f.close()
print(str(len(chars)) + ' written')

The string manipulation is fairly standard, so I'll explain the compression scheme:

wid = int(len(chars)**0.5)
i = 0
outchars=[] 
outchars.append(wid) # compress w RLE

This first bit takes the square root of the number of tiles (256 for a 16 by 16 map) and adds that number (i.e. sqrt(256) = 16) as the first entry in the array. This is optional; it's so my game can know how big the map is for other code.

while i < len(chars):
    if (i < len(chars)-2):
        if (chars[i] == chars[i+1]) and (chars[i] == chars[i+2]):

This iteration looks confusing. My RLE compression looks like this: 
1. If the byte is NOT 0xFE, copy through
2. If the byte IS 0xFE:
    - The next byte is the value to copy 
    - The following byte is how many times to copy it
    - Move to next byte, back to 1.

That means the following are true:
"1 1 1 1 2" = 0xFE 0x01 0x04 0x02
"2 2 2 1 1" = 0xFE 0x02 0x03 0x01 0x01

So when compressing our raw map, we need to look ahead a minimum of two bytes to see if a 0xFE scheme is needed.
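A minimal standalone version of the scheme (my own sketch, simpler than the script above, which additionally prepends the map width and appends the length) makes the round trip easy to verify:

```python
# Standalone sketch of the same RLE scheme: 0xFE marks a run, followed by
# the value and the repeat count. Runs shorter than 3 are copied through.
# Assumes tile values never equal 0xFE itself.
def rle_encode(data):
    out, i = [], 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1                      # scan ahead to the end of the run
        run = j - i
        if run >= 3:
            out += [0xFE, data[i], run]  # marker, value, count
        else:
            out += data[i:j]             # too short to be worth encoding
        i = j
    return out

def rle_decode(data):
    out, i = [], 0
    while i < len(data):
        if data[i] == 0xFE:
            out += [data[i + 1]] * data[i + 2]
            i += 3
        else:
            out.append(data[i])
            i += 1
    return out
```

This reproduces the two examples above: [1, 1, 1, 1, 2] encodes to [0xFE, 0x01, 0x04, 0x02], and [2, 2, 2, 1, 1] to [0xFE, 0x02, 0x03, 0x01, 0x01].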

            mc = chars[i]
            outchars.append('254') #0xfe
            count = 0
            while (mc == chars[i]) and (i < len(chars)-2):
                count += 1
                i += 1
            i = i - 1

Next we search ahead, increasing our 'i' iterator as we do, and count the number of times we see the same number. The tricky part is the 'i = i - 1' at the end due to the post-increment. Python will still fall through to the final 'i = i + 1', and if we don't decrement i by one, we'll accidentally skip the next byte in the map. 

            outchars.append(str(mc)) #char
            outchars.append(str(count)) #times
        else:
            outchars.append(chars[i])
    else:
        outchars.append(chars[i])
    i += 1
outchars.append(len(outchars)-1) # end compression scheme

We need else clauses at both levels of the loop to ensure all characters are written. It looks awkward, but it is correct - though there might be a prettier way of doing it!
Finally, we append the number of COMPRESSED characters to the end of the array - this is so my game has the length of the decompression routine ready to go. 

The script outputs a .h file that looks like this:

HEADERSET is simply what I call my project's shared header set. Without <genesis.h> it will complain about u8, but it's not a big deal. 

Note that the array is a const. SGDK/GCC should put this in ROM without prodding. Finally, the C:

u8 map[256];
u8 mapWidth;

void LoadDungeonMap(const u8* dungeon){
    u16 mapctr = 0;
    mapWidth = *(dungeon + 0);
    // sizeof() on a pointer yields the pointer's size, not the array length,
    // so instead stop once the decompressed map is full
    u16 total = (u16)mapWidth * mapWidth;
    for(u16 i = 1; mapctr < total; i++)
    {
        u8 nb = *(dungeon + i);
        if(nb == 0xfe){
            u8 ct = *(dungeon + i + 2);
            nb = *(dungeon + i + 1);
            for(u8 j = 0; j < ct; j++){
                map[mapctr++] = nb;
            }
            i = i + 2;
        }
        else {
            map[mapctr++] = nb;   
        }
    }
}

To call it:

int main(u16 hard){
    (...)
    LoadDungeonMap(&maptest2);
}

Note that when calling, we use & to pass the address of the first entry of the const array. When defining the function, we use * to indicate that the parameter we pass is a pointer, which contains an address.

Then, to access the value at the pointer's address, we use the * operator on dungeon, offset by + i, to get each byte. Then we perform our RLE decompression, writing bytes to map[] (declared globally so it isn't optimized away).

There you have it. No memcpy or extra allocation, just nice, straightforward RLE decompression for loading maps!

Friday, August 14, 2020

Super fast Raycast 3D engine in LOVE 2D

 (Update: I mistakenly thought during my experimentation that clearing the buffer is necessary each frame. It's not. The article has been updated below)

LOVE is great. Underneath the hood, it's just a Lua interface to render OpenGL textures to a window - which is simple and perfect for any 2D sort of game. 

Well, almost. Older game hardware let you access VRAM (and thus individual pixels) directly and more quickly than modern hardware does. We have shaders, but they only allow GPU-side manipulation of graphics data we've already pushed to the video buffer - they don't let us construct data from a tile map, for example. The amount of memory available to the pixel shader is also fairly limited, so you can't use it to extrapolate entire images, for instance.

This means that making retro-style games in LOVE, which often requires manipulating an image as if it were VRAM or pixel data, can actually be unintuitive, because you're not just drawing layers of static images on top of each other. Every time you change the image, you have to recreate it before you redraw it. LOVE warns against this explicitly, because you can very quickly overflow memory and crash.

A perfect example of pixel-based rendering is a 'raycasting' engine, like Wolfenstein 3D, or other simulated 3D games like the original Elite which plotted pixels directly to video to draw wireframe models. These games were developed before 3D accelerator GPUs were a thing, so all rendering is done CPU side. 

Writing directly to memory is super fast, so Wolfenstein 3D could run pretty well on a 386. Unfortunately, in frameworks like LOVE which are built on top of Lua, on top of C, it's not so easy to write directly to memory. Lua tables are extrapolated, inferred, blown apart etc., and the performance hit can be obvious when you're polling several values from tables of tables a couple thousand times per frame, sixty times per second!

There is an absolutely amazing raycasting engine tutorial by lodev for C++ available on his website. If you know C++, you can likely recreate Wolfenstein 3D from scratch in just a couple days! I personally have never written a raycasting engine, and wanted to see if it was possible in LOVE. I went through it, and guess what? It works really well! Unfortunately, I hated the performance (60% or higher CPU usage), thanks to using so many tables.

To draw the image, LOVE has ImageData objects, where you can use the :setPixel() method, but this is slow. About as slow as using Lua tables, in fact, so we want to avoid using this. What we can do instead is treat the ImageData as a vram buffer - we 'draw' everything internally, then push it all at once using an appropriate data structure. In this case, we can use the :replacePixels() method once per frame instead - but we still suffer from the data format limitation.

So how do we work around that? Turns out it's more simple than you might think.

LuaJIT, the ultra-speedy single-pass, "just-in-time" compiler for Lua that LOVE uses, offers a library called 'ffi'. The ffi library lets you declare and use C types directly from Lua. I won't talk about how great this is, but instead I offer this tiny piece of code that fixed all my problems:

local ffi = require 'ffi'
ffi.cdef[[
typedef struct { uint8_t r, g, b, a; } pixel;
]]

This defines for us a new usable C struct named pixel that has four components, r, g, b, a, which are each 8-bit integers. What may not be obvious is that once we initialize a variable of type 'pixel', it will be a byte-perfect representation (i.e. 4 sequential bytes) within our Lua program.

So this is how it's used:

screenBuffer = ffi.new("pixel[?]", screenWidth*screenHeight)
bufSize = ffi.sizeof(screenBuffer)
drawData = love.image.newImageData(screenWidth, screenHeight, "rgba8",
    ffi.string(screenBuffer, bufSize))
drawBuffer = love.graphics.newImage(drawData)
This code initializes a struct array of type "pixel" of ? elements, where ? is the number you pass as the second argument. ffi.sizeof() grabs the size in bytes of the object. We need this for the next line, which creates a new LOVE ImageData object in the correct format (rgba8, an 8-bit series of red, green, blue, and alpha for each element). ffi.string() will coerce the parameter passed, screenBuffer, to a char * of size bufSize. Finally, the drawBuffer image (what is actually put on the screen) is initialized from this ImageData.

Don't actually do this:

Don't forget to clear the screenBuffer every frame:
for i=0,(screenWidth*screenHeight)-1 do -- -1: ffi arrays are 0-indexed, unchecked
    screenBuffer[i].r = 0
    screenBuffer[i].g = 0
    screenBuffer[i].b = 0
    screenBuffer[i].a = 0
end
You don't need to clear the screen buffer if you're tracing the entire ceiling and floor every frame. Omitting this will cut out a lot of cycles!

Then, when writing pixels to the screen, you do this instead:

local c = textureData[texNum][(ty*textureSize)+tx]
local r, g, b = c.r, c.g, c.b
local px = math.floor((y*screenWidth)+x)
screenBuffer[px].r = r
screenBuffer[px].g = g
screenBuffer[px].b = b
screenBuffer[px].a = 255

Where 'c' is 'color' in the original source linked above, and you write to the screenBuffer indexed linearly instead of as a two-dimensional array. Setting the color and alpha to 0 at the beginning of each frame is the equivalent of clearing out the graphics buffer.

When converting the math from C++, be veeeery careful of order of operations (the example source is very lazy) and of data types. Also be aware that LuaJIT, as mentioned above, is one-pass: what this means in this case is that inline expressions are evaluated when they are encountered, and not before. So, for expressions that are repeated, factor them out of loops as much as possible.

The other big thing to watch out for is expressions inside of array indexes (these MUST be cast to integers with math.floor() or the math won't work) and variables that are declared as integers. As long as these are truncated with math.floor() or some similar operation, you'll be fine. But if you don't, you'll see results like this and pull your hair out trying to figure out why:


Once you've done your draw code, you gotta create a new ImageData from the screen buffer byte string. This is where it gets a little tricky:

drawData = love.image.newImageData(screenWidth, screenHeight, "rgba8",
    ffi.string(screenBuffer, bufSize))
if drawData then
    drawBuffer:replacePixels(drawData)
end
drawData = nil

Remember drawBuffer is the LOVE type image. We only need drawBuffer at the end of our processing, meaning drawData can be nil'ed out and garbage collected.

The purpose of doing this may be unclear, so I'll try to explain. LuaJIT's garbage collection doesn't run super often, and doesn't go super deep. This is to prevent impacting performance. Unfortunately, in the use case of LOVE where you're generating possibly upwards of 100MB of graphic data per frame, this WILL cause your game to overflow and crash (at approx. 2GB).

Nil'ed references should get cleaned up during garbage collection and free up that RAM, but we need collection to run more often than it does. Otherwise this small app will eventually take several hundred megabytes of memory, and we can certainly trim that down. The solution is remarkably simple:

function love.update(dT)
    secondCtr = secondCtr + dT
    if secondCtr > 1.0 then
        secondCtr = secondCtr - 1.0
        collectgarbage()
    end
    ...

collectgarbage() is a native Lua function that, when called with no arguments, performs a full garbage-collection
cycle. If you call it as collectgarbage('count'), you'll get Lua's memory consumption in kilobytes (not counting
the 50-60 MB or so of RAM that LOVE itself takes up). If you like, you can fine-tune this for minimal impact.

At the end of the day you should end up with a result like this that stays around 10-15% CPU and under
100 MB of RAM, depending on your graphics (I even added a sprite):



Tuesday, July 28, 2020

Non-VR (desktop 3D) game template for LOVR

main.lua:

camera = nil
cameraPosition = { x = 0.0, y = 0.0, z = 0.0 }
cameraTarget = { x = 0.0, y = 0.0, z = -1.0 }

function lovr.update()
    camera = lovr.math.newMat4():lookAt(
        vec3(cameraPosition.x, cameraPosition.y, cameraPosition.z), 
        vec3(cameraTarget.x, cameraTarget.y, cameraTarget.z))
    view = lovr.math.newMat4(camera):invert()
end

function lovr.mirror()
    lovr.graphics.clear()
    lovr.graphics.origin()
    lovr.graphics.transform(view)
    renderScene()
end

function lovr.draw()
    -- Do nothing
end

renderScene = function()
    lovr.graphics.print('hello world', 0, 0, -3)
end

conf.lua:

function lovr.conf(t)
    t.identity = 'LOVR Non-VR Boilerplate'
    t.modules.headset = false
end
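As one optional way to exercise the template (the orbit radius and speed here are arbitrary), lovr.update can animate cameraPosition each frame while the lookAt target stays put:

```lua
local t = 0

function lovr.update(dt)
    t = t + dt
    -- Orbit the camera around the origin at radius 3.
    cameraPosition.x = math.sin(t) * 3.0
    cameraPosition.z = math.cos(t) * 3.0

    camera = lovr.math.newMat4():lookAt(
        vec3(cameraPosition.x, cameraPosition.y, cameraPosition.z),
        vec3(cameraTarget.x, cameraTarget.y, cameraTarget.z))
    view = lovr.math.newMat4(camera):invert()
end
```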

Thursday, July 23, 2020

Simple lighting for LÖVR (Phong model in GLSL)

[ Complete source linked at end ]

LÖVR is amazing. You should be using it. (This assumes basic knowledge of it, Lua, or at least Love2D/Pico-8 and related project structure).

However, lighting is tricky for the uninitiated. There are no lighting prefabs or constructors -- you must do it all by hand. Luckily, it's not that hard! I shall attempt to explain what I've learned in the last few days.

We've been spoiled by applications that create "lights" for us, so we think of them as objects that cast light within the rendering space. This is not how lighting is done in most video games - simulating cast light in a physically realistic way is extremely GPU-intensive.

What many 3D games do is they will process the color of each pixel on the screen (called a 'fragment' in shader language) based on the angle, distance, and color of the rays of light hitting it, and what color the texture is (if any).

This is done in three phases, in a very common lighting model called the Phong model.

(This tutorial was adapted from the very well-written LearnOpenGL tutorial in C++, found here).

Assuming you already have a project set up and are loading and displaying a model, let's try initializing a custom shader first. To do that, we write a slightly modified OpenGL .vs (vertex shader) which we store as a multi-line string in Lua:

customVertex = [[
    vec4 position(mat4 projection, mat4 transform, vec4 vertex)
    {
        return projection * transform * vertex;
    }
]]

Note for now this is just the default LÖVR vertex shader as listed in the 0.13 documentation.

Now, we define a new shader with customVertex:

shader = lovr.graphics.newShader(customVertex, nil, {})

Note that for the newShader method, passing nil will use the default. Now, to enable the shader, we add to lovr.draw():

lovr.graphics.setShader(shader)

(You may have to setShader() to reset the shader at the end of draw() if you have any issues).
If you run this as-is, it should perform exactly as if you had the default shader. Let's do the same thing for the fragment shader:

customFragment = [[
    vec4 color(vec4 graphicsColor, sampler2D image, vec2 uv) 
    {
        return graphicsColor * lovrDiffuseColor * vertexColor * texture(image, uv);
    }
]]

Changing nil in the newShader line to customFragment should again run with no issues.

Now let's get to ambient lighting!

Phase One

Step one of the Phong model is ambient lighting. Light bounces around everywhere, especially in the daytime, and even rooms without lights can be well-lit. You will likely change your ambient level frequently during the game, so being familiar with its effect on your scene is important.

The default LÖVR shader is "unlit", which means effectively your ambient lighting is at 100% all the time - all angles of all polygons are always fully bright. This is fine for certain things, but for rendering a 3d model in a virtual space, shading is pretty important. For our purposes, we are implementing ambient lighting by "turning down" this unlit effect to about 20% - a good value for rooms in the daytime, but you may find 10% or 30% more to your liking.

Here's the new fragment shader:

customFragment = [[
    uniform vec4 ambience;

    vec4 color(vec4 graphicsColor, sampler2D image, vec2 uv) 
    {
        //object color
        vec4 baseColor = graphicsColor * texture(image, uv);

        return baseColor * ambience;
    }
]]
shader:send('ambience', { 0.2, 0.2, 0.2, 1.0 })

We changed a bit here. First, we added a new 'uniform' variable to represent the ambient light color. uniform is a keyword that exposes a shader value through the LÖVR / Lua interface so we can change it freely, which we do with the shader's :send method. Assigning the value through :send is the 'safe' approach - if you try to assign a value to a uniform variable within its declaration in the shader, the game will crash and complain on Android. I set this value to a dark grey. The values correspond to R, G, B, A - though in this case you generally want the alpha value to be 1.0, otherwise anything drawn with this shader will be rendered as transparent.
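Because ambience is a uniform, it can be retuned at any time from Lua. For example (the values are arbitrary), dimming the scene for a night-time room is a single :send call:

```lua
-- Cooler, darker ambient light; alpha stays at 1.0 so nothing goes transparent.
shader:send('ambience', { 0.05, 0.05, 0.1, 1.0 })
```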

Second, we changed the value being returned.

The original code multiplies graphicsColor (the value of lovr.graphics.setColor()) by lovrDiffuseColor - a constant { 1.0, 1.0, 1.0, 1.0 } stored in a hidden shader header. For simplicity's sake, I figured we'd skip that value and use our own.

We also don't need vertexColor. This is another value that defaults to 1 and is separate from our draw color, the texture color, and our new ambience color.

This should be a wee bit faster than it was, one would hope, by omitting a few unneeded variables. If you run your game, everything should look -considerably darker- than before. This is good! Now we layer on the diffuse lighting!

Phase Two

A group of vertices forms a polygon, and a ray emanating perpendicular from that polygon is its 'normal'. Depending on the angle between the light's position and the normals of your in-game models, each polygon receives a percentage of the light cast. This makes sense and is easily observed in the real world - the side of a box facing a light is brighter than the adjacent sides, which are brighter than the side facing away, etc.

Diffuse lighting simulates some of the bounce effect that ambient lighting does, with added bias on polygons perpendicular to the light source. 

To do this properly, we need to get the position of and normal of the vertex from within the vertex shader -- this means taking a 3d vector that comes "out of" the polygon -- and passing it to the fragment (pixel or color) shader so we know how "bright" to render that spot on the screen. 

The math for all of this is much better explained and proofed elsewhere, including the LearnOpenGL link above, but rest assured it has been done and triple checked a million times by a million people. What we need to know is how to do it in LÖVR!
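Before touching GLSL, the core of the diffuse math can be sketched in plain Lua (the vector helpers here are hand-rolled for illustration):

```lua
local function dot(a, b) return a.x * b.x + a.y * b.y + a.z * b.z end

local function normalize(v)
    local len = math.sqrt(dot(v, v))
    return { x = v.x / len, y = v.y / len, z = v.z / len }
end

local normal   = normalize({ x = 0, y = 1, z = 0 })  -- polygon faces straight up
local lightDir = normalize({ x = 1, y = 1, z = 0 })  -- light up and to the side

-- Clamped dot product: 1.0 when facing the light, 0.0 at 90 degrees or beyond.
local diff = math.max(dot(normal, lightDir), 0.0)    -- here, about 0.707
```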

Luckily, LÖVR loves you, and makes this very easy. Here's the new vertex shader:

defaultVertex = [[
    out vec3 FragmentPos;
    out vec3 Normal;

    vec4 position(mat4 projection, mat4 transform, vec4 vertex) 
    { 
        Normal = lovrNormal * lovrNormalMatrix;
        FragmentPos = vec3(lovrModel * vertex);
        
        return projection * transform * vertex; 
    }
]]

out is a keyword that simply passes the variable along to the fragment shader when the vertex shader is done. Doing this allows us to use the fragment position in world space and the vertex's normal to calculate our lighting changes. 

[ Special note: Casting and converting vec3 and vec4 can be annoying. Luckily, GLSL makes this easy with 'swizzling' - a .xyz suffix on a vec4 extracts its first three components as a vec3, e.g. we could have written FragmentPos = (lovrModel * vertex).xyz instead and it would perform the same. ]

In LÖVR, lovrNormal is defined as the vertex's normal, if one exists. Easy - already calculated for us! The reason why we multiply it by lovrNormalMatrix is so that we can get the normals applied to the model's transform - i.e. the position and rotation of the model as well. 

FragmentPos is less self-explanatory, but what we need to know is that this represents the xyz component of the current vertex of the currently being rendered model (of type lovrModel). In other words, a single visible point on the model. 

Now the important part - using that data in our fragment shader:
defaultFragment = [[
    uniform vec4 ambience;
    
    uniform vec4 liteColor;
    uniform vec3 lightPos;

    in vec3 Normal;
    in vec3 FragmentPos;
    
    vec4 color(vec4 graphicsColor, sampler2D image, vec2 uv) 
    {    
        //diffuse
        vec3 norm = normalize(Normal);
        vec3 lightDir = normalize(lightPos - FragmentPos);
        float diff = max(dot(norm, lightDir), 0.0);
        vec4 diffuse = diff * liteColor;
                        
        vec4 baseColor = graphicsColor * texture(image, uv);            
        return baseColor * (ambience + diffuse);
    }
]]
shader:send('liteColor', {1.0, 1.0, 1.0, 1.0})
shader:send('lightPos', {2.0, 5.0, 0.0})

The math and reasoning for this is explained in the LearnOpenGL tutorial, so here's the important bits for LÖVR:

- liteColor is a new uniform vec4, of values RGBA, that represents the individual light's emissive color
- lightPos is the position in world space the individual light emits light from 
- in is used here to indicate the variables we want from the vertex shader
- normalize() is an OpenGL function to make operations like this easier
- we are now returning the baseColor of the fragment times ambience PLUS diffuse - be sure these are added, not multiplied together

If you compile and run now, you should notice a bright light illuminating your scene. Experiment with variables and using the 'send' method (shader:send('liteColor', <new color table>) or shader:send('lightPos', <new position>)) in your draw() loops.
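For example (a sketch - the orbit radius and height are arbitrary), the light can be swung around the scene from lovr.update:

```lua
local lightTimer = 0

function lovr.update(dt)
    lightTimer = lightTimer + dt
    -- Circle the light around the scene at radius 4, fixed height 5.
    shader:send('lightPos',
        { math.sin(lightTimer) * 4.0, 5.0, math.cos(lightTimer) * 4.0 })
end
```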

Almost there!!

Phase Three

Specular lighting makes the smallest changes to individual pixels, but adds the most detail. For this implementation, we will be using view space - i.e. an x, y, z of 0, 0, 0 - for ease of calculation. If you read the accompanying tutorial, you know that performing these calculations in world space is more realistic. I'm sure you can think of games in which the specular light reflections sort of followed your eyes as you moved - those used view-space calculations. Now you know why!

We don't need to make any changes to the vertex shader, so here's the final fragment shader:

defaultFragment = [[
    uniform vec4 ambience;

    uniform vec4 liteColor;
    uniform vec3 lightPos;
    in vec3 Normal;
    in vec3 FragmentPos;

    uniform vec3 viewPos;
    uniform float specularStrength;
    uniform int metallic;
        
    vec4 color(vec4 graphicsColor, sampler2D image, vec2 uv) 
    {    
        //diffuse
        vec3 norm = normalize(Normal);
        vec3 lightDir = normalize(lightPos - FragmentPos);
        float diff = max(dot(norm, lightDir), 0.0);
        vec4 diffuse = diff * liteColor;
            
        //specular
        vec3 viewDir = normalize(viewPos - FragmentPos);
        vec3 reflectDir = reflect(-lightDir, norm);
        float spec = pow(max(dot(viewDir, reflectDir), 0.0), metallic);
        vec4 specular = specularStrength * spec * liteColor;
            
        vec4 baseColor = graphicsColor * texture(image, uv);            
        return baseColor * (ambience + diffuse + specular);
    }
]]
shader:send('liteColor', {1.0, 1.0, 1.0, 1.0})
shader:send('lightPos', {2.0, 5.0, 0.0})
shader:send('ambience', {0.1, 0.1, 0.1, 1.0})
shader:send('specularStrength', 0.5)
shader:send('metallic', 32.0)
shader:send('viewPos', {0.0, 0.0, 0.0})

viewPos at (0, 0, 0) is fine for a static camera, but we're doing VR, after all! If you have a headset connected, feel free to add this in lovr.update:

function lovr.update(dT)
    if lovr.headset then 
        hx, hy, hz = lovr.headset.getPosition()
        shader:send('viewPos', { hx, hy, hz } )
    end
end

[ Special Note 2: The viewing position (not as much angle) is very important for the effectiveness of specular light. If you move the camera with the WASD keys in the desktop version of lovr (as in, you are running without a headset) then the lighting effect won't look very good. For testing without a headset, in this example, it's best to keep the camera in one position, and rotate it. ]

specularStrength is the 'harshness' of the light. This generally amounts to how sharp or bright the light's reflection can look.

metallic is the metallic exponent as shown in the LearnOpenGL tutorial. This value should probably range from 4-256, but 32 is fine for most things. 
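A quick plain-Lua illustration of why the exponent 'tightens' the highlight: take a view/reflection alignment slightly off dead-center and raise it to different powers.

```lua
local alignment = 0.9    -- dot(viewDir, reflectDir), a bit off-center

print(alignment ^ 4)     -- ~0.66: broad, soft highlight
print(alignment ^ 32)    -- ~0.034: tight highlight
print(alignment ^ 256)   -- ~1.9e-12: effectively a pinpoint
```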

The rest of the math hasn't changed - we're just adding the specular value to the final fragment color. 

And that's it! With any luck, you'll have a properly-lit model like so (lightPos at 2.0, 5.0, 0.0):


There's lots of playing around you can do - experiment with multiple lights, new shaders that are variants on the theme, and explore GLSL. 

[ Special Note 3: For organizational purposes, you can keep the vertex and fragment shader code in separate files (the conventional extensions are .vs and .fs). You can use the lovr.filesystem.read() command to load them in as strings just like above. The advantage of this is getting syntax highlighting and linting while coding your shaders, e.g. in VS Code. ]
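A sketch of that approach (the file names here are hypothetical):

```lua
-- lovr.filesystem.read returns the file's contents as a string.
customVertex   = lovr.filesystem.read('lighting.vs')
customFragment = lovr.filesystem.read('lighting.fs')
shader = lovr.graphics.newShader(customVertex, customFragment, {})
```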

COMPLETE SOURCE HERE

This will work on your Quest or Go as well if you follow the instructions on the LÖVR website for deploying to Android. I added a moving, unlit sphere in the example to represent the light source to better visualize it.

[ Final Note: If you are having issues with some faces on your models not being lit properly, there are a few things to check on your model.
-First, make sure it is built with a uniform scale. This can easily be done in Blender by selecting a properly scaled piece, then A to select the entire model, then Cmd+A (Apply) -> Scale. There is also the uniformScale shader flag, which gives a small speed boost - you should be developing everything in uniform scale in VR anyway!
-Second, all model faces need to be facing the correct way to generate their normal properly for lighting. If you notice some parts of your model are shading in the opposite direction, you can flip the face direction in Blender by selecting it all in edit mode, then Opt+N > Recalculate Normals or Flip Normals. 
These two tips should fix 90% of any issues! ]

Have fun with LÖVR!