Friday, December 18, 2020

Loading RLE-compressed CSV tile maps into RAM on Genesis: SGDK, Tiled, and Python

If you haven't used Tiled, I strongly recommend it. It's a platform-agnostic, tile-based mapping tool with a number of nice features. A particularly nice one is CSV export: a format that can be very quickly converted into byte code or something else.

I won't cover how to make or organize a tile map here - that is something covered in detail in a number of places. Skipping ahead, let's assume you've made a simple map something like this:

In our case, we want it as an .h file we can include in our SGDK project. After exporting the map as a CSV, we get the following data, or something like it:

The entirety of the python script I wrote to convert the CSV to a .h file is here (explanation below):
import sys
if(len(sys.argv) != 2): # 1 arg only
    print('usage: python ' + sys.argv[0] + ' map.csv')
    sys.exit(1)
f = open(sys.argv[1], 'r')
csvtxt = f.read()
f.close()
lines = csvtxt.split('\n')
i = 0
csvtxt = '' # split at line break bytes
while i < len(lines)-1:
    csvtxt = csvtxt + lines[i] + ',\n' 
    i += 1
i = 0
chars = [] # make output array
while i < len(lines):
    b = lines[i].split(',')
    j = 0
    while j < len(b):
        if (b[j]) != '':
            chars.append(b[j]) # keep every non-empty cell
        j += 1
    i += 1
outchars = []
wid = int(len(chars)**0.5)
i = 0
outchars.append(wid) # compress w RLE
while i < len(chars):
    if (i < len(chars)-2):
        if (chars[i] == chars[i+1]) and (chars[i] == chars[i+2]):
            mc = chars[i]
            outchars.append('254') #0xfe
            count = 0
            while (mc == chars[i]) and (i < len(chars)-2):
                count += 1
                i += 1
            i = i - 1
            outchars.append(str(mc)) #char
            outchars.append(str(count)) #times
        else:
            outchars.append(str(chars[i])) # no run here, copy through
    else:
        outchars.append(str(chars[i])) # last two tiles, copy through
    i += 1
outchars.append(len(outchars)-1) # end compression scheme
convt = '\t'
fn = sys.argv[1].split('/')
i = 0
while i < len(outchars): # tabulate str
    convt = convt + str(outchars[i]) + ','
    if ((i+1)%16==0):
        convt = convt + '\n\t'
    i += 1
outstr = "#ifndef HEADERSET\n#include <genesis.h>\n#endif\n//map[0] = mapWidth, map[..n] = size of array -2\nconst u8 " + fn[len(fn)-1][:-4] + "["+str(len(outchars))+"] = {\n" + convt + "\n};"
f = open(sys.argv[1][:-4]+'.h', 'w')
f.write(outstr)
f.close()
print(str(len(chars)) + ' written')

The string manipulation is fairly standard, so I'll explain the compression scheme:

wid = int(len(chars)**0.5)
i = 0
outchars.append(wid) # compress w RLE

This first bit takes the square root of the number of tiles (256 for a 16 by 16 map) and adds that number (i.e. sqrt(256) = 16) as the first entry in the array. This is optional; it's so my game can know how big the map is for other code. Note that it only gives the right width for square maps.

while i < len(chars):
    if (i < len(chars)-2):
        if (chars[i] == chars[i+1]) and (chars[i] == chars[i+2]):

This iteration looks confusing. My RLE compression looks like this: 
1. If the byte is NOT 0xFE, copy through
2. If the byte IS 0xFE:
    - The next byte is the value to copy 
    - The following byte is how many times to copy it
    - Move to next byte, back to 1.

That means the following are true:
"1 1 1 1 2" = 0xFE 0x01 0x04 0x02
"2 2 2 1 1" = 0xFE 0x02 0x03 0x01 0x01

So when compressing our raw map, we need to look ahead a minimum of two bytes to see if a 0xFE scheme is needed.
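To make the scheme above concrete, here is a minimal Python sketch of an encoder and decoder for it. This is not the conversion script itself, just a clean reimplementation of the rules as listed; the function names are mine, and it assumes tile values are 0-255 and never equal to 0xFE. It also caps each run at 255 so the count fits in one byte.

```python
MARKER = 0xFE  # escape byte that introduces a run (assumed never to be a tile value)

def rle_encode(tiles):
    """Encode a list of tile values as literals and (MARKER, value, count) runs."""
    out = []
    i = 0
    while i < len(tiles):
        value = tiles[i]
        # Measure the run starting at i, capped so the count fits in one byte.
        run = 1
        while i + run < len(tiles) and tiles[i + run] == value and run < 255:
            run += 1
        if run >= 3:
            out += [MARKER, value, run]   # three bytes stand in for the whole run
        else:
            out += [value] * run          # short runs are copied through as-is
        i += run
    return out

def rle_decode(data):
    """Reverse rle_encode: expand runs, copy everything else through."""
    out = []
    i = 0
    while i < len(data):
        if data[i] == MARKER:
            out += [data[i + 1]] * data[i + 2]
            i += 3
        else:
            out.append(data[i])
            i += 1
    return out
```

Running both worked examples from above through rle_encode gives the same byte sequences, and rle_decode turns them back into the original tiles.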

            mc = chars[i]
            outchars.append('254') #0xfe
            count = 0
            while (mc == chars[i]) and (i < len(chars)-2):
                count += 1
                i += 1
            i = i - 1

Next we search ahead, increasing our 'i' iterator as we do, and count the number of times we see the same number. The tricky part is the 'i = i - 1' at the end due to the post-increment. Python will still fall through to the final 'i = i + 1', and if we don't decrement i by one, we'll accidentally skip the next byte in the map. 

            outchars.append(str(mc)) #char
            outchars.append(str(count)) #times
        else:
            outchars.append(str(chars[i]))
    else:
        outchars.append(str(chars[i]))
    i += 1
outchars.append(len(outchars)-1) # end compression scheme

We need an else on both nested ifs inside the while loop to ensure all characters are written. It looks awkward but it is correct - though there might be a prettier way of doing it! 
Finally, we append the number of COMPRESSED characters to the end of the array - this is so my game has the length of the compressed data ready to go for the decompression routine. 

The script outputs a .h file that looks like this:

HEADERSET is simply what I call my project's shared header set. Without <genesis.h> it will complain about u8, but it's not a big deal. 

Note that the array is a const. SGDK/GCC should put this in ROM without prodding. Finally, the C:

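u8 map[256];
u8 mapWidth;

void LoadDungeonMap(const u8* dungeon){
    u16 mapctr = 0;
    mapWidth = *(dungeon + 0);
    // sizeof(dungeon) is only the size of the pointer, so the trailing
    // length byte can't be found through it. The map is square, so
    // decompress until mapWidth * mapWidth tiles have been written.
    u16 mapSize = mapWidth * mapWidth;
    if(mapSize > sizeof(map)) mapSize = sizeof(map);
    u16 i = 1;
    while(mapctr < mapSize){
        u8 nb = *(dungeon + i);
        if(nb == 0xfe){
            u8 ct = *(dungeon + i + 2);
            nb = *(dungeon + i + 1);
            for(u8 j = 0; j < ct && mapctr < mapSize; j++){
                map[mapctr++] = nb;
            }
            i = i + 3; // skip marker, value and count
        }
        else {
            map[mapctr++] = nb;
            i++;
        }
    }
}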

To call it:

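int main(u16 hard){
    LoadDungeonMap(&mymap[0]); // mymap is a placeholder: use the array name from your generated .h
    return 0;
}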

Note that when calling, we use & to pass the address of the first entry of the const array. When defining the function, we use * to indicate that the parameter we pass is a pointer, which holds an address.

To read the value stored at that address, we dereference with the * operator: *(dungeon + i) gives the byte i entries past the start of the array. Then we perform our RLE decompression, writing bytes to map[] (declared global to not be optimized away).

There you have it. No memcpy or extra allocation, just nice, straightforward RLE decompression for loading maps!

Friday, August 14, 2020

Super fast Raycast 3D engine in LOVE 2D

 (Update: I mistakenly thought during my experimentation that clearing the buffer is necessary each frame. It's not. The article has been updated below)

LOVE is great. Underneath the hood, it's just a Lua interface to render OpenGL textures to a window - which is simple and perfect for any 2D sort of game. 

Well, almost. Older game hardware let you access VRAM (and thus individual pixels) directly and more quickly than modern hardware. We have shaders, but what this allows us to do is GPU-side manipulation on graphics data we've already pushed to the video buffer - it doesn't let us construct data from a tile map, for example. The amount of memory available to the pixel shader is also fairly limited, so you can't use it to extrapolate entire images, for instance.

This means that making retro-style games in LOVE, which often requires manipulating an image as if it were VRAM or pixel data, can actually be unintuitive, because you're not just drawing layers of static images on top of each other. Every time you change the image, you have to recreate it before you redraw it. LOVE warns against this explicitly, because you can very quickly overflow and crash.

A perfect example of pixel-based rendering is a 'raycasting' engine, like Wolfenstein 3D, or other simulated 3D games like the original Elite which plotted pixels directly to video to draw wireframe models. These games were developed before 3D accelerator GPUs were a thing, so all rendering is done CPU side. 

Writing directly to memory is super fast, so Wolfenstein 3D could run pretty well on a 386. Unfortunately, in frameworks like LOVE which are built on top of Lua, on top of C, it's not so easy to write directly to memory. Lua tables are extrapolated, inferred, blown apart etc., and the performance hit can be obvious when you're polling several values from tables of tables a couple thousand times per frame, sixty times per second!

There is an absolutely amazing raycasting engine tutorial by lodev for C++ available on his website. If you know C++, you can likely recreate Wolfenstein 3D from scratch in just a couple days! I personally have never written a raycasting engine, and wanted to see if it was possible in LOVE. I went through it, and guess what? It works really well! Unfortunately, I hated the performance (60% or higher CPU usage), thanks to using so many tables.

To draw the image, LOVE has ImageData objects, where you can use the :setPixel() method, but this is slow. About as slow as using Lua tables, in fact, so we want to avoid using this. What we can do instead is treat the ImageData as a vram buffer - we 'draw' everything internally, then push it all at once using an appropriate data structure. In this case, we can use the :replacePixels() method once per frame instead - but we still suffer from the data format limitation.

So how do we work around that? Turns out it's simpler than you might think.

LuaJIT, the ultra speedy single-pass, "just-in-time" compiler for Lua that LOVE uses, offers a library called 'ffi'. The ffi library allows you to declare and use C data structures (and call C functions) directly from Lua. I won't talk about how great this is, but instead I offer this tiny piece of code that fixed all my problems:

local ffi = require 'ffi'
ffi.cdef[[
typedef struct { uint8_t r, g, b, a; } pixel;
]]

This defines for us a new usable C struct named pixel that has four components,
r, g, b, a which are each 8-bit integers. What may not be obvious is that once we initialize a variable
of type 'pixel' then it will be a byte-perfect representation (i.e. 4 sequential bytes) within our Lua program.

So this is how it's used:

screenBuffer = ffi.new("pixel[?]", screenWidth*screenHeight)
bufSize = ffi.sizeof(screenBuffer)
drawData = love.image.newImageData(screenWidth, screenHeight, "rgba8",
ffi.string(screenBuffer, bufSize))
drawBuffer = love.graphics.newImage(drawData)

This code initializes a struct array of type "pixel" of ? elements where ? is the number you pass as the second
argument. ffi.sizeof() grabs the size in bytes of the object. We need this for the next line, which creates a new
LOVE ImageData object in the correct format (rgba8, or an 8-bit series of red, green, blue, and alpha for each
element). ffi.string() will coerce the parameter passed, screenBuffer, to a char * of size bufSize. Finally, the
drawBuffer image (what is actually put on the screen) is initialized from this ImageData.

Don't actually do this:

Don't forget to clear the screenBuffer every frame:
for i=0,(screenWidth*screenHeight)-1 do
    screenBuffer[i].r = 0
    screenBuffer[i].g = 0
    screenBuffer[i].b = 0
    screenBuffer[i].a = 0
end

You don't need to clear the screen buffer if you're tracing the entire ceiling and floor every frame. Omitting
this will cut out a lot of cycles!

Then, when writing pixels to the screen, you do this instead:

local c = textureData[texNum][(ty*textureSize)+tx]
local r, g, b = c.r, c.g, c.b
local px = math.floor((y*screenWidth)+x)
screenBuffer[px].r = r
screenBuffer[px].g = g
screenBuffer[px].b = b
screenBuffer[px].a = 255

Where 'c' is 'color' in the original source linked above, and you write to the screenBuffer indexed linearly
instead of a two-dimensional array. Setting the color and alpha to 0 at the beginning of each frame is the
equivalent of clearing out the graphics buffer.
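
The linear indexing is the part that tends to trip people up, so here is the same idea as a small Python sketch. It is only an illustration under my own assumptions (a tiny 4x3 screen, a bytearray standing in for the ffi pixel array, a made-up put_pixel helper), not code from the engine.

```python
WIDTH, HEIGHT = 4, 3                     # tiny screen, for illustration only
screen = bytearray(WIDTH * HEIGHT * 4)   # RGBA8: four bytes per pixel, all zero

def put_pixel(buf, x, y, r, g, b, a=255):
    """Write one RGBA pixel into a flat, row-major buffer."""
    px = int(y) * WIDTH + int(x)         # truncate first, then index linearly
    buf[px * 4:px * 4 + 4] = bytes((r, g, b, a))

put_pixel(screen, 1, 2, 10, 20, 30)      # lands at pixel index 2*4 + 1 = 9
```

Every pixel sits at y * width + x, and because each one is four bytes wide, the whole buffer can be handed over as one contiguous byte string.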

When converting the math from C++, be veeeery careful of order of operations (the example source is
very lazy) and of data types. Also be aware that LuaJIT, as mentioned above, is one-pass: what this means
in this case is that inline expressions are evaluated when they are encountered, and not before. So, for
expressions that are repeated, factor them out of loops as much as possible.

The other big thing to watch out for is expressions inside of array indexes (these MUST be cast to integers
with math.floor() or the math won't work) and variables that are declared as integers. As long as these are
truncated with math.floor() or some similar operation, then you'll be fine. But if you don't, you'll see results
like this and pull your hair out trying to figure out why:

Once you've done your draw code, you gotta create a new ImageData from the screen buffer byte string. This
is where it gets a little tricky:
drawData = love.image.newImageData(screenWidth, screenHeight, "rgba8",
ffi.string(screenBuffer, bufSize))
if drawData then
drawBuffer:replacePixels(drawData) end
drawData = nil

Remember drawBuffer is the LOVE type image. We only need drawBuffer at the end of our processing,
meaning drawData can be nil'ed out and garbage collected.

The purpose of doing this is unclear, so I'll try to explain. LuaJIT's garbage collection doesn't run super often,
and doesn't go super deep. This is to prevent impacting performance. Unfortunately, the use case of LOVE
where you're generating possibly upwards of 100 MB of graphic data per frame WILL cause your game to
overflow and crash (at approx. 2GB).

Nil pointers should get cleaned up during garbage collection and free up that RAM, but we need it to run
faster than it is. Eventually this small app will take several hundreds of megabytes of memory, and we can
certainly trim that down. The solution is remarkably simple:

function love.update(dT)
    secondCtr = secondCtr + dT
    if secondCtr > 1.0 then
        secondCtr = secondCtr - 1.0
        collectgarbage() -- full collection, once per second
    end
end

collectgarbage() is a built-in Lua function that, when called with no arguments, will perform a full garbage
collection cycle. If you call it as collectgarbage('count') you'll get the memory consumption in kilobytes of Lua (minus
the 50-60MB or so of RAM that LOVE takes up). If you like you can fine-tune this for minimal impact.

At the end of the day you should end up with a result like this that stays around 10-15% CPU and under
100 MB of RAM, depending on your graphics (I even added a sprite):

Tuesday, July 28, 2020

Non-VR (desktop 3D) game template for LOVR


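-- main.lua
camera = nil
cameraPosition = { x = 0.0, y = 0.0, z = 0.0 }
cameraTarget = { x = 0.0, y = 0.0, z = -1.0 }

function lovr.update()
    camera = lovr.math.newMat4():lookAt(
        vec3(cameraPosition.x, cameraPosition.y, cameraPosition.z), 
        vec3(cameraTarget.x, cameraTarget.y, cameraTarget.z))
    view = lovr.math.newMat4(camera):invert()
end

function lovr.mirror()
    -- Assumed body: clear the window, apply the camera's view
    -- matrix, and draw the scene to the desktop window.
    lovr.graphics.clear()
    lovr.graphics.origin()
    lovr.graphics.transform(view)
    renderScene()
end

function lovr.draw()
    -- Do nothing
end

renderScene = function()
    lovr.graphics.print('hello world', 0, 0, -3)
end

-- conf.lua
function lovr.conf(t)
    t.identity = 'LOVR Non-VR Boilerplate'
    t.modules.headset = false
end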

Thursday, July 23, 2020

Simple lighting for LÖVR (Phong model in GLSL)

[ Complete source linked at end ]

LÖVR is amazing. You should be using it. (This assumes basic knowledge of it, Lua, or at least Love2D/Pico-8 and related project structure).

However, lighting is tricky for the uninitiated. There are no lighting prefabs or constructors -- you must do it all by hand. Luckily, it's not that hard! I shall attempt to explain what I've learned in the last few days.

We've been spoiled by applications that create "lights" for us, so we think of them as objects that cast light within the rendering space. This is not how lighting is done for most video games - casting light itself in a realistic way is extremely GPU-consuming.

What many 3D games do instead is process the color of each pixel on the screen (called a 'fragment' in shader language) based on the angle, distance, and color of the rays of light hitting it, and what color the texture is (if any).

This is done in three phases, in a very common lighting model called the Phong model.

(This tutorial was adapted from the very well-written LearnOpenGL tutorial in C++, found here).

Assuming you already have a project set up and are loading and displaying a model, let's try initializing a custom shader first. To do that, we write a slightly modified OpenGL .vs (vertex shader) which we store as a multi-line string in Lua:

customVertex = [[
    vec4 position(mat4 projection, mat4 transform, vec4 vertex)
    {
        return projection * transform * vertex;
    }
]]

Note for now this is just the default LÖVR vertex shader as listed in the 0.13 documentation.

Now, we define a new shader with customVertex:

shader = lovr.graphics.newShader(customVertex, nil, {})

Note that for the newShader method, passing nil will use the default. Now, to enable the shader, we add to lovr.draw():

lovr.graphics.setShader(shader)

(You may have to call setShader() with no arguments to reset the shader at the end of draw() if you have any issues).
If you run this as-is, it should perform exactly as if you had the default shader. Let's do the same thing for the fragment shader:

customFragment = [[
    vec4 color(vec4 graphicsColor, sampler2D image, vec2 uv) 
    {
        return graphicsColor * lovrDiffuseColor * vertexColor * texture(image, uv);
    }
]]

Changing nil in the newShader line to customFragment should again run with no issues.

Now let's get to ambient lighting!

Phase One

Step one of the Phong model is ambient lighting. Light bounces around everywhere, especially in the daytime, and even rooms without lights can be well-lit. You will likely change your ambient level frequently during the game, so being familiar with its effect on your scene is important.

The default LÖVR shader is "unlit", which means effectively your ambient lighting is at 100% all the time - all angles of all polygons are always fully bright. This is fine for certain things, but for rendering a 3d model in a virtual space, shading is pretty important. For our purposes, we are implementing ambient lighting by "turning down" this unlit effect to about 20% - a good value for rooms in the daytime, but you may find 10% or 30% more to your liking.

Here's the new fragment shader:

customFragment = [[
    uniform vec4 ambience;

    vec4 color(vec4 graphicsColor, sampler2D image, vec2 uv) 
    {
        //object color
        vec4 baseColor = graphicsColor * texture(image, uv);

        return baseColor * ambience;
    }
]]
shader = lovr.graphics.newShader(customVertex, customFragment, {})
shader:send('ambience', { 0.2, 0.2, 0.2, 1.0 })

We changed a bit here. First, we added a new 'uniform' variable to represent the ambient light color. Uniform is a keyword that allows us to expose values through the LÖVR / Lua interface, so we can change them freely. We do this with the shader's :send method. Assigning a value to the uniform variable in this way is 'safe' programming - if you try to assign a value to a uniform variable on Android within its declaration in the shader, the game will crash and complain. I set this value to a dark grey. The values correspond to R, G, B, A - though for this case you generally want the alpha value to be 1.0, otherwise anything drawn with this shader will be rendered as transparent.

Second, we are changing a lot about the value being returned.

The original code has graphicsColor (the color set with lovr.graphics.setColor) being multiplied by lovrDiffuseColor. This is a value of { 1.0, 1.0, 1.0, 1.0 }, but for simplicity's sake, I figured let's just not use this value (it's stored in a hidden shader header) and use our own.

We also don't need the vertexColor. This is another value which defaults to 1 that is separate from our draw color, and the texture color, and our new ambience color.

This should be a wee bit faster than it was, one would hope, by omitting a few unneeded variables. If you run your game, everything should look -considerably darker- than before. This is good! Now we layer on the diffuse lighting!

Phase Two

A group of vertices is, of course, a polygon. A ray emitting perpendicular from this polygon is the 'normal'. Depending on the angle of the position of the light versus the normals of your in-game models, the polygons are applied a percentage of the light cast. This makes sense and can be easily proven in the real world - the side of a box facing a light is brighter than the sides, which are brighter than the side facing away, etc.

Diffuse lighting simulates some of the bounce effect that ambient lighting does, with added bias on polygons perpendicular to the light source. 

To do this properly, we need to get the position of and normal of the vertex from within the vertex shader -- this means taking a 3d vector that comes "out of" the polygon -- and passing it to the fragment (pixel or color) shader so we know how "bright" to render that spot on the screen. 

The math for all of this is much better explained and proofed elsewhere, including the LearnOpenGL link above, but rest assured it has been done and triple checked a million times by a million people. What we need to know is how to do it in LÖVR!

Luckily, LÖVR loves you, and makes this very easy. Here's the new vertex shader:

defaultVertex = [[
    out vec3 FragmentPos;
    out vec3 Normal;

    vec4 position(mat4 projection, mat4 transform, vec4 vertex) 
    {
        // matrix on the left: vec * mat would multiply by the transpose
        Normal = lovrNormalMatrix * lovrNormal;
        FragmentPos = vec3(lovrModel * vertex);
        return projection * transform * vertex; 
    }
]]

out is a keyword that simply passes the variable along to the fragment shader when the vertex shader is done. Doing this allows us to use the fragment position in world space and the vertex's normal to calculate our lighting changes. 

[ Special note: Casting and converting vec3 and vec4 can be annoying. Luckily, GLSL makes this easy with the .xyz swizzle on vec4 variables, which does this for us, e.g. we could have done: FragmentPos = (lovrModel * vertex).xyz instead and it would perform the same. ]

In LÖVR, lovrNormal is defined as the vertex's normal, if one exists. Easy - already calculated for us! The reason why we multiply it by lovrNormalMatrix is so that we can get the normals applied to the model's transform - i.e. the position and rotation of the model as well. 

FragmentPos is less self-explanatory, but what we need to know is that this represents the xyz position in world space of the current vertex of the model being rendered: the vertex multiplied by lovrModel, the model's transform matrix. In other words, a single visible point on the model. 

Now the important part, using that data on our fragment shader:
defaultFragment = [[
    uniform vec4 ambience;
    uniform vec4 liteColor;
    uniform vec3 lightPos;

    in vec3 Normal;
    in vec3 FragmentPos;

    vec4 color(vec4 graphicsColor, sampler2D image, vec2 uv) 
    {
        vec3 norm = normalize(Normal);
        vec3 lightDir = normalize(lightPos - FragmentPos);
        float diff = max(dot(norm, lightDir), 0.0);
        vec4 diffuse = diff * liteColor;
        vec4 baseColor = graphicsColor * texture(image, uv);            
        return baseColor * (ambience + diffuse);
    }
]]
shader = lovr.graphics.newShader(defaultVertex, defaultFragment, {})
shader:send('liteColor', {1.0, 1.0, 1.0, 1.0})
shader:send('lightPos', {2.0, 5.0, 0.0})

The math and reasoning for this is explained in the LearnOpenGL tutorial, so here's the important bits for LÖVR:

- liteColor is a new uniform vec4, of values RGBA, that represents the individual light's emissive color
- lightPos is the position in world space the individual light emits light from 
- in is used here to indicate the variables we want from the vertex shader
- normalize() is a built-in GLSL function that scales a vector to length 1, so the dot product of two normalized vectors is the cosine of the angle between them
- we are now returning the baseColor of the fragment times ambience PLUS diffuse - be sure these are added, not multiplied together
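
If the shader math feels opaque, it can help to run the same diffuse calculation on the CPU with plain numbers. The following is a rough Python sketch of what the fragment shader computes for a single point; the helper names are mine, and it mirrors the GLSL line by line rather than being anything LÖVR provides.

```python
import math

def normalize(v):
    """Scale a vector to length 1, like GLSL normalize()."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def diffuse_factor(normal, light_pos, fragment_pos):
    """max(dot(norm, lightDir), 0.0) from the fragment shader."""
    norm = normalize(normal)
    light_dir = normalize(tuple(l - f for l, f in zip(light_pos, fragment_pos)))
    return max(dot(norm, light_dir), 0.0)

def shade(base_color, ambience, lite_color, diff):
    """baseColor * (ambience + diff * liteColor), applied per channel."""
    return tuple(b * (a + diff * l)
                 for b, a, l in zip(base_color, ambience, lite_color))

# A surface facing straight up, lit from directly above: full diffuse.
facing = diffuse_factor((0, 1, 0), (0, 5, 0), (0, 0, 0))
# The same surface flipped to face away from the light: no diffuse at all.
away = diffuse_factor((0, -1, 0), (0, 5, 0), (0, 0, 0))
```

The max() clamp is what keeps faces pointing away from the light at the ambient level instead of going negative.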

If you compile and run now, you should notice a bright light illuminating your scene. Experiment with variables and using the 'send' method (shader:send('liteColor', <new color table>) or shader:send('lightPos', <new position>)) in your draw() loops.

Almost there!!

Phase Three

Specular lighting does the least changes to individual pixels, but amounts to the most detail. For this implementation, we will be using view space, i.e. x y z of 0, 0, 0, for ease of calculation. If you read the accompanying tutorial, you know that performing these calculations in world space is more realistic. I'm sure you can think of games that use view space calculations -- ones in which the specular light reflections sort of followed your eyes as you moved. Now you know why!

We don't need to make any changes to the vertex shader, so here's the final fragment shader:

defaultFragment = [[
    uniform vec4 ambience;

    uniform vec4 liteColor;
    uniform vec3 lightPos;
    in vec3 Normal;
    in vec3 FragmentPos;

    uniform vec3 viewPos;
    uniform float specularStrength;
    uniform float metallic; // float, so pow() also compiles on GLSL ES (Android)

    vec4 color(vec4 graphicsColor, sampler2D image, vec2 uv) 
    {
        vec3 norm = normalize(Normal);
        vec3 lightDir = normalize(lightPos - FragmentPos);
        float diff = max(dot(norm, lightDir), 0.0);
        vec4 diffuse = diff * liteColor;
        vec3 viewDir = normalize(viewPos - FragmentPos);
        vec3 reflectDir = reflect(-lightDir, norm);
        float spec = pow(max(dot(viewDir, reflectDir), 0.0), metallic);
        vec4 specular = specularStrength * spec * liteColor;
        vec4 baseColor = graphicsColor * texture(image, uv);            
        return baseColor * (ambience + diffuse + specular);
    }
]]
shader = lovr.graphics.newShader(defaultVertex, defaultFragment, {})
shader:send('liteColor', {1.0, 1.0, 1.0, 1.0})
shader:send('lightPos', {2.0, 5.0, 0.0})
shader:send('ambience', {0.1, 0.1, 0.1, 1.0})
shader:send('specularStrength', 0.5)
shader:send('metallic', 32.0)
shader:send('viewPos', {0.0, 0.0, 0.0})

viewPos at (0, 0, 0) is fine for a static camera, but we're doing VR, after all! If you have a headset connected, feel free to add this in lovr.update:

function lovr.update(dT)
    if lovr.headset then 
        hx, hy, hz = lovr.headset.getPosition()
        shader:send('viewPos', { hx, hy, hz } )
    end
end

[ Special Note 2: The viewing position (not as much angle) is very important for the effectiveness of specular light. If you move the camera with the WASD keys in the desktop version of lovr (as in, you are running without a headset) then the lighting effect won't look very good. For testing without a headset, in this example, it's best to keep the camera in one position, and rotate it. ]

specularStrength is the 'harshness' of the light. This generally amounts to how sharp or bright the light's reflection can look.

metallic is the metallic exponent as shown in the LearnOpenGL tutorial. This value should probably range from 4-256, but 32 is fine for most things. 

The rest of the math hasn't changed - we're just adding the specular value to the final fragment color. 
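
As with the diffuse term, the specular term is easy to sanity-check outside the shader. Here is a small Python sketch of the same calculation for one point. The function names are my own, and reflect() follows the GLSL definition I - 2 * dot(N, I) * N.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def reflect(incident, normal):
    """GLSL reflect(I, N) = I - 2 * dot(N, I) * N, with N already normalized."""
    d = dot(normal, incident)
    return tuple(i - 2.0 * d * n for i, n in zip(incident, normal))

def specular_factor(normal, light_pos, view_pos, fragment_pos,
                    specular_strength=0.5, metallic=32.0):
    """specularStrength * pow(max(dot(viewDir, reflectDir), 0.0), metallic)."""
    norm = normalize(normal)
    light_dir = normalize(sub(light_pos, fragment_pos))
    view_dir = normalize(sub(view_pos, fragment_pos))
    reflect_dir = reflect(tuple(-c for c in light_dir), norm)
    spec = max(dot(view_dir, reflect_dir), 0.0) ** metallic
    return specular_strength * spec

# Viewer sitting exactly in the mirror direction of the light: strongest highlight.
mirror = specular_factor((0, 1, 0), (-1, 1, 0), (1, 1, 0), (0, 0, 0))
# Viewer standing at the light itself: no highlight from this point.
at_light = specular_factor((0, 1, 0), (-1, 1, 0), (-1, 1, 0), (0, 0, 0))
```

Raising metallic makes the value fall off faster as the viewer leaves the mirror direction, which is why higher exponents give smaller, tighter highlights.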

And that's it! With any luck, you'll have a properly-lit model like so (lightPos at 2.0, 5.0, 0.0):

There's lots of playing around you can do - experiment with multiple lights, new shaders that are variants on the theme, and explore GLSL. 

[ Special Note 3: For organization purposes, you can keep the vertex and fragment shader code in separate files (default extension for them is .vs and .fs). You can use lovr.filesystem.read() to load them in as strings just like above. The advantage of this is using syntax highlighting or linting when coding your shaders, e.g. in VS Code. ]


This will work on your Quest or Go as well if you follow the instructions on the LÖVR website for deploying to Android. I added a moving, unlit sphere in the example to represent the light source to better visualize it.

[ Final Note: If you are having issues with some faces on your models not being lit properly, there are a few things to check on your model. 
- First, make sure it is built with a uniform scale. This can easily be done in Blender by selecting a properly scaled piece, then A to select the entire model, then Cmd+A (Apply) -> Scale. There is also the uniformScale shader flag, which gives a small speed boost - you should be developing everything in uniform scale in VR anyway!
- Second, all model faces need to be facing the correct way to generate their normal properly for lighting. If you notice some parts of your model are shading in the opposite direction, you can flip the face direction in Blender by selecting it all in edit mode, then Opt+N > Recalculate Normals or Flip Normals. 
These two tips should fix 90% of any issues! ]

Have fun with LÖVR!

Friday, May 15, 2020

Love2D - Simple event stacks with Lua

Love2D and its 3D/VR companion LOVR are great. I won't blab about how awesome they are - though having an entirely open framework means certain things must be built from scratch. One such thing is an event handling system.

Engines like Unity use class inheritance to handle this. Every object in a scene is a GameObject, which has an inherited update method to process itself every frame.

It's possible to do this without much trouble by architecting all of your game entities in a similarly OOP way, but this isn't always intuitive, and can cause unnecessary headache and overhead if your game isn't overly complex, or you want more manual control over your event stacks.

Here's a super simple event stack example using anonymous functions and an event stack table (named 'queue'):

table.insert(queue, function() <code> end)

The most frequent use would likely be to add a global wait in between code blocks:

table.insert(queue, function() wait = 1 end)

This also makes calling functions with parameters and so forth very simple:

table.insert(queue, function() 
        ComplexFunction(a, 'b', { c = 0 }) 
end)

Then in update:

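function love.update(TimeDelta)
    if wait > 0 then
        wait = wait - TimeDelta
        love.draw() -- Continue to draw, but don't process stack
        return
    end
    if #queue > 0 then
        if type(queue[1])=='function' then
            local f = queue[1]
            table.remove(queue, 1)
            f() -- run the event now that it's off the stack
        end
    end
end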

This code is the basis for most of the animation in my game, or when there needs to be a timed wait e.g. to suspend input tracked by variable named inputEnabled for one second:

function q(o) table.insert(queue, o) end
function setinput(tf) inputEnabled = tf end
q(function() setinput(false) end)
q(function() wait = 1 end)
q(function() setinput(true) end)
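
To show the timing behaviour of that sequence without firing up LOVE, here is the same pattern as a small Python sketch. The EventQueue class and its names are my own stand-ins for the queue table and the wait variable; like the update code above, it runs at most one queued function per update and runs nothing while a wait is counting down.

```python
class EventQueue:
    """A list of zero-argument callables, processed one per update."""

    def __init__(self):
        self.queue = []
        self.wait = 0.0

    def push(self, fn):
        self.queue.append(fn)

    def update(self, dt):
        if self.wait > 0:
            self.wait -= dt       # still waiting: don't touch the queue
            return
        if self.queue:
            fn = self.queue.pop(0)
            fn()

# Suspend input for one second, as in the example above.
state = {"input_enabled": True}
events = EventQueue()
events.push(lambda: state.update(input_enabled=False))
events.push(lambda: setattr(events, "wait", 1.0))
events.push(lambda: state.update(input_enabled=True))
```

Calling events.update(dt) once per frame disables input on the first frame, starts the wait on the second, and only re-enables input after a full second of frames has passed.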

Lua allows lots of room for freedom in styling your code however you wish.

Sunday, March 8, 2020

Multi-cart data storage on Pico-8

If you've played with Lexaloffle's Pico-8 for a little while, the limitations of the cart storage - not for graphics or sound, but for code and raw data (esp. tokens) - become a bottleneck very quickly.

Multiple cart support has been added to emulate a form of bank-switching, but it is implemented in a way that purposefully blocks your ability to write more code. The memory locations 0x4300 to around 0x6000 cannot be READ or WRITTEN - this is fairly illogical, because memory locations that cannot be either read or written can't really exist. 

You can, however, repurpose cartridge data to store byte data you create - you just have to know how to store it. The data in the cartridge is effectively hex strings in a specific order. Knowing this, we can write a quick tool to convert data we want to store into Pico-8's cartridge text format. 

We can then read it into the fairly large "user data" area of RAM at 0x4300 (in cartridge, this contains our code) and use it as we will. Loading takes a second, so you probably want to load in as much data as you can at once (i.e. entire towns, etc).

You can programmatically store all sorts of data, and use your original cart as a sort of kernel. It will certainly be tricky, and games still won't be EXTREMELY complicated (as is the point of the engine), but having more storage is KEY to making complete games!

As a test, I wrote a text file (i.e. ascii-encoded string bytes) and, using a quick Python script, I converted it to a Pico-8 cart.

Pico-8 Cartridge Text Format:

pico-8 cartridge // http://www.pico-8.com
version 18
__lua__
--Data stored here is inaccessible from the main cart.
--Use this area to describe the stored data instead.
__gfx__
--Data stored here begins at 0x0000 and goes to 0x1fff. 
--It is stored in .p8 as a BACKWARDS hex string, 128 chars by 128 rows.
--e.g. HELLO = 8454c4c4f4 
__gff__
--Data stored here is from 0x3000 to 0x30ff.
--Its format is the same as the gfx section.
__map__
--Data here is 0x2000 to 0x2fff
--It is stored as a normal hex string, 256 chars by 32 rows.
--e.g. HELLO = 48454c4c4f

The three sections above will give you 12,544 bytes of storage per cart (8,192 + 256 + 4,096), less if you use them for actual graphics and maps. Multiplying that by 15 possible storage banks gives you about 188 KB of non-standard storage, and that doesn't include sfx and music!
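
The totals can be checked straight from the address ranges quoted in the listing. A quick Python sketch of the arithmetic (the variable names are mine, and the bank count of 15 is taken from the paragraph above):

```python
GFX_BYTES = 0x1FFF - 0x0000 + 1   # graphics region, 0x0000 to 0x1fff
MAP_BYTES = 0x2FFF - 0x2000 + 1   # map region, 0x2000 to 0x2fff
GFF_BYTES = 0x30FF - 0x3000 + 1   # flags region, 0x3000 to 0x30ff

per_cart = GFX_BYTES + MAP_BYTES + GFF_BYTES
banks = 15                        # extra carts used purely as storage
total = per_cart * banks

print(per_cart, "bytes per cart,", total, "bytes across", banks, "carts")
```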

As a note:
The __sfx__ and __music__ blocks are less easy to make use of. A typical sfx test string looks like this within a .p8 file:
But when you peek the first 10 bytes of SFX ROM @ 0x3200, the values returned are:
63 10 63 10 63 10 63 10 63 10
3f corresponds to 63, then there are 3 characters in between (050) that equal 10 in decimal. Storing and retrieving data from a format like this may be too inefficient or impractical.

In Python, converting byte data to a hex string is fairly easy:

file = open("input.bin", 'rb') # Data to convert
by = file.read()               # Read all at once
file.close()                   # Close i/o stream
bstr = format(by[0], '02x')    # First byte to 2-digit hex string
byh = bstr[0]                  # High digit
byl = bstr[1]                  # Low digit
outbyte = byl + byh            # Rearrange the characters

Iterate the above and paste it into a cart file - then by reading location 0x0000 of the new file (if located under __gfx__), you can convert to string data and print it:

The base cart just does this:

for i=0,250 do
 if c=='\\' then
 elseif c~=nil then

(chr() function is defined in the link above). The if block converts any backslash found in the data to a newline character. 

The peek and poke in the screenshot show that the string is actually living in user RAM.

My Python tool is very messy (as mine always are!) but it will generate a full cartridge file, warn you if your input data is too large, and fill out all rows to the proper length. You can check out the source here.

Tuesday, February 11, 2020

ZX Spectrum: Detecting in assembly 48k or 128k model

Detecting machine capabilities is just a matter of course in the MSX world. However, in Spectrum land, there wasn't much crossover between 48K and 128K games. Many just came on separate tapes (or separate sides of the tape) and did not share code.

Some cleverly programmed ones, like Avenger, could detect and run the proper loader.

I tried to disassemble Avenger, but either the dump was bad (it wouldn't load the 128k version) or it uses some trickery I couldn't read. Either way I gave up and searched for my own way.

I couldn't find any discussion on this topic on the net, so I was left to my own devices. I came to realize one clear benchmark for 128 machines is the AY chip. As far as I can tell, no 48k machines had one, and every 128k machine did. Perfect!

Well, I tried a routine that polled the AY I/O port, but it didn't work reliably. What I did not know is that unbound I/O ports return floating values - about half the time the port returned the value I was checking it against. This makes for very unreliable testing.

The other option is memory paging. I THINK this is what Avenger does - it definitely changes the ROM page to the 48k ROM. I did the following instead:

1. Switch the ROM to page 0 - this is never the 48K ROM on any system, and this code will do nothing on a 48K.

2. Read a byte from the ROM that I know is only in the 48K ROM - the character "1" from the string "(C) 1982 ..." should work. There is only one version of the 48K ROM, so unless there's something wrong with the system or emulator, this location in ROM (0x153b) should ONLY return '1' on a 48K system.

3. Compare against 0x31 ("1"), and if it differs, we must be on a NON-48K system. In other words, a 128K system (or a 16K, but hopefully nobody will try to run a 48/128 game on a 16K system).

The code looks like this:

As a side note, a secondary check if you REALLY want to make sure you're not on a 16K should be fairly trivial - just find a string byte that is only in that ROM.

Since I can't find any info on this subject, anyone more knowledgeable is welcome to provide alternate solutions - but for now I like this one.

Side note, the gorgeous color scheme is Cobalt in gedit plus the z80 highlight scheme I found on (install it to a -3.0 folder, not 2.0 like the Readme says).

Saturday, February 8, 2020

The super annoying Speccy VRAM map and pattern printing

The common way to explain the layout of the ZX Spectrum's pixel orientation on its bitmapped VRAM is often quite convoluted and is oriented towards the values of each bit of the VRAM address - useful for plotting single pixels, but not for batch operations.

The Speccy VRAM can be visualized in a few ways to help understand how it's laid out:

1) Similar to an MSX, the ZX has 3 sets of 256x8x8 blocks arranged in a 32x24 grid. From $4000-$47ff is the first set, $4800-$4fff is the second, and $5000-$57ff is the third.

2) Pixel data is oriented in VRAM as if it were a 2048x24 bitmap (with each byte representing 8 pixels for 256x24 bytes), then the 8x8 tiles were scrunched into 256x192.

Add 1 to H, every 8 add 32 to L and reset H.
  (if L rolls over, add 8 to H.)
Add 1 to L.
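For single bytes, the same layout can be written as a bit shuffle of the Y coordinate. This is a minimal Python sketch of the standard address calculation; the function name `screen_address` is made up for illustration.

```python
def screen_address(x, y):
    """Return the VRAM address of the byte holding pixel (x, y).

    x is 0-255, y is 0-191. The Y bits are split three ways:
      bits 7-6 pick the third of the screen ($4000, $4800, $5000),
      bits 2-0 pick the pixel row inside the 8x8 cell (the H register),
      bits 5-3 pick the character row inside the third (top of L).
    """
    if not (0 <= x <= 255 and 0 <= y <= 191):
        raise ValueError("pixel out of range")
    return (0x4000
            | ((y & 0xC0) << 5)   # screen third
            | ((y & 0x07) << 8)   # pixel row within the cell
            | ((y & 0x38) << 2)   # character row within the third
            | (x >> 3))           # byte column, 8 pixels per byte


if __name__ == "__main__":
    for y in (0, 1, 8, 64, 191):
        print(y, hex(screen_address(0, y)))
```

Moving down one pixel inside a cell adds $100 (inc h), while moving down one character row adds $20 to the low byte, which matches the rules listed above.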

This layout lets you do a couple of things with the target VRAM address:

1. inc l will increase the pixel X position across 8 rows (256 bytes per page / 32 columns = 8 rows)
2. inc h will increase the pixel Y position within the first 8 rows, plus the row offset from the l register.
3. Flooding VRAM with patterns is really easy and fast:

    ld hl, $4000    ; VRAM base
    ld b, 12        ; 2 rows per loop * 12 = 24 rows
.printloop:
.loop_a:
    ld a, %01010101  ; pixel pattern row 1
    ld [hl], a       
    inc l             
    jr nz, .loop_a   ; until l wraps (256 bytes)

    inc h            
.loop_b:
    ld a, %10101010  ; pixel pattern row 2
    ld [hl], a
    inc l 
    jr nz, .loop_b   ; until l wraps (256 bytes)

    inc h           

    dec b
    jr nz, .printloop
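To see why filling whole 256-byte pages with alternating patterns produces a checkerboard, the fill can be emulated in Python and read back one pixel row at a time. This is only a sketch: `flood_checkerboard` and `pixel_row` are hypothetical names, and the bytearray stands in for the bitmap area at $4000-$57ff.

```python
def flood_checkerboard():
    """Emulate the fill loop: 12 passes, each writing two 256-byte pages."""
    vram = bytearray(0x1800)              # offsets 0x0000-0x17ff map to $4000-$57ff
    h = 0                                 # page number (H register minus $40)
    for _ in range(12):
        for pattern in (0b01010101, 0b10101010):
            for l in range(256):          # inc l until it wraps
                vram[(h << 8) | l] = pattern
            h += 1                        # inc h
    return vram


def pixel_row(vram, y):
    """Return the 32 bytes that make up pixel row y (0-191)."""
    offset = ((y & 0xC0) << 5) | ((y & 0x07) << 8) | ((y & 0x38) << 2)
    return bytes(vram[offset:offset + 32])


if __name__ == "__main__":
    vram = flood_checkerboard()
    for y in range(4):
        print(y, format(pixel_row(vram, y)[0], "08b"))
```

Because the lowest bit of the page number is also the lowest bit of the pixel Y position, even pages land on even pixel rows and odd pages on odd ones.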

ZX Spectrum: 1942 loader detokening and .TAP format assembly

.TAP format is much easier to work with than .TZX, which seems to mainly be for duplication.

.TAP structure is simply a series of headers and file data to create a file listing. Header-data, header-data, header-data. A header is always 19 bytes long, but the length of the data block can be up to 64k.

Visualized, it looks like this:
|   TAP block header   |
|                      |
|    Header data       |
|       (19 bytes)     |
|                      |
|---<Checksum byte>----|
|   TAP block header   |
|                      |
~     Data bytes       ~
|                      |
|                      |
|---<Checksum byte>----|
for each file on a tape.

Each header and data block has its own 3-byte mini-header as specified by the .TAP format. It's very simple:

; 3 byte block header:
DW BlockSize
DB BlockType
; then data
;  (...) followed by 
DB ChecksumByte

If the BlockType is 0x00 (indicating a header block), then BlockSize will always be 19 (in sequence 13h 00h). Headers are 17 bytes long, and the BlockType and ChecksumByte are added to the BlockSize length to get 19. 
If BlockType is 0xFF (255 or -1, indicating a data block), then BlockSize is the size of the data block (plus two bytes for BlockType and ChecksumByte).
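As a rough illustration of the block framing just described, here is a Python sketch that walks a .TAP image block by block. The function name `split_tap` is hypothetical; the sample bytes are the loader tape image listed further down in this post.

```python
import struct

# The 69-byte loader tape image from the hex listing later in this post.
LOADER_TAP = bytes.fromhex(
    "13 00 00 00 4C 4F 41 44 45 52 20 20 20 20 2C 00 "
    "00 00 2C 00 11 2E 00 FF 00 0A 0D 00 FD 32 35 35 "
    "39 32 0E 00 00 F8 63 00 0D 00 14 05 00 EF 22 22 "
    "AF 0D 00 1E 0E 00 F9 C0 32 35 35 39 33 0E 00 00 "
    "F9 63 00 0D 70"
)


def split_tap(tap):
    """Return a list of (block_type, payload, checksum) tuples."""
    blocks = []
    pos = 0
    while pos < len(tap):
        (size,) = struct.unpack_from("<H", tap, pos)   # little-endian BlockSize
        body = tap[pos + 2:pos + 2 + size]             # BlockType .. ChecksumByte
        blocks.append((body[0], body[1:-1], body[-1]))
        pos += 2 + size
    return blocks


if __name__ == "__main__":
    for block_type, payload, checksum in split_tap(LOADER_TAP):
        print(hex(block_type), len(payload), hex(checksum))
```

The payload length is always BlockSize minus two, since BlockSize counts the BlockType and ChecksumByte as well.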

Header blocks look like this:

DB FileType
DS FileName      ; 10 bytes
DW DataSize
DW Parameter1
DW Parameter2

FileType can be 0, 1, 2 or 3. BASIC data can be stored as types 1 or 2, but we are concerned with type 0 -- BASIC program -- and type 3 -- CODE (aka assembly).

The filename must be padded with 0x20 to 10 bytes.

DataSize is the size of the data to load. This value is 2 less than the BlockSize in the following data block, since BlockSize also counts the BlockType and ChecksumByte.

When FileType is BASIC program (0):
Parameter1 is the LINE parameter when SAVEing the program. I actually have not gotten this to work, and since 1942 keeps it at 0, I do as well.
Parameter2 is the location of the start of the working area of BASIC variables. A bootloader generally does not have variables, so in these cases, this value is actually the same as DataSize (or, the following data block's BlockSize-2).

When FileType is CODE (3):
Parameter1 is the target memory address (e.g. the first parameter after CODE), and
Parameter2 is ALWAYS 8000h. Not explained why.
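The 17 header bytes can be packed in one call with Python's struct module. A minimal sketch, assuming the field order given above with the 10-byte filename sitting between FileType and DataSize; `make_header` is a made-up name.

```python
import struct


def make_header(file_type, name, data_size, param1, param2):
    """Pack the 17-byte header payload (without BlockType or checksum)."""
    raw = name.encode("ascii")
    if len(raw) > 10:
        raise ValueError("filename is limited to 10 characters")
    padded = raw.ljust(10, b" ")           # pad with 0x20 to 10 bytes
    # "<" = little-endian, no padding: 1 + 10 + 2 + 2 + 2 = 17 bytes
    return struct.pack("<B10sHHH", file_type, padded,
                       data_size, param1, param2)


if __name__ == "__main__":
    # BASIC program "LOADER", 44 bytes, autostart line 0, vars at offset 44
    print(make_header(0, "LOADER", 44, 0, 44).hex(" "))
```

For a CODE file the same call would take type 3, the load address as `param1`, and 0x8000 as `param2`.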

And finally, the checksum byte. This isn't a checksum per se so much as a running XOR of every byte in the block after the 2-byte BlockSize, so the BlockType flag byte is included. Start with the BlockType flag byte and xor it with each successive byte, then store the final result in ChecksumByte.

For the data block, this is calculated for me using a Python script post-assembly with the following code:

chk = 0xff          # start with flag byte
for b in inbytes:   # inbytes = the assembled binary, as bytes
    chk = chk ^ b   # xor in each successive byte

And of course data blocks are simply raw data.
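Putting the size word, flag byte, and XOR byte together, a whole TAP block can be produced by a small helper. This is a sketch rather than the script mentioned above; `xor_checksum` and `tap_block` are hypothetical names.

```python
import struct


def xor_checksum(block_type, payload):
    """XOR the BlockType flag with every payload byte."""
    chk = block_type
    for b in payload:
        chk ^= b
    return chk


def tap_block(block_type, payload):
    """Wrap a payload as BlockSize, BlockType, payload, ChecksumByte."""
    size = len(payload) + 2                      # + BlockType + ChecksumByte
    return (struct.pack("<HB", size, block_type)
            + bytes(payload)
            + bytes([xor_checksum(block_type, payload)]))


if __name__ == "__main__":
    # Header payload of the "LOADER" stub listed later in this post
    header = bytes.fromhex("00 4C 4F 41 44 45 52 20 20 20 20 2C 00 00 00 2C 00")
    print(tap_block(0x00, header).hex(" "))
```

Since the blocks are independent, concatenating the output of `tap_block` for a header and its data gives a loadable pair.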

The trick was getting a BASIC stub to auto-run when you play the tape (harder than it seems when you're doing all the bytes by hand) and have that stub clear RAM and load/run the assembly program we want.

I couldn't figure out how to save a .TAP from the "speccy" emulator, so I had no choice but to open up a .TAP of 1942 and see what was up.

The first TAP header block in 1942 looks like this:
DW $0013          ; size in bytes
DB 0              ; type 0 = header
; then the header data:
DB 0              ; 0 = BASIC program
DS "1942      "   ; filename
DW 185            ; file size
DW 0              ; autostart line
DW 185            ; basic vars loc
DB $0f            ; checksum byte

Then, the data block. This is where I had to detokenize the program by hand, and figure some stuff out for myself.

The listing of the 1942 loader ended up looking like this:
10 BORDER 0:POKE 23624,0:POKE 23693,0:CLEAR 25592:POKE 23739,111 
40 POKE 23739,244:RANDOMIZE USR 25593
50 REM etc

1942 loads itself into $63f9 - contended memory, but a good starting point all the same.
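As a point of interest, 23624 ($5c48) is BORDCR, 23693 ($5c8d) is ATTR_P, and 23739 ($5cbb) sits just past the system variables, in the channel information area: it is the low byte of the screen channel's output routine address (CURCHL itself lives at 23633). These correspond to a border color mirror, an attribute byte I need to investigate, and the routine used to print to the screen. The loader pokes the last one to 111 and later back to its default of 244, presumably to hide the tape loading messages.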
This, along with the .TAP disassembly, was enough to get me started -- the basics are use CLEAR n-1, LOAD "" CODE, and RANDOMIZE USR n.

(Note that the best way to check the value of a token in any native BASIC version is to use PRINT CHR$(n). BASIC tokens don't overlap the standard ASCII byte space, so n is almost always > 127.)

First, explaining the Spectrum BASIC line format:
DW LineNo        ; Big-endian!
DW LineOffset    ; Bytes until next LineNo
( ... )          ; (listing)
DB $0d           ; endline

The important thing here is that the single 0x0d byte represents endline in Spectrum BASIC. ZX80/81 use a different endline (0x76, maybe?). Don't look for 00 00 as endline or 00 00 00 for EOF like on other systems - afaict there is no concept of EOF in ZX BASIC.
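The line format above is easy to get wrong by hand because the two words use opposite byte orders. A small Python sketch, assuming the length word counts the listing bytes plus the 0x0d terminator (which is what the hex listing below suggests); `basic_line` is a made-up name.

```python
import struct


def basic_line(line_no, body):
    """Encode one tokenized BASIC line.

    line_no is stored big-endian, the length little-endian,
    and the length includes the trailing 0x0d.
    """
    body = bytes(body)
    return (struct.pack(">H", line_no)
            + struct.pack("<H", len(body) + 1)
            + body
            + b"\x0d")


if __name__ == "__main__":
    # 20 LOAD "" CODE  ->  tokens EF (LOAD), 22 22 (""), AF (CODE)
    print(basic_line(20, bytes.fromhex("EF 22 22 AF")).hex(" "))
```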

A large difference between Sinclair and other BASICs is that Sinclair wastes a ton of space on storing numbers as strings, but condenses all spaces automatically. Here is the hex listing for my very short loader program:

13 00 00 00 4C 4F 41 44 45 52 20 20 20 20 2C 00
00 00 2C 00 11 2E 00 FF 00 0A 0D 00 FD 32 35 35
39 32 0E 00 00 F8 63 00 0D 00 14 05 00 EF 22 22
AF 0D 00 1E 0E 00 F9 C0 32 35 35 39 33 0E 00 00
F9 63 00 0D 70

And the corresponding BASIC:

10 CLEAR 32767
20 LOAD "" CODE
30 RANDOMIZE USR 32768

CLEAR 32767: This sets BASIC's RAMTOP (the Spectrum's equivalent of HIMEM) to 7fffh, so BASIC leaves everything from 8000h upward alone for our machine code.
LOAD "" CODE: This is equivalent to "load the next file available from tape as a block of bytes". With no address given, the bytes go to the address stored in the CODE header (Parameter1), so the next chunk of data pointed to by a .TAP header in the .TAP file is loaded as a binary to 8000h.
RANDOMIZE USR 32768: This is, for some reason, the common way to start machine language routines on the Speccy. This is equivalent to "JP $8000".

The tricky part here is that numeric constants are stored twice: first as ASCII digits, then immediately followed by byte 0x0e (the number marker) and a 5-byte binary form, which for integers is laid out as:
DB 0
DB PolarityByte
DW IntValue
DB 0
Such that 25592 becomes 11 bytes(!!):
32 35 35 39 32 0e 00 00 f8 63 00
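The same expansion can be generated in Python. A minimal sketch for non-negative 16-bit integers only, with the sign byte assumed to be 0 for positive values as in the example above; `basic_int` is a hypothetical name.

```python
import struct


def basic_int(value):
    """Encode an integer constant as ASCII digits + 0x0e + 5-byte form."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("only 0-65535 is handled in this sketch")
    digits = str(value).encode("ascii")
    # 00, sign byte (0 = positive), 16-bit little-endian value, 00
    binary = struct.pack("<BBHB", 0, 0, value, 0)
    return digits + b"\x0e" + binary


if __name__ == "__main__":
    print(basic_int(25592).hex(" "))
```

Prefixing the result with the token byte (for example FD for CLEAR) gives the body of a complete BASIC line.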

Also, the final line, RANDOMIZE USR n, is the sequence of bytes f9 c0. Whenever you see this in Spectrum BASIC, it's a CALL/JP command.

As mentioned above, I wrote a Python script to calculate the checksum bytes for me. Running it on the BASIC stub binary and on an assembled asm binary gave me two .TAP files: one for the BASIC loader, and one for the hello world program.

.TAP is brilliant because you can $cat a.tap b.tap > ./c.tap and suddenly have a complete tape file. I tested b.tap with this binary code, assembling as-is with nothing else:

%org $8000

    xor a 
    ld [WorkRAM_a], a    ; counter = 0
Loop:
    ld a, [WorkRAM_a]
    inc a 
    cp 8
    jr nz, .ok
    xor a                ; wrap back to 0 after 7
.ok:
    ld [WorkRAM_a], a
    out [ZX_IOPORT], a   ; write the counter to the I/O port
    jp Loop

WorkRAM_a: rb 1

And it worked! With a little bit more work and bash nonsense, I have a one-click script that will assemble a ZX Spectrum .TAP image (with loader!) to an address I specify from a single assembly listing.

Comment if you are interested in learning more.