Friday, August 14, 2020

Super fast Raycast 3D engine in LOVE 2D

Source (with lightmap code): https://github.com/bferguson3/terasu

 (Update: I mistakenly thought during my experimentation that clearing the buffer every frame is necessary. It's not. The article has been updated below.)

LOVE is great. Underneath the hood, it's just a Lua interface to render OpenGL textures to a window - which is simple and perfect for any 2D sort of game. 

Well, almost. Older game hardware let you access VRAM (and thus individual pixels) directly, and more quickly than modern hardware does. We have shaders, but shaders only allow GPU-side manipulation of graphics data we've already pushed to the video buffer - they don't let us construct that data from a tile map, for example. The amount of memory available to a pixel shader is also fairly limited, so you can't use one to extrapolate entire images.

This means that making retro-style games in LOVE - which often requires manipulating an image as if it were VRAM or raw pixel data - can actually be unintuitive, because you're not just drawing layers of static images on top of each other. Every time you change the image, you have to recreate it before you redraw it. LOVE warns against this explicitly, because you can very quickly overflow memory and crash.

A perfect example of pixel-based rendering is a 'raycasting' engine like Wolfenstein 3D, or other simulated-3D games like the original Elite, which plotted pixels directly to video memory to draw wireframe models. These games were developed before 3D accelerator GPUs were a thing, so all rendering was done CPU-side.

Writing directly to memory is super fast, so Wolfenstein 3D could run pretty well on a 386. Unfortunately, in frameworks like LOVE, which are built on Lua on top of C, it's not so easy to write directly to memory. Lua tables are hashed, type-checked, resized and garbage-collected behind the scenes, and the performance hit becomes obvious when you're polling several values from tables of tables a couple thousand times per frame, sixty times per second!
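
To picture the cost, here's the kind of access pattern a table-based raycaster hammers in its inner loop (worldMap, mapX and mapY are illustrative names, not from any particular source):

-- Illustrative only: the nested-table access a raycaster's inner loop repeats.
local worldMap = {
    {1, 1, 1},
    {1, 0, 1},
    {1, 1, 1},
}
local mapX, mapY = 2, 2
-- Every ray step pays for two dynamic lookups plus type checks:
local tile = worldMap[mapY][mapX]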

There is an absolutely amazing raycasting engine tutorial for C++ by lodev, available on his website. If you know C++, you can likely recreate Wolfenstein 3D from scratch in just a couple of days! I personally had never written a raycasting engine, and wanted to see if it was possible in LOVE. I went through it, and guess what? It works really well! Unfortunately, I hated the performance (60% or higher CPU usage), thanks to all of those tables.

To draw the image, LOVE provides ImageData objects with a :setPixel() method, but this is slow - about as slow as using Lua tables, in fact - so we want to avoid it. What we can do instead is treat the ImageData as a VRAM buffer: we 'draw' everything internally, then push it all at once using an appropriate data structure. In this case, we can call the :replacePixels() method once per frame instead - but we still have to deal with the data format limitation.
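
For reference, this is the naive per-pixel approach we're trying to avoid (a sketch assuming LOVE 11.x, where color components are floats from 0 to 1):

-- One Lua-to-C call per pixel, every frame: this is the slow path.
local imageData = love.image.newImageData(screenWidth, screenHeight)
for y = 0, screenHeight - 1 do
    for x = 0, screenWidth - 1 do
        imageData:setPixel(x, y, 0, 0, 0, 1)
    end
end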

So how do we work around that? Turns out it's more simple than you might think.

LuaJIT, the ultra-speedy single-pass 'just-in-time' compiler for Lua that LOVE uses, ships with a library called 'ffi'. The ffi library lets you declare C types and data structures and use them directly from Lua. I won't go on about how great this is, but instead I offer this tiny piece of code that fixed all my problems:

local ffi = require 'ffi'
ffi.cdef[[
typedef struct { uint8_t r, g, b, a; } pixel;
]]

This defines for us a new usable C struct type named pixel that has four components,
r, g, b, a, each an 8-bit unsigned integer. What may not be obvious is that once we initialize a variable
of type 'pixel', it will be a byte-perfect representation (i.e. 4 sequential bytes) within our Lua program.
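
As a quick sanity check of that claim (using the cdef above):

local p = ffi.new("pixel")            -- zero-initialized RGBA struct
p.r, p.g, p.b, p.a = 255, 128, 0, 255
print(ffi.sizeof("pixel"))            --> 4: exactly four sequential bytes, no padding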

So this is how it's used:

screenBuffer = ffi.new("pixel[?]", screenWidth*screenHeight)
bufSize = ffi.sizeof(screenBuffer)
drawData = love.image.newImageData(screenWidth, screenHeight, "rgba8",
    ffi.string(screenBuffer, bufSize))
drawBuffer = love.graphics.newImage(drawData)
This code initializes a struct array of type "pixel" with ? elements, where ? is the number you pass as the second
argument. ffi.sizeof() grabs the size in bytes of the object. We need this for the next line, which creates a new
LOVE ImageData object in the correct format (rgba8: an 8-bit red, green, blue, and alpha value per
element). ffi.string() coerces the parameter passed, screenBuffer, to a char* of size bufSize. Finally, the
drawBuffer image (what actually gets put on the screen) is initialized from this ImageData.

Don't actually do this (see the update at the top):

Don't forget to clear the screenBuffer every frame:
for i=0,(screenWidth*screenHeight)-1 do
    screenBuffer[i].r = 0
    screenBuffer[i].g = 0
    screenBuffer[i].b = 0
    screenBuffer[i].a = 0
end
You don't need to clear the screen buffer if you're tracing the entire ceiling and floor every frame. Omitting
this will cut out a lot of cycles!
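
If your renderer doesn't cover every pixel and you genuinely do need a clear, LuaJIT's memset-style ffi.fill is far cheaper than the field-by-field Lua loop above:

-- Zero the entire buffer in one C call instead of four field writes per pixel.
ffi.fill(screenBuffer, bufSize, 0)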

Then, when writing pixels to the screen, you do this instead:

local c = textureData[texNum][(ty*textureSize)+tx] -- texel from the texture buffer
local r, g, b = c.r, c.g, c.b
local px = math.floor((y*screenWidth)+x)           -- linear index into the screen buffer
screenBuffer[px].r = r
screenBuffer[px].g = g
screenBuffer[px].b = b
screenBuffer[px].a = 255

Here 'c' is 'color' in the original source linked above, and you write to the screenBuffer indexed linearly
instead of as a two-dimensional array. (Setting the color and alpha to 0 at the beginning of each frame would
be the equivalent of clearing out the graphics buffer - but as noted in the update, that isn't necessary here.)
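
If you'd rather not repeat the four field writes everywhere, they can go in a small helper (putPixel is my name for it, not something from the original source); LuaJIT will usually inline a function this small:

-- x and y must already be integers here (see the note on math.floor below).
local function putPixel(x, y, r, g, b)
    local p = screenBuffer[(y * screenWidth) + x]
    p.r, p.g, p.b, p.a = r, g, b, 255
end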

When converting the math from C++, be veeeery careful with order of operations (the example source is
very lazy about it) and with data types. Also be aware that LuaJIT, as mentioned above, is one-pass: what this
means here is that inline expressions are evaluated when they are encountered, and not before. So, for
expressions that are repeated, factor them out of loops as much as possible.
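
For example, when filling a horizontal span (xStart and xEnd are illustrative bounds), compute the row offset once instead of once per pixel:

local rowBase = y * screenWidth -- loop-invariant: hoisted out of the loop
for x = xStart, xEnd do
    local p = screenBuffer[rowBase + x]
    p.r, p.g, p.b, p.a = 0, 0, 0, 255
end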

The other big thing to watch out for is expressions inside of array indexes (these MUST be truncated to
integers with math.floor() or the math won't work) and variables that are meant to hold integers. As long as
these are truncated with math.floor() or some similar operation, you'll be fine. But if you don't, you'll see
garbled, skewed output and pull your hair out trying to figure out why.
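
A typical spot where this bites: texture coordinates come out of the ray math as fractions and must be floored before indexing (wallX and texPos are names in the style of lodev's tutorial, used here for illustration):

local tx = math.floor(wallX * textureSize) -- fractional wall-hit position -> integer column
local ty = math.floor(texPos)              -- stepped texture position -> integer row
local c = textureData[texNum][(ty * textureSize) + tx]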


Once you've done your draw code, you gotta create a new ImageData from the screen buffer byte string. This
is where it gets a little tricky:

drawData = love.image.newImageData(screenWidth, screenHeight, "rgba8",
    ffi.string(screenBuffer, bufSize))
if drawData then
    drawBuffer:replacePixels(drawData)
end
drawData = nil

Remember, drawBuffer is the LOVE Image type. We only need drawBuffer at the end of our processing,
meaning drawData can be nil'ed out and garbage-collected.
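
For completeness - the snippets above don't show it, but drawBuffer presumably gets blitted in the usual LOVE draw callback, something like:

function love.draw()
    love.graphics.draw(drawBuffer, 0, 0) -- put the software-rendered frame on screen
end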

Why this is necessary may not be obvious, so I'll try to explain. LuaJIT's garbage collection doesn't run super
often, and doesn't go super deep; this is to prevent impacting performance. Unfortunately, a LOVE use case
where you're generating possibly upwards of 100 MB of graphics data per frame WILL cause your game to
overflow and crash (at approx. 2 GB).

Nil'ed references should get cleaned up during garbage collection and free up that RAM, but we need the
collector to run more often than it does by default. Otherwise this small app will eventually take several
hundred megabytes of memory, and we can certainly trim that down. The solution is remarkably simple:

function love.update(dT)
    secondCtr = secondCtr + dT
    if secondCtr > 1.0 then
        secondCtr = secondCtr - 1.0
        collectgarbage() -- full collection, once per second
    end
    ...

collectgarbage() is a native Lua function that, when called with no arguments, performs a full garbage-collection
cycle. If you call it as collectgarbage('count') you'll get Lua's memory consumption in kilobytes (minus
the 50-60 MB or so of RAM that LOVE itself takes up). If you like, you can fine-tune this for minimal impact.
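
As one hypothetical way to do that fine-tuning, you can log Lua's memory use alongside the once-per-second collection:

-- Inside the once-per-second block above, after collectgarbage():
print(string.format("Lua memory: %.1f MB", collectgarbage("count") / 1024))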

At the end of the day you should end up with a result like this that stays around 10-15% CPU and under
100 MB of RAM, depending on your graphics (I even added a sprite):

Source (with lightmaps): https://github.com/bferguson3/terasu
