Difficult to tell from the gif, but it runs at 60fps and only shifts 1 pattern column at a time.
Behind the scenes is much cooler. In instantaneous fashion, the data for the next room (700-some bytes) is loaded from a swapped memory page into RAM and cycled into VRAM as shown above.
9918 limitations:
4 sprites per line
1 color per sprite
29 T-states between VRAM writes (!!!)
Monsters are 2x2 patterns, and while there's space for 100 in RAM, there will likely never be that many on-screen at once. Regardless, drawing them as patterns, together with projectile flicker, allows for 2 players to have 2 colors each, and monsters are actually more colorful than sprites would be. The downside is that they move in 8-pixel chunks.
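To show why pattern-based monsters are stuck with 8-pixel steps, here is a rough sketch of turning a tile position into a name table address. It assumes the name table sits at 1800h (the BIOS default for SCREEN 2) and is 32 tiles wide; `NAME_TABLE` and `tile_addr` are made-up names for illustration, not taken from the game, and the `equ` and hex syntax may need adjusting for your assembler.

```asm
NAME_TABLE equ 1800h        ; ASSUMED name table base (BIOS default for SCREEN 2)

; In:  D = tile row (0-23), E = tile column (0-31)
; Out: HL = VRAM address of the monster's top-left tile
tile_addr:
    ld l, d
    ld h, 0
    add hl, hl              ; row * 2
    add hl, hl              ; row * 4
    add hl, hl              ; row * 8
    add hl, hl              ; row * 16
    add hl, hl              ; row * 32 (one name table row = 32 tiles)
    ld d, 0
    add hl, de              ; + column
    ld de, NAME_TABLE
    add hl, de              ; + table base
    ret
```

For example, row 5, column 10 gives 1800h + 5*32 + 10 = 18AAh, and the other three tiles of the 2x2 monster sit at HL+1, HL+32 and HL+33. Moving a monster means rewriting those four name table entries, so the smallest possible step is one whole tile.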
The VRAM writes are a big problem. This VDP tutorial is very good, but can be misleading. In my experience, the 9918 VDP ALWAYS needs a 29 T-state wait between writes. On the MSX2, you still need a small wait in between reads and writes. You can use otir "unlimited" during vblank, but be careful. Relatedly, if you compare carefully timed VDP writes DURING vblank on openMSX and real hardware, you will see discrepancies. This is because emulation down to the microsecond is effectively impossible. At any rate, when coding for the MSX1, the suggested method of:
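; B = byte count, C = VDP data port, HL = source
outloop:
    outi                ; 18 T-states on MSX (16 + 2 M1 wait states)
    jp nz, outloop      ; 11 T-states on MSX (10 + 1 M1 wait state)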
works very well, because each pass through the loop takes exactly 29 T-states.
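For context, here is a minimal sketch of how that loop might be wrapped into a routine that also sets the VRAM write address first. It assumes the standard MSX VDP ports (98h for data, 99h for address/control) and that interrupts are normally enabled; `VDP_CTRL` and `vram_write` are illustrative names only.

```asm
VDP_DATA equ 98h            ; VDP data port (standard MSX mapping)
VDP_CTRL equ 99h            ; VDP address/control port (assumed name)

; In: HL = source in RAM, DE = VRAM destination (below 4000h),
;     B = byte count (0 means 256)
vram_write:
    di                      ; keep the interrupt handler off the VDP during setup
    ld a, e
    out (VDP_CTRL), a       ; low byte of the VRAM address
    ld a, d
    and 3Fh
    or 40h                  ; bit 7 clear, bit 6 set = VRAM write
    out (VDP_CTRL), a       ; high byte of the VRAM address
    ei
    ld c, VDP_DATA
outloop:
    outi                    ; send (HL) to the VDP, HL + 1, B - 1
    jp nz, outloop          ; 18 + 11 = 29 T-states per byte on MSX
    ret
```

At the MSX's 3.58 MHz clock, 29 T-states comes to about 8.1 microseconds per byte. Since B is an 8-bit counter, a 700-some byte room would need several calls or an outer loop.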
When working with the 9938, the thing to keep in mind is that there is a MINIMUM time of 5-8 T-states between reads and writes. If you are polling VRAM, pad the access like this:
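ld hl, (vram_addr_to_read)
ld a, l
out (VDP_STATUS), a     ; low byte of the VRAM address
ld a, h
out (VDP_STATUS), a     ; high byte; bits 7-6 must be 00 to request a read
nop
nop                     ; do nothing for 8 cycles!
in a, (VDP_DATA)        ; A = byte at that VRAM address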
Otherwise, you will get erroneous graphics.
Using a garden variety of shit (RLE encoding, a 3.7 kB music player, cartridge paging), I managed to squeeze the requirements for the game down to 8 kB of RAM. It looks pretty good so far, I think, and runs on the barest minimum of hardware.
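As an illustration of the RLE part, here is a small unpacker sketch. The game's real format isn't described here, so this assumes the simplest possible scheme: (count, value) byte pairs, with a count of zero ending the stream. `rle_unpack` and `rle_run` are hypothetical names.

```asm
; ASSUMED format: [count][value] pairs, count = 0 terminates.
; In: HL = packed data, DE = destination buffer in RAM
rle_unpack:
    ld a, (hl)              ; run length
    inc hl
    or a
    ret z                   ; a zero length marks the end of the stream
    ld b, a
    ld a, (hl)              ; byte to repeat
    inc hl
rle_run:
    ld (de), a
    inc de
    djnz rle_run            ; store the byte B times
    jr rle_unpack
```

With this scheme, a run of up to 255 identical tiles costs 2 bytes in the cartridge, which is where the savings on mostly-empty rooms would come from.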
Hurrah!