Thursday, September 12, 2024

PROPERLY reading keyboard input in DOS environments - IRQ Hook

By my estimation, there are THREE different ways of reading keyboard input in DOS using C.

The first is getch().

// getch
#include <iostream.h>
#include <stdio.h>
#include <conio.h>
/* this does not give keyup events */
int main() {
    int ch = 0;                   /* initialized: the loop test reads it before the first getch() */
    while (ch != 27) {            /* 27 = ESC in ASCII */
        ch = getch();
        switch (ch)
        {
        case 0:
        case 0xE0: {              /* extended key: prefix byte, then the scancode */
            int ext = getch();    /* kept separate from ch so a scancode never ends the loop */
            switch (ext)
            {
            case 0x48: cout << "up arrow\n"; break;
            case 0x4B: cout << "left arrow\n"; break;
            case 0x4D: cout << "right arrow\n"; break;
            case 0x50: cout << "down arrow\n"; break;
            default: cout << "extended key " << ext << "\n";
            }
            break;
        }
        case 8: cout << "backspace\n"; break;
        case 9: cout << "tab\n"; break;
        case 0xd: cout << "return\n"; break;
        case ' ': cout << "space\n"; break;
        default: cout << "normal key " << (char)ch << "\n";
        }
    }
    return 0;
}


From what I can tell, most compilers implement this function with the BIOS INT 16h, function 00h call (read key from the keyboard buffer). The problem with this is the usual one: you are at the mercy of the system's handling of the keyboard buffer, which is only one interpretation of the keyboard's state. You can change the terminal/TTY settings of stdin/stdout, but in the end it is not appropriate for games. getch() does not return any keyup events, as these are handled at the OS/BIOS layer, so you will never get a break code (a scancode with bit 7 set, which the keyboard sends on key release) from getch().

The second is reading the keyboard's I/O ports directly:

// Be careful: without an interrupt, key events can be missed
unsigned char read_key();
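// Port 60h: keyboard data (8255 PPI on the PC/XT, 8042 controller on the AT and later)
// After reading the scancode, acknowledge it by pulsing bit 7 of port 61h:
// in al, 0x61; set bit 7 (or al, 0x80); out 0x61, al
// clear bit 7 again (xor al, 0x80); out 0x61, al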
#pragma aux read_key = \
"in al, 0x60"\
"mov ah, al" \
"in al, 0x61"\
"or al, 0x80"\
"out 0x61, al"\
"xor al, 0x80"\
"out 0x61, al"\
value [ah] \
modify [ al ];

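By the way, this port interface is emulated on pretty much every PC nowadays! Port 60h comes from the original PC/XT's 8255 PPI; from the AT onward an 8042 keyboard controller sits behind it but kept the interface compatible, and most BIOSes still emulate it for USB keyboards ("USB legacy support").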

Anyway, this assembly works great! And it's super fast!

Unfortunately, as the comment on the first line warns, if you are not constantly monitoring the keyboard it is possible (and happens frequently) to miss a keyup event. This is fine for probably 80% of purposes, but for action games in particular, a stuck key means death.

I tried a large variety of things to get around this limitation, but unfortunately, the only thing that works is clearing out every other "pressed" key state you have in memory when a new key is pressed. The code is simple, but it means no two keys can be held down at the same time! Argh!!
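A minimal sketch of that workaround, written as portable C so the logic can be tried outside DOS. The names `held` and `poll_workaround` are hypothetical, and the byte passed in is assumed to be the raw value from a port 60h read such as `read_key()` above (the make code on a press, the make code | 0x80 on a release).

```c
#include <string.h>

/* Hypothetical key table: one byte per scancode 0x00-0x7F. */
static unsigned char held[128];

/* Polling workaround sketch: on every make code, forget all other keys,
   because their break codes may have been missed between polls. */
void poll_workaround(unsigned char raw)
{
    unsigned char code = raw & 0x7F;   /* bits 0-6: scancode */

    if (raw & 0x80) {                  /* bit 7 set: break code, key released */
        held[code] = 0;
    } else {                           /* make code: key pressed */
        memset(held, 0, sizeof held);  /* drop every other key */
        held[code] = 1;
    }
}
```

A held key keeps resending its make code through typematic repeat, so every repeat wipes the other keys; that is exactly why two keys can't be held together with this approach.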

The third method is using the BIOS INT 16h functions 00h and 01h to check for and retrieve keys from the system's keyboard buffer. This isn't very difficult to implement, so I won't waste time on it here, but the effect is nearly identical to the default getch() and stdin: you don't get super fast access to the port, and keyup events are still invisible to you.

After a bunch of hemming and hawing, it was time to do more research.

So, operating systems don't "scan" a keyboard the way older micro-computers did. On virtually every 8-bit system I've programmed for, you can get the state of any key on the keyboard at any time - with exceptions, depending on how things are wired.

What is done nowadays is that the keyboard raises a hardware interrupt (IRQ 1). And unfortunately, I seemed to be in a situation where hooking that interrupt was the only way to get rid of my keyboard input bugs.

So I did my research, but wouldn't you know it, there isn't a lot of information on that kind of thing. There are some StackOverflow posts here and there, but nothing particularly clear and definitive. There wasn't even a clear statement of where the interrupt vector was!

After some frustration and tweaking of my search keywords, I discovered this wonderful post, which had some information I couldn't find elsewhere:

    1. The keyboard hardware interrupt (IRQ 1) is INT 09h
    2. Watcom C/C++ has _dos_getvect() and _dos_setvect() functions, which led me to the layout of the real-mode interrupt vector table: 256 entries of two words each, starting at 0000:0000, with the instruction pointer (code offset) first and the segment second. This means vector 09h lives at 9 * 4 = 0024h!
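To make the vector-table arithmetic concrete, here is a small portable C sketch (the function names are hypothetical) that computes where a vector's entry lives, decodes the four raw bytes of an entry, and converts a real-mode segment:offset pair to a linear address.

```c
#include <stdint.h>

/* Real-mode IVT: 256 entries of 4 bytes each, starting at 0000:0000. */
uint16_t ivt_entry_address(uint8_t vector)
{
    return (uint16_t)(vector * 4u);
}

/* Each entry is little-endian: offset (IP) word first, then segment (CS) word. */
void decode_ivt_entry(const uint8_t entry[4], uint16_t *segment, uint16_t *offset)
{
    *offset  = (uint16_t)(entry[0] | (entry[1] << 8));
    *segment = (uint16_t)(entry[2] | (entry[3] << 8));
}

/* Real-mode segment:offset to a 20-bit linear address: segment * 16 + offset. */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

For vector 09h, `ivt_entry_address(9)` gives 0024h, matching the address above.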

Well, hold on. Writing the address of my replacement function to 0024h directly wasn't working, even with interrupts off. So, I did what every sane person does: disassembled a Watcom-produced .exe that uses _dos_setvect() to see what it does!

                                                           FUN_1000_0008:1000:0082(c)  
       1000:0174 52              PUSH       DX
       1000:0175 89 da           MOV        DX,BX
       1000:0177 1e              PUSH       DS
       1000:0178 8e d9           MOV        DS,CX
       1000:017a b4 25           MOV        AH,0x25
       1000:017c cd 21           INT        0x21
       1000:017e 1f              POP        DS
       1000:017f 5a              POP        DX
       1000:0180 c3              RET

Oh, they don't touch the table at all: they call DOS service 25h (Set Interrupt Vector: INT 21h with AH = 25h, AL = vector number, DS:DX = handler) to do it for them! No wonder.

So, copying this method, we can PROPERLY replace the BIOS's IRQ 1 service routine with our own (based on the post above), which records every key's state as a 0 or 1 in a buffer indexed by scancode.

Marking it as __interrupt will (we hope!) cause it to save all registers properly. We only need it to set one byte in the keys array, so I simplified it, removed almost all of the compiler-dependent code, and wrote ASM macros for Watcom instead.

In the end, the final, interrupt-based, flawless method looks like this: 

#include <iostream.h> // cout
#include <graph.h> // settextposition and clearscreen

typedef unsigned char u8;
typedef unsigned char bool;
typedef unsigned short u16;

// Entire keyboard scan is held here, including extended keys:
volatile u8 keys[192]; // volatile: written by the ISR, read by main()

void far * old_kb;
static void interrupt kb_int();

void write_port(u8 p, u8 v);
#pragma aux write_port = \
"mov dh, 0"\
"out dx, al"\
parm [dl] [al]\
modify [dx];

u8 read_port(u8 p);
#pragma aux read_port = \
"mov dh, 0"\
"in al, dx"\
parm [dl]\
value [al]\
modify [dx]; // "mov dh, 0" clobbers DH, so declare it

void set_irq_vector(u8 iq, u16 segment, u16 offset);
#pragma aux set_irq_vector = \
"push ds" \
"mov ah, 0x25"\
"mov ds, bx"\
"int 0x21"\
"pop ds"\
parm [al] [bx] [dx]\
modify [ah];

void far * get_irq_vector(u8 iq);
#pragma aux get_irq_vector = \
"mov ah, 0x35"\
"int 0x21"\
parm [al] \
modify [ah] \
value [es bx];

#define segment(a) ((u16)((unsigned long)(void __far*)(a) >> 16))
#define offset(a) (u16)(a) // no need to & 0xffff: casting down to 16 bits does this

int main() {

old_kb = get_irq_vector(9); // save old vector
set_irq_vector(9, segment(kb_int), offset(kb_int));
// Key display loop taken from sample code
_clearscreen(0);
while(!keys[1]) { // scancode 1 == ESC
for(int y = 1; y < 5; y++){
int i;
_settextposition(y, 1); // row, column (both 1-based)
for (i = 0; i < 0x30; i++) {
cout << (int)keys[i + ((y-1)*0x30)];
}
}
}
//
set_irq_vector(9, segment(old_kb), offset(old_kb));
return 0;
}

static void interrupt kb_int() {
static u8 buffer;
u8 inc = read_port(0x60); /* get byte from port 60h */
bool on_off = !(inc & 0x80); /* bit 7: 0 = on, 1 = off */
u8 scancode = inc & 0x7F; // bits 0-6 are the key

// First, we check our last buffer to see if it has an E0 byte...
if (buffer == 0xE0) { // was our last byte E0?
if (scancode < 0x60) // if so, extended key
keys[scancode + 0x60] = on_off;
buffer = 0;
} else if (buffer >= 0xE1 && buffer <= 0xE2) {
buffer = 0; /* ignore the byte after E1/E2 */
} else if (inc >= 0xE0 && inc <= 0xE2) {
buffer = inc; // store it in static var for next loop!
} else if (scancode < 0x60) {
keys[scancode] = on_off;
}

write_port(0x20, 0x20); // ack IRQ, needed
}

Three full days of work! Phew! (And I didn't even do the hard part myself!)

Astute readers may have noticed the static buffer char inside the kb_int() function. It is there explicitly to save the previous interrupt's E0, E1 or E2 byte in the case of an EXTENDED scancode, since the prefix and the key arrive in separate interrupts. All we need to do is check this buffer *first* to see whether the current byte belongs to an extended key. If the buffer holds one of those Ex prefixes, we handle the current byte accordingly and clear the buffer for the next key.
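To check the prefix bookkeeping without real hardware, here is a portable sketch of the same state machine as `kb_int()`, with the port read replaced by a function argument. `feed_scancode`, `key_state` and `prefix` are hypothetical names; the table layout (00h-5Fh normal keys, 60h-BFh E0-prefixed keys) matches `keys[192]` above.

```c
/* Same layout as keys[192]: 0x00-0x5F normal, 0x60-0xBF E0-prefixed. */
static unsigned char key_state[192];
static unsigned char prefix;   /* last E0/E1/E2 byte seen, or 0 */

void feed_scancode(unsigned char inc)
{
    unsigned char pressed  = !(inc & 0x80);   /* bit 7: 0 = make, 1 = break */
    unsigned char scancode = inc & 0x7F;      /* bits 0-6: the key */

    if (prefix == 0xE0) {                     /* previous byte was E0: extended key */
        if (scancode < 0x60)
            key_state[scancode + 0x60] = pressed;
        prefix = 0;
    } else if (prefix == 0xE1 || prefix == 0xE2) {
        prefix = 0;                           /* drop the byte after E1/E2 */
    } else if (inc >= 0xE0 && inc <= 0xE2) {
        prefix = inc;                         /* remember it for the next byte */
    } else if (scancode < 0x60) {
        key_state[scancode] = pressed;
    }
}
```

Feeding E0h then 48h marks the extended up arrow at index A8h, while a plain 48h (keypad 8) stays at index 48h. One quirk this logic shares with the handler it mirrors: Pause sends E1 1D 45 E1 9D C5, and since only the byte after each E1 is dropped, 45h and C5h register as a press and release of scancode 45h (NumLock).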

This looks longer than it should be because of the assembly macros, but it is quite simple if you take the time to look at it. 

Done! This will get the immediate state of every key on the keyboard, through a properly handled interrupt - perfect for game development!

If this helped you, or you have anything in reply, please leave a comment!




Friday, June 7, 2024

Coding in mixed Assembly and C for the PC-9801 in 2024

(Scroll down to the bottom to download the project)
Project contents:
    _LCC             : LSI-C makefile
    98header.s       : FDI header source
    checkexe.py      : Examines a DOS-EXE and optionally relocates it
    insert_to_fdi.py : Inserts a file into an FDI image at a given cylinder/head/sector address
    main.s           : ASM source for the IPL (initial program loader)
    Makefile         : GNU Make file for this project
    test.c           : LSI-C source for the TEST.EXE hello world


There aren’t a whole lot of options out there for compiling 16-bit real-mode code, which is what you need to make DOS-mode games on the PC-98!
It started with curiosity about how I would go about hacking Record of Lodoss War.
It grew out of this assembly hello world example I found online: https://qiita.com/TakedaHiromasa/items/371503c48ac33237a859 , and every single edge of it turned into a rabbit hole.

1. Researching Disk Formats and Boot Loading
2. Finding a C Compiler that works is hard
3. Research into Disk Coding
4. DOS-EXE Disassembly
5. …And Trimming Everything Into Place

1. Researching Disk Formats and Boot Loading


The assembly example included a functional disk header, defined as data bytes. That isn’t very helpful by itself, however, since the format it uses (TFD) is not supported by any emulator I have. I COULD, though, compare it to the header of a blank FDI disk generated with barbeque’s tools: https://github.com/barbeque/pc98-disk-tools/ . After some stumbling around NASM syntax, I realized that generating an FDI header was pretty easy:

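        DD 0            ; reserved
        DD 144          ; disk type (90h)
        DD 4096         ; header size (1000h)
        DD 1261568      ; image data size: 77 * 2 * 8 * 1024
        DD 1024         ; bytes per sector
        DD 8            ; sectors per track
        DD 2            ; heads (surfaces)
        DD 77           ; cylinders

        times 0x1000 - ($-$$) db 0   ; pad the header out to 4096 bytes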


I assembled this, saved it as header.bin and put it aside.
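The same header can be built from C, which makes the relationship between the fields explicit: the image data size is just cylinders × heads × sectors × bytes per sector. This is a sketch; the field meanings in the comments are my reading of the FDI format, and `build_fdi_header` is a hypothetical helper rather than part of the project.

```c
#include <stdint.h>
#include <string.h>

/* Assumed FDI layout: eight little-endian 32-bit fields, zero-padded to 0x1000 bytes. */
enum { FDI_HEADER_SIZE = 0x1000 };

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Fill out[] with the same values as the NASM header above:
   77 cylinders, 2 heads, 8 sectors of 1024 bytes per track. */
void build_fdi_header(uint8_t out[FDI_HEADER_SIZE])
{
    const uint32_t cylinders = 77, heads = 2, sectors = 8, sector_size = 1024;
    const uint32_t fields[8] = {
        0,                                           /* reserved */
        144,                                         /* disk type, 90h as in the NASM header */
        FDI_HEADER_SIZE,                             /* header size */
        cylinders * heads * sectors * sector_size,   /* image data size: 1261568 */
        sector_size, sectors, heads, cylinders
    };

    memset(out, 0, FDI_HEADER_SIZE);                 /* the "times ... db 0" padding */
    for (int i = 0; i < 8; i++)
        put_le32(out + 4 * i, fields[i]);
}
```

1261568 is 134000h, which is also why the boot image below is padded to 0x134000 bytes.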

Despite a few small syntax errors in the example, I managed to assemble it and run it as an FDI.
At the end of the assembly file, simply adding

        times 0x134000-($-$$) db 0xbe


will pad the rest of the image with 0xbe, up to the 0x134000 (1,261,568) bytes of disk data declared in the header. Then, running

        cat header.bin prog.bin > out.fdi

will create a bootable fdi image.

Using the DOSBox-X debugger (command: “ev cs”), I determined fairly easily that after booting from a disk with anything on it (e.g. jmp $), the code segment was 1fc0h (linear address 1fc00h). With the “memdump 1fc0:0000 1000” command, I confirmed that 1024 bytes were copied from the disk into segment 1fc0 and run there. The bytes after those 1024 were not 0xbe, telling me nothing more was loaded from disk.

This is as expected. The PC-8801 also copies the first sector from disk into memory and runs it, but on the 88 it’s only 256 bytes.

Great! On to the next thing - can I do C? And if so, can I interoperate with arbitrary asm?


2. Finding a C Compiler that works is hard


I tried to install gcc-ia16 (various platform failures), Borland for DOS (crashes on PC98), build OpenWatcom (failed for lack of lib support) AND run its DOS installer (hangs the PC98), all to no avail.

Feeling frustrated, I googled Cコンパイラ PC98 ("C compiler PC98") and found this Qiita page: https://qiita.com/mikuta0407/items/e659b0d5101464aba071 where they linked to an archived version of the LSI-C86 compiler toolchain!

This is the business. I quickly downloaded it, tested it, and uploaded it to archive.org. It works well, EXCEPT that after a lot of fumbling I was unable to get the linker (ldd) to play nice with files from its own assembler (r86) when exporting to COM. I wanted COM output to simplify the interaction between ASM and C, but, perhaps due to a bug, COM export failed.

I was getting EXEs though, and that’s great news! So maybe there is a way to run the executable file from assembly? Perhaps stripping the header and calling the code start point? How complicated could exe files be anyway? Famous last words.

I made a quick Hello World that writes directly to text VRAM and set the EXE aside for now.

Before diving into EXEs, I wanted to make sure that I could load a file from an arbitrary location on disk into general RAM. For that, I would need some documentation on the floppies…


3. Research into Disk Coding


Lucky for me, in my ramblings in search of info (memory maps, etc) I stumbled upon mention of the PC-98 Programmer’s Bible: https://archive.org/details/PC9801Bible/. I decided to check it out, and wow! Glad I did.
This contains information on every BIOS call, including disk I/O, so despite some trouble converting the example C program back to ASM, and way more time than I should have spent narrowing down the formula for the location on disk, I got disk reading and writing routines done in just a couple of hours.

%macro DiskLoad 6
; Src C, Src H, Src S, Dst Seg, Dst Ofs, Byte Ct
    mov cx,3<<8 | %1 ; sector len, cylinder
    mov dx,%2<<8 | %3 ; head, start sector
    mov ax,%4 ; segment
    mov es,ax
    mov bp,%5 ; offs
    mov bx,%6 ; bytes
    mov ax,0x76<<8 | 0x90 ; load cmd
    int 1bh
%endmacro
%macro DiskWrite 6
; Dst C, Dst H, Dst S, Src Seg, Src Ofs, Byte Ct (same order as DiskLoad)
    mov ax,%4 ; segment
    mov es,ax
    mov bp,%5 ; offs
    mov cx,3<<8 | %1 ; sector len, cylinder
    mov dx,%2<<8 | %3 ; head, start sector
    mov bx,%6 ; bytes
    mov ax,0x75<<8 | 0x90 ; write cmd
    int 1bh
%endmacro


*Of note: when writing, even if the byte count in BX is smaller than the sector size, the rest of the sector is filled with 00. Byte-precise reads may be possible, but byte-precise writes are not.
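As a sanity check on the register packing in `DiskLoad`/`DiskWrite`, here is a portable C sketch that builds the same AX/BX/CX/DX values and counts how many whole sectors a transfer touches. It assumes the usual uPD765 convention that a sector length code N means 128 << N bytes (so N = 3 gives the 1024-byte sectors used here); `struct disk_regs`, `pack_disk_regs`, `sector_bytes` and `sectors_touched` are hypothetical names.

```c
#include <stdint.h>

/* Register values the DiskLoad/DiskWrite macros set up before INT 1Bh. */
struct disk_regs {
    uint16_t ax;   /* AH = command (0x76 read, 0x75 write), AL = 0x90 drive value */
    uint16_t bx;   /* byte count */
    uint16_t cx;   /* CH = sector length code N, CL = cylinder */
    uint16_t dx;   /* DH = head, DL = start sector (1-based) */
};

/* Sector length code N -> bytes: 128 << N (N = 3 -> 1024). */
uint16_t sector_bytes(uint8_t n)
{
    return (uint16_t)(128u << n);
}

struct disk_regs pack_disk_regs(uint8_t cmd, uint8_t cyl, uint8_t head,
                                uint8_t sector, uint16_t bytes)
{
    struct disk_regs r;
    r.ax = (uint16_t)(cmd << 8 | 0x90);
    r.bx = bytes;
    r.cx = (uint16_t)(3 << 8 | cyl);     /* N = 3: 1024-byte sectors */
    r.dx = (uint16_t)(head << 8 | sector);
    return r;
}

/* Whole sectors a transfer touches; on a write, the tail of the last one is zero-filled. */
uint16_t sectors_touched(uint16_t bytes, uint8_t n)
{
    uint16_t size = sector_bytes(n);
    return (uint16_t)((bytes + size - 1) / size);
}
```

For the `DiskLoad 0,0,2,0x2000,0,1024*6` call used later, this gives AX = 7690h, CX = 0300h, DX = 0002h and six whole sectors.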

While I was at it, I wrote a Python script (and sent it over to barbeque) that adds a file at a given cylinder, head and sector of an FDI image.

The formula, given a header of 0x1000 size and variables cy, he, se, is:

    loc = (int(cy) * 0x4000) + (int(he) * 0x2000) + ((int(se)-1) * 0x400) + 0x1000

The se (sector) variable has 1 subtracted because its range is 1 to 8 instead of 0 to 7.
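The same formula in C, with the geometry pulled out into named constants (a sketch; `fdi_offset` and the constant names are hypothetical, and the values assume the 2-head, 8 × 1024-byte-sector layout from the header above):

```c
#include <stdint.h>

/* Geometry from the FDI header: 2 heads, 8 sectors of 1024 bytes, 0x1000-byte header. */
enum { SECTOR_SIZE = 0x400, SECTORS_PER_TRACK = 8, HEADS = 2, HEADER_SIZE = 0x1000 };

/* Byte offset of cylinder/head/sector (sector is 1-based) inside an FDI image. */
uint32_t fdi_offset(uint32_t cy, uint32_t he, uint32_t se)
{
    const uint32_t track    = SECTORS_PER_TRACK * SECTOR_SIZE;  /* 0x2000 */
    const uint32_t cylinder = HEADS * track;                    /* 0x4000 */
    return cy * cylinder + he * track + (se - 1) * SECTOR_SIZE + HEADER_SIZE;
}
```

Cylinder 0, head 0, sector 1 (the boot sector) sits right after the header at 1000h, so cylinder 0, head 0, sector 2 starts at 1400h.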


Almost there: I needed to figure out how to load up these pesky DOS-EXE files.


4. DOS-EXE Disassembly 



It took me about 20 tries reading over http://justsolve.archiveteam.org/wiki/MS-DOS_EXE , http://www.textfiles.com/programming/FORMATS/exefs.pro , https://groups.google.com/g/microsoft.public.masm/c/T6mvLia40SE/m/50_TLyRgmyoJ and https://moddingwiki.shikadi.net/wiki/EXE_Format (all of which contain slightly different and slightly confusing explanations of the same thing) before I understood *precisely* what the EXE file was doing.

It may help to restate:
An EXE file can be loaded at an arbitrary memory segment. Whatever that load segment is must be added to every location pointed to by its relocation table. Each table entry is an offset:segment pair relative to the start of the code (code_start, just after the header), and the 16-bit word at that location holds a segment value relative to the start of the image; adding the load segment makes it absolute.

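SO. A relocation table of:

01 00 00 00
34 00 00 00
17 00 00 00

(each entry is an offset word followed by a segment word, here all with segment 0) means that at (code_start + 01h), (code_start + 34h), and (code_start + 17h) there are words which must be incremented by LOAD_SEGMENT. Then you can safely far-jump to the entry point: the header's initial CS:IP, with LOAD_SEGMENT added to CS.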

That’s it! This also means if you know where you are going to execute the code from, given a static memory map, you can strip the header completely and adjust the relocation segment variables beforehand.
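The relocation step itself fits in a few lines of C. This is a sketch, not the project's checkexe.py: `struct mz_reloc` and `apply_relocations` are hypothetical names, `image` is assumed to be the file contents after the header, and each entry is decoded as an offset word followed by a segment word.

```c
#include <stddef.h>
#include <stdint.h>

/* One MZ relocation entry: offset word, then segment word. */
struct mz_reloc { uint16_t offset, segment; };

/* Add load_segment to every 16-bit word the table points at.
   Returns 0 on success, -1 if an entry points outside the image. */
int apply_relocations(uint8_t *image, size_t image_size,
                      const struct mz_reloc *relocs, size_t count,
                      uint16_t load_segment)
{
    for (size_t i = 0; i < count; i++) {
        size_t pos = (size_t)relocs[i].segment * 16 + relocs[i].offset;
        if (pos + 1 >= image_size)
            return -1;
        uint16_t v = (uint16_t)(image[pos] | image[pos + 1] << 8);  /* little-endian word */
        v = (uint16_t)(v + load_segment);                           /* wraps like 16-bit math */
        image[pos]     = (uint8_t)v;
        image[pos + 1] = (uint8_t)(v >> 8);
    }
    return 0;
}
```

With the example table above and a load segment of 2000h, a word of 0005h at offset 01h becomes 2005h.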


5. …And Trimming Everything Into Place


Which is exactly what I did. Another hour or so and I had a “quick” Python script to do the following:
- scan the header,
- print the data on the table,
- trim the header and
- adjust the relocation values in the file.
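The header-scanning half of that script can be sketched in C as follows (the project's own script is Python; `struct mz_info`, `rd16` and `parse_mz_header` are hypothetical names). The field offsets are those of the standard MZ header: page counts at 02h/04h, relocation count at 06h, header size in 16-byte paragraphs at 08h, initial IP/CS at 14h/16h and the relocation table offset at 18h.

```c
#include <stddef.h>
#include <stdint.h>

struct mz_info {
    uint16_t reloc_count;       /* 06h */
    uint32_t header_bytes;      /* 08h, paragraphs * 16 */
    uint16_t init_ip, init_cs;  /* 14h, 16h: entry point relative to the image */
    uint16_t reloc_offset;      /* 18h */
    uint32_t image_bytes;       /* load module size, header excluded */
};

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }

/* Parse the fixed 28-byte MZ header. Returns 0 on success, -1 if invalid. */
int parse_mz_header(const uint8_t *file, size_t size, struct mz_info *out)
{
    if (size < 0x1C || file[0] != 'M' || file[1] != 'Z')
        return -1;
    uint16_t last_page = rd16(file + 0x02);  /* bytes used in the final 512-byte page, 0 = full */
    uint16_t pages     = rd16(file + 0x04);  /* file size in 512-byte pages, rounded up */
    if (pages == 0 || last_page > 511)
        return -1;
    uint32_t total = (uint32_t)pages * 512;
    if (last_page != 0)
        total -= 512u - last_page;

    out->reloc_count  = rd16(file + 0x06);
    out->header_bytes = (uint32_t)rd16(file + 0x08) * 16;
    out->init_ip      = rd16(file + 0x14);
    out->init_cs      = rd16(file + 0x16);
    out->reloc_offset = rd16(file + 0x18);
    if (total < out->header_bytes)
        return -1;
    out->image_bytes = total - out->header_bytes;
    return 0;
}
```

Trimming the header then amounts to keeping `image_bytes` bytes starting at `header_bytes`; the relocation entries, four bytes each, start at `reloc_offset`.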

After that, everything was ready… I just had to perform the magic ASM:

        DiskLoad 0,0,2,0x2000,0,1024*6   ; C0 H0 S2 -> 2000:0000, 6 sectors
        call 2000h:0000h                 ; far call into the relocated program


and use my trusty Makefile to put it all together:

default:
	nasm 98header.s -o header.bin
	nasm main.s -o prog.bin
	cat header.bin prog.bin > app.fdi
	python3 checkexe.py TEST.EXE -r 0x2000
	python3 insert_to_fdi.py app.fdi TEST.EXE_ 0 0 2


… AND IT WORKS!

Something is actually as easy as it looks for once! What a surprise.

You can download the entire project here!


You will need to provide all the tools (LSI-C, NASM, GNU Make, Python 3) yourself, though.