2ine: 16-bit .exe support!
Oh, OS/2, I wish I knew how to quit you.

Look, you have to know what motivates me. You can offer me money, you can offer me fast cars and fast women and fast SIMD instruction sets, but what really gets me moving is flattery.

So when neozeed wrote up this great post about my work on 2ine, it made me want to work on it more. Notably, though, neo mentions that an attempt to run a 16-bit binary failed outright.

Attempting to run anything 16bit or LE will give you:
    ./lx_loader CL.EXE
    not an OS/2 LX module

Guilty! That error message is accurate, as 16-bit OS/2 .exe files are not LX modules. LX ("Linear eXecutable") was a new thing in OS/2 2.0, with the advent of their 32-bit support. Every .exe for OS/2 1.x, however, is an NE ("New Executable") module.

Heard that name before? Perhaps you've met the platform that made them famous. It's called Windows 1.0.

This is more mostly-similar design that fell out of the Microsoft/IBM divorce. NE files are basically the same between platforms, minus a field that says what platform the binary targets (1 for OS/2, 2 for Win16, etc) and some other extras that grew organically out of a nasty cocktail of necessity and lack of cooperation, as these sorts of things go.

Both NE and LX files (and PE--the Win32 format--too) have a standard MS-DOS executable header at the start, mostly used to let DOS print "This program requires OS/2" to stdout and quit, but these actually function as fat binaries. You could literally build a program for both OS/2 and DOS, glue them together, and whatever OS was on your machine would run the correct part of the .exe. At the time, it was dark sorcery to me, but ~15 years later Apple made it so commonplace that it went from fascinating to expected: demanded, even. Unless you're on Linux, I guess. 
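To make the layering concrete, here is a minimal sketch (not 2ine's actual code) of how a loader can tell these formats apart. It relies on two documented details: the MS-DOS header stores the file offset of the newer header at 0x3C, and an NE header keeps its target-platform byte at offset 0x36. The names ExeKind and classify_exe are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { EXE_UNKNOWN, EXE_DOS_ONLY, EXE_NE, EXE_LE, EXE_LX, EXE_PE } ExeKind;

/* Read little-endian values without assuming alignment or host byte order. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) { return (uint32_t)rd16(p) | ((uint32_t)rd16(p + 2) << 16); }

/* Classify an in-memory .exe image (hypothetical helper).
   For NE files, *target_os receives the platform byte (1 = OS/2, 2 = Windows).
   target_os may be NULL if the caller doesn't care. */
static ExeKind classify_exe(const uint8_t *img, size_t len, uint8_t *target_os)
{
    assert(img != NULL);
    if (target_os != NULL)
        *target_os = 0;

    /* Everything starts with the MS-DOS "MZ" header. */
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return EXE_UNKNOWN;

    /* Offset 0x3C holds the file offset of the "new" header, if there is one. */
    const uint32_t newhdr = rd32(img + 0x3C);
    if (newhdr == 0 || newhdr > len - 4)
        return EXE_DOS_ONLY;

    const uint8_t *sig = img + newhdr;
    if (sig[0] == 'N' && sig[1] == 'E') {
        if (target_os != NULL && len - newhdr > 0x36)
            *target_os = sig[0x36];
        return EXE_NE;
    }
    if (sig[0] == 'L' && sig[1] == 'E') return EXE_LE;
    if (sig[0] == 'L' && sig[1] == 'X') return EXE_LX;
    if (sig[0] == 'P' && sig[1] == 'E' && sig[2] == 0 && sig[3] == 0) return EXE_PE;

    /* No signature we recognize: treat it as a plain DOS program. */
    return EXE_DOS_ONLY;
}
```

A loader built this way would check for EXE_NE with a target byte of 1 before going any further, and could still report a sensible error for everything else.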

So I dove into this on 2ine, to see if I could make not just an old OS/2 binary work, but a legitimately ancient one. 

My theory (and desperate hope) was that I had already solved all the really hard problems; 32-bit OS/2 binaries often call into 16-bit code contained in their own processes, including system APIs that were never made 32-bit clean. It's a wild roller coaster ride that I covered over here. I prayed that I could just get 2ine to understand the older executable file format, and then it would slot right into what I've already written, merely never calling into any 32-bit code directly, since these programs don't know that an extravagance like 32 whole bits even exists.

Writing the loader was straightforward enough; Microsoft was kind enough to document the NE format in their Windows SDKs, and OS/2's version is just a subset of it.

While LX gives you a list of memory blocks to load, NE deals with segments because this is how 16-bit x86 thinks about memory. Instead of a memory address like 0x88001412 (in decimal: the 2281706515th byte in your address space, zero being the first), you got segments and offsets. Your address might look like 484a:0012 (the 19th byte in segment 0x484a). Each segment had 64 kilobytes, and when you added one to the final offset of the segment, you went back to 0. Address 0001:ffff and 0002:0000 were not adjacent, so all the work you wanted to do in memory had to fit in a single segment, or it had to do an enormous amount of tapdancing to deal with multiple unconnected segments.
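The wraparound is easy to see in a few lines of C. This is only an illustrative sketch with made-up names (FarPtr, far_add, and friends), modelling a segment:offset pair the way near-pointer arithmetic treats it: the offset is 16 bits wide, and the segment never changes on its own.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A 16:16 address: which segment, and how far into it. */
typedef struct { uint16_t seg; uint16_t off; } FarPtr;

/* Pointer arithmetic only touches the offset, which wraps modulo 65536.
   The segment stays put, so 0001:ffff + 1 is 0001:0000, not 0002:0000. */
static FarPtr far_add(FarPtr p, uint16_t n)
{
    p.off = (uint16_t)(p.off + n);
    return p;
}

/* 1-based position of the byte inside its segment (offset 0 is the 1st byte). */
static uint32_t ordinal_in_segment(FarPtr p) { return (uint32_t)p.off + 1; }

/* How many bytes are left before the end of this 64 KB segment. */
static uint32_t bytes_left_in_segment(FarPtr p) { return 0x10000u - p.off; }

/* Copy n bytes starting at p out of a 64 KB segment image. One copy must not
   run past the end of the segment, which is exactly the constraint 16-bit
   code had to live with. */
static void copy_from_segment(uint8_t *dst, const uint8_t *segment, FarPtr p, uint32_t n)
{
    assert(n <= bytes_left_in_segment(p));
    memcpy(dst, segment + p.off, n);
}
```

Anything bigger than bytes_left_in_segment() has to be split into chunks by hand, switching segments in between, which is the tapdancing in question.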

For example, this is the source code to Wolfenstein 3D's two completely different, hand-coded assembly routines for Huffman decoding...one for when you have less than 64k of data, and one for when you need to jump between memory segments.

For completeness, this code is where Wolfenstein 3D is unable to write more than 64k to a file at a time. Segmented memory was the bitch of living in the early 90's, so it should be no surprise that id's next game, DOOM, used DOS/4GW to get a linear address space on DOS. Indeed, part of the reason I jumped to OS/2 in the first place was a magazine article I read about loading a .bmp file ("just malloc() as much memory as you need, even if it's more than 64k!!" ...I felt so scandalized, but I lusted for it all the same).

So loading an NE file is done one segment at a time, and since modern computing is an embarrassment of riches, I just allocate the entire 64k each time instead of the smaller requested fraction, since this is easier for me, hopefully protects against some inevitable buffer overflows that will never be patched, and honestly? The limited resource in modern times isn't bytes of RAM but 16-bit selectors (you only have 8192 of them in any case, and once they're gone, no segments for you).
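Roughly, that per-segment step could look like the sketch below. It is not the code in lx_loader.c: the names are hypothetical, it uses calloc() where a real loader has to map memory that an LDT selector can point at, and it assumes the 8-byte NE segment table entry layout from Microsoft's documentation (sector offset, length in file, flags, minimum allocation, with a stored 0 meaning 64k for the two sizes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SEGMENT_SIZE 0x10000u  /* always hand out the full 64 KB */

typedef struct {
    uint32_t file_offset;  /* where the segment's bytes start in the file */
    uint32_t file_length;  /* how many bytes the file actually stores */
    uint16_t flags;        /* code/data, moveable, has relocations, ... */
    uint32_t min_alloc;    /* size the program asked for (ignored below) */
} NeSegment;

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

/* Decode one 8-byte segment table entry. align_shift comes from the NE header. */
static NeSegment parse_segment_entry(const uint8_t *entry, unsigned align_shift)
{
    NeSegment s;
    const uint16_t sector = rd16(entry);
    const uint16_t length = rd16(entry + 2);
    const uint16_t alloc = rd16(entry + 6);
    s.file_offset = (uint32_t)sector << align_shift;
    /* Sector 0 means "no data in the file"; a stored length of 0 means 64 KB. */
    s.file_length = (sector == 0) ? 0 : (length ? length : SEGMENT_SIZE);
    s.flags = rd16(entry + 4);
    s.min_alloc = alloc ? alloc : SEGMENT_SIZE;
    return s;
}

/* Allocate a whole zeroed 64 KB block and copy in whatever the file provides. */
static uint8_t *load_segment(const uint8_t *file, size_t file_len, const NeSegment *seg)
{
    assert(file != NULL && seg != NULL);
    if (seg->file_offset > file_len || seg->file_length > file_len - seg->file_offset)
        return NULL;  /* truncated or corrupt file */
    uint8_t *mem = calloc(1, SEGMENT_SIZE);
    if (mem != NULL && seg->file_length > 0)
        memcpy(mem, file + seg->file_offset, seg->file_length);
    return mem;
}

/* An x86 selector is a 13-bit table index, a table bit, and a 2-bit privilege
   level. 13 bits is where the 8192 limit comes from. This builds a ring-3
   LDT selector for a given index. */
static uint16_t ldt_selector(unsigned index)
{
    assert(index < 8192);
    return (uint16_t)((index << 3) | 0x4 /* LDT */ | 0x3 /* ring 3 */);
}
```

Ignoring min_alloc wastes at most 64k per segment, which is noise today. Running out of the 8192 selector slots is the failure that would actually matter.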

Once I had the program mapped into memory and fixed up, and assigned selectors to every segment, I discovered that all the existing APIs that I had already implemented were meaningless. Even the simple ones like DosWrite() had a different calling convention, and different argument types (most of the APIs that take LONGs now took SHORTs in OS/2 1.x, etc).

I had already spent a lot of time on the 16-bit bridge code, though, and having feared that its complexity would overwhelm me even before every API needed it, I had automated it. Now adding an API is just a matter of sticking this in a header...


...which is a mouthful, sure, but a perl script will parse that and generate all the real macro salsa, like this stuff. So now the 16-bit app calls into a real 32-bit ELF library that converts as appropriate and calls into 32-bit C code, hiding all the magic (and mess) from me; I just write C code with a linear address space for any function and don't care about the memory politics.
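To give a feel for what a bridge like that has to convert, here is a hand-written sketch for a single call. It is not the generated 2ine code, and every name in it is hypothetical. It assumes the 16-bit DosWrite() takes a 16-bit handle, a far pointer to the buffer, a 16-bit byte count, and a far pointer for the bytes-written result, passed Pascal-style: pushed left to right, so the last argument sits at the lowest address, with each far pointer's offset word below its segment word. It also assumes the loader keeps a table of linear base addresses indexed by LDT entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint16_t seg; uint16_t off; } FarPtr16;

/* The 16-bit view of DosWrite()'s arguments (layout is an assumption). */
typedef struct {
    uint16_t hf;       /* file handle */
    FarPtr16 buf;      /* data to write */
    uint16_t cb;       /* byte count: only 16 bits wide here */
    FarPtr16 written;  /* where to store the number of bytes written */
} DosWriteArgs16;

/* args points at the first stack word above the far return address.
   Pascal order means we read the parameters back to front. */
static DosWriteArgs16 unpack_doswrite16(const uint16_t *args)
{
    assert(args != NULL);
    DosWriteArgs16 a;
    a.written.off = args[0];
    a.written.seg = args[1];
    a.cb          = args[2];
    a.buf.off     = args[3];
    a.buf.seg     = args[4];
    a.hf          = args[5];
    return a;
}

/* Turn a 16:16 pointer into something C can dereference. bases[i] is the
   linear address mapped for LDT entry i, or NULL if that entry is unused
   (a hypothetical table; a real loader tracks this however it likes). */
static void *far_to_linear(FarPtr16 p, uint8_t *const bases[], size_t count)
{
    const size_t index = p.seg >> 3;  /* drop the table and privilege bits */
    if (index >= count || bases[index] == NULL)
        return NULL;
    return bases[index] + p.off;
}
```

Once the arguments are unpacked and the pointers converted, the rest is an ordinary C call with the 16-bit values widened, and the callee has to pop those six words on return because Pascal convention makes stack cleanup its job.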

This and a script to tell me the libraries and symbols that a given .exe uses made it pretty quick to start filling in the missing bits.

Right now, this is juuuuust enough code to run a Watcom C-compiled program that wants to use printf() and read its command line/environment, like this guy. It can't run Microsoft C 5.1 yet, which is what neozeed's blog post attempted. It also won't work with 16-bit DLLs yet, since I don't have it calling into the DLL init entry point, but that work is easy enough now.

I'm getting pretty good at diagnosing problems from disassembled binaries from the 1980's with a debugger that barely limps through this stuff and a lot of psychic debugging. This feels like a marketable skill but almost certainly isn't. Like before, this is a good stopping point on the project for now. Back to porting games to Linux!  :)

(The NE loader is in this commit in the increasingly-incorrectly-named lx_loader.c. The next few commits improve it and start filling in 16-bit APIs. More to come at some later time.)