In March, Forest of Illusion, a Twitter account dedicated to "Preserving Nintendo's History", tweeted about an SA-1 demonstration cartridge that was up for sale on eBay. The cartridge was undumped at the time and showed two mysterious demos: the first suggested a comparison between the S-CPU (SNES CPU) and the C-CPU (SA-1 CPU), and the second suggested a software-based SA-1 graphics rotation system.
My immediate reaction was this retweet with comment. I was pretty sad that there was no dump available and no hope of the cartridge being acquired: after all, it was being sold for a pretty high price (if I remember correctly, the initial price was around $4,200).
However... I got the great news that one of my Patrons (who preferred to stay anonymous) had acquired the cartridge and sent it to Bob (from RetroRGB) to dump it and send me a copy of the ROM. When I first saw the DM Bob sent me... I couldn't believe it. The demo I had been looking for for over 7 years, since the day I heard about its existence on SNES Central... And now it was finally dumped and I could analyze & play it! It's certainly one of the most pleasant surprises I've ever had!
You can download the ROMs here. There are two ROMs available: 1) the original, untouched dump and 2) a version with an added SA-1 header that makes it bootable on SNES emulators. In most cases you will want the 2nd ROM.
A YouTube video is also available at https://www.youtube.com/watch?v=-en4NwcZVAI
The following sections describe how I got the ROM running (the 2nd ROM) and the many things I found during the process. Hope you like it!
Getting the ROM to work
The cartridge itself works perfectly, but we have to dump it to figure out how the demo works, to look for hidden or unknown features and, of course, for preservation.
Initially, Bob thought the dump was incorrect because it didn't boot when loaded in an emulator (in this context, anything that is not a real SA-1 cartridge is an SA-1 emulator). However, after asking for the ROM and inspecting it in a hexadecimal editor, I realized that the contents were correct but the internal SNES header was not set up. More specifically, there was no ROM registration data, so no emulator would recognize it correctly.
For those who don't know, the only information required to boot a SNES cartridge on real hardware is the CPU vector information, located at $00:FFE0-$00:FFFF. The information at $00:FFC0-$00:FFDF (ROM registration data) is completely optional for the hardware, but emulators need to read it to know how much SRAM is used (0 KB, 2 KB, 8 KB, etc.), what the ROM layout is (LoROM, HiROM, etc.) and which enhancement chips are needed to make the game work (DSP-1, SA-1, Super FX, etc.). You can read more about the internal header here.
To make sure the ROM wouldn't get corrupted, I started disassembling the file with DiztinGUIsh, the same tool that made my SA-1 and FastROM improvement patches possible. However, the process quickly got pretty complex: the first thing the game does is... interact with the SA-1.
The code at $80:8000 immediately starts playing with indirect pointers and interacting with the SA-1 interfaces. While it's not a form of code obfuscation, it made the reverse engineering process a nightmare.
With persistence, I got a small chunk of the cartridge disassembled, enough to conclude that modifying the ROM registration data, even though that area is not blank, would not interfere with the demo. So what I did was simply set the ROM layout to SA-1 ROM (#$23 -> SNES $FFD5), enable SA-1 with backup RAM (#$35 -> SNES $FFD6) and set the BW-RAM size to 128 KB (#$07 -> SNES $FFD8). BAM! It started working on all emulators without issues!
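The three header writes can be sketched as a small patch script. This is only a minimal sketch, not the exact procedure that was used: it assumes a headerless dump (no 512-byte copier header) and a LoROM-style mapping in which SNES $00:8000-$00:FFFF corresponds to the first 32 KB of the file, so $00:FFD5 lands at file offset 0x7FD5. The function and constant names are made up for illustration, and the checksum fields are left untouched.

```python
# Sketch: write the three registration bytes described in the text.
# Assumptions (not from the article): the dump has no 512-byte copier header
# and bank $00 maps LoROM-style, i.e. SNES $00:8000-$FFFF -> file 0x0000-0x7FFF.

HEADER_PATCH = {
    0xFFD5: 0x23,  # ROM layout: SA-1 ROM
    0xFFD6: 0x35,  # cartridge type: SA-1 with backup RAM
    0xFFD8: 0x07,  # BW-RAM size: 1 KB << 7 = 128 KB
}


def snes_to_file_offset(address: int) -> int:
    """Map a bank $00 address in $8000-$FFFF to a file offset (LoROM-style)."""
    if not 0x8000 <= address <= 0xFFFF:
        raise ValueError("only bank $00 addresses in $8000-$FFFF are handled")
    return address - 0x8000


def patch_registration_data(rom: bytes) -> bytes:
    """Return a copy of the ROM image with the three header bytes set."""
    patched = bytearray(rom)
    for address, value in HEADER_PATCH.items():
        patched[snes_to_file_offset(address)] = value
    return bytes(patched)


def ram_size_kb(size_byte: int) -> int:
    """Decode the RAM size byte: 0 means none, otherwise 1 KB shifted left."""
    return 0 if size_byte == 0 else 1 << size_byte
```

Reading the original dump into `bytes`, passing it through `patch_registration_data()` and writing the result to a new file keeps the untouched dump intact, which matters for preservation.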
SA-1 Demo 1
The first demo explores the differences between the SNES and the SA-1 CPUs. It loads a number of sprites that you can specify. The sprites look like a micro Kirby and a micro Blinky (the ghost enemy from Pac-Man). By default, when you launch it you are in S-CPU (SNES CPU) mode and 10H (16 in decimal) micro sprites of each type are rendered on screen, constantly bouncing. Blinky bounces horizontally while Kirby bounces vertically; when the two collide, their bounce speed is increased or decreased.
When you press SELECT, a darker bar appears across the screen. The bar represents the time the SNES CPU (or SA-1 CPU) takes to process the 32 sprites during the frame. The taller the bar, the more time was spent processing; the shorter the bar, the less time. When the bar wraps around the bottom of the screen, the CPU could not finish processing before the frame finished rendering, which means slowdown is occurring.
As you can see in the picture above, the SNES CPU spends a good portion of the frame processing the sprites. More specifically, the processing started at V=15 and finished at V=198. Taking the difference, the SNES CPU spent 183 scanlines processing all the sprites currently visible on screen. Since there are 262 scanlines per frame (V=1 to V=262; a new frame technically starts at V=225), the SNES CPU spent 69.84% of the frame time processing the sprites. Another 20.22% of the time was spent on something else. Overall CPU usage is around 90.07%, leaving 9.93% of CPU resources free.
After pressing START, the sprite processing is handed to the C-CPU (SA-1 CPU), and this is reflected in the bar size:
As you can see in the screenshot above, the bar is smaller than with the SNES CPU. The bar begins at V=16 and finishes at V=71, so the SA-1 CPU spent 55 scanlines processing the sprites. This means the SA-1 CPU was busy processing sprites 20.99% of the time and idle for the remaining 79.01%. The SNES CPU was still busy 20.22% of the time, with the remaining 79.78% free.
Dividing 183 by 55, we can conclude that the SA-1 CPU processed the sprites 3.33x as fast as the SNES CPU (that is, 232.72% faster). The bar sizes were measured on an emulator (bsnes), but the difference from real hardware should be insignificant.
However, if we take idle time into consideration, the SA-1 CPU still had 79.01% of its CPU time free, compared to 9.93% when everything was done on the SNES CPU. That is almost 8x more free processing time on the SA-1 CPU.
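The scanline arithmetic above can be reproduced with a few lines of Python. This is just a sketch of the calculation using the V positions and the 262-scanline frame quoted in the text; the function and variable names are illustrative. The percentages in the text appear to be truncated rather than rounded, so the last printed digit can differ by one.

```python
# Sketch: reproduce the frame-time figures from the scanline measurements.
SCANLINES_PER_FRAME = 262  # value used in the text


def frame_share(start_v: int, end_v: int) -> float:
    """Percentage of one frame spent between two V (scanline) positions."""
    return (end_v - start_v) / SCANLINES_PER_FRAME * 100


snes_busy = frame_share(15, 198)   # S-CPU handling the sprites: 183 scanlines
sa1_busy = frame_share(16, 71)     # SA-1 handling the sprites: 55 scanlines
speedup = (198 - 15) / (71 - 16)   # how many times faster the SA-1 finished

print(f"S-CPU busy: {snes_busy:.2f}%")
print(f"SA-1 busy:  {sa1_busy:.2f}%, idle: {100 - sa1_busy:.2f}%")
print(f"speedup:    {speedup:.2f}x ({(speedup - 1) * 100:.2f}% faster)")
```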
Using the arrow keys we can increase or decrease the number of sprites on screen. 11H of each sprite (so 34 in total) is the maximum the SNES CPU can process before slowing down. 23H Blinkies and 22H Kirbies (so 69 in total) is the maximum the SA-1 can process before slowing down. Roughly, the SA-1 CPU can process twice as many sprites as the SNES CPU in this demo.
It's worth noting that the sprite interaction has O(n²) time complexity. Processing 2x as many sprites (+100% sprites) means 4x as much processing (+300% CPU use), since the time is quadratic. This is because every Kirby has to interact with every Blinky. The SNES CPU performs 256 interactions in total (16 Kirbies interacting with 16 Blinkies) while the SA-1 performs 1,190 interactions in total (34 Kirbies interacting with 35 Blinkies). The processing ratio here is 4.65x, which makes sense since the average SNES CPU clock speed is 2.68 MHz and the average SA-1 CPU clock speed is 10.74 MHz (4x 2.68 MHz) for this demo.
We could increase the ratio even further with complete parallelization, since the SA-1 stays idle for 20% of its CPU time and the SNES stays idle for 80% of its CPU time while the SA-1 is running: potentially 4.30 MHz (out of 13.42 MHz) is going to waste. However, that discussion is outside the scope of this article.
Why does the demo allow up to 2CH (46) Blinkies and up to 2CH (46) Kirbies? That's 2,116 interactions per frame in total! That would require roughly a 22 MHz CPU to deal with all of them at once...
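The interaction counts and clock figures in this section can be checked with a short script. It is a sketch under one stated assumption: the cost per Kirby/Blinky interaction is taken as constant, so the clock needed scales linearly with the number of interactions, using the 16 x 16 S-CPU run as the baseline. That is one way to arrive at a figure close to 22 MHz, not necessarily how the number in the text was derived. All names are illustrative.

```python
# Sketch: interaction counts and clock figures quoted in the text.
SNES_MHZ = 2.68   # average S-CPU clock given in the text
SA1_MHZ = 10.74   # average SA-1 clock given in the text


def interactions(kirbies: int, blinkies: int) -> int:
    """Every Kirby is checked against every Blinky, hence the product."""
    return kirbies * blinkies


snes_default = interactions(16, 16)   # 10H of each sprite
sa1_limit = interactions(34, 35)      # 22H Kirbies, 23H Blinkies
demo_maximum = interactions(46, 46)   # 46 of each, the upper limit in the text

ratio = sa1_limit / snes_default

# Assumption: the cost per interaction is constant, so the clock needed grows
# in proportion to the interaction count, using the S-CPU run as the baseline.
needed_mhz = SNES_MHZ * demo_maximum / snes_default

# Clock left unused while the SA-1 is 20% idle and the S-CPU is 80% idle.
unused_mhz = 0.20 * SA1_MHZ + 0.80 * SNES_MHZ
```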
The main purpose of the demo is to show that the same code can be used on the SNES CPU and on the SA-1 CPU without major changes, but with a higher effective speed.
SA-1 Demo 2
The 2nd demo focuses on graphics manipulation. When you enter it, you will see a full-screen rotating Mario picture, a 64x64 rotating & scaling Bowser picture and a 64x64 rotating Donkey Kong & Diddy Kong picture. The Mario rotation is done via Mode 7 (hardware rotation using VRAM), while Bowser and Donkey Kong are both done by the SA-1 CPU (software rotation and scaling using a frame buffer).
Using the arrow keys you can freely move the Donkey/Diddy Kong frame (X and Y position). Pressing the X button pauses or resumes the rotation and scaling animation. Holding the A button increases the zoom in/out speed while holding the B button decreases it.
You can do the same with Bowser's frame by pressing the same buttons on Joypad #2.
Both frames are calculated at 15 FPS. It's not a bad frame rate. When I attempted graphics rotation with the SA-1 chip in 2011, I got 10 FPS for a single 64x64 frame. They effectively got 30 FPS, since there are two 64x64 frames.
One of the best findings in this ROM is actually the way they do software rotation using the SA-1 chip. The initial calculations (finding the center of the rotation and calculating the initial column x/y and row x/y values) are done very cleverly using the SA-1 cumulative multiplication mode, a very underused SA-1 feature that performs 16-bit x 16-bit multiplication with a 40-bit accumulator. This basically lets you perform several multiplications in sequence while the hardware adds each result into a single register. It's great for matrices and it's way faster than doing each multiplication individually and adding the results manually. Adding 32-bit results on a CPU with an 8-bit data bus is not very efficient.
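To make the idea of a multiply-accumulate more concrete, here is a small arithmetic model in Python. It is only a sketch of the math: it does not model the SA-1 register interface or timing, it treats the operands as signed, and the class name, function names and the 8.8 fixed-point format are illustrative choices rather than anything taken from the demo's code.

```python
# Sketch: arithmetic model of a multiply-accumulate with a 40-bit accumulator.
# It models only the math described in the text, not the SA-1 register
# interface; the class and the 8.8 fixed-point format are illustrative choices.
import math

ACC_BITS = 40


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


class CumulativeSum:
    """Adds signed 16-bit x 16-bit products into one 40-bit accumulator."""

    def __init__(self) -> None:
        self.acc = 0

    def multiply_add(self, a: int, b: int) -> None:
        product = to_signed(a, 16) * to_signed(b, 16)
        self.acc = to_signed(self.acc + product, ACC_BITS)


def rotate_point(x: int, y: int, angle: float) -> tuple:
    """Rotate (x, y) with two sums of products, using 8.8 fixed-point factors."""
    cos_fx = round(math.cos(angle) * 256)
    sin_fx = round(math.sin(angle) * 256)

    row_x = CumulativeSum()
    row_x.multiply_add(cos_fx, x)
    row_x.multiply_add(-sin_fx, y)   # x' = cos*x - sin*y

    row_y = CumulativeSum()
    row_y.multiply_add(sin_fx, x)
    row_y.multiply_add(cos_fx, y)    # y' = sin*x + cos*y

    return row_x.acc >> 8, row_y.acc >> 8   # drop the 8 fractional bits
```

Each output coordinate is a sum of two products, so with an accumulating multiplier the CPU only feeds operands and reads the final sum once, instead of adding the partial 32-bit products itself.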
Another great finding is that they used the bitmap pointer as the for-loop counter, testing its bits with bitwise operations (AND operator). This removes the need for separate X and Y variables to count the loop. Considering the 65c816 has only three general-purpose registers (A, X and Y), this is a great advantage.
Instead of doing:
You can do:
The "get_pixel()" call doesn't depend on the x, y or pointer variables, but rather on the matrix variables currently being processed. After the horizontal (x position) loop finishes, the initial rotation values are refreshed for the next scanline, but I've omitted that for simplicity.
pointer & 0x3f and pointer & 0xfff actually mean (pointer % 64) != 0 and (pointer % (64 * 64)) != 0, but the rotation code cleverly uses these bit properties to make the calculation faster (the 65c816 is great at bitwise logical operations).
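A minimal sketch of the two loop styles being compared, written in Python rather than 65c816 assembly, may help. It is a reconstruction from the description above, not the demo's actual code: `get_pixel()` is a stand-in that returns a running counter, and the function names are made up. One consequence of the trick, visible in the sketch, is that the frame buffer has to start on a 4096-byte boundary so that the low 12 bits of the pointer begin at zero.

```python
# Sketch of the two loop styles, in Python rather than 65c816 assembly.
# get_pixel() here is a stand-in that returns a running counter; in the demo
# it would sample the source image using the current matrix variables.
WIDTH = HEIGHT = 64


def make_get_pixel():
    state = {"n": 0}

    def get_pixel() -> int:
        state["n"] += 1
        return state["n"] & 0xFF

    return get_pixel


def fill_with_counters(buffer: bytearray, base: int) -> int:
    """Classic version: separate x and y counters plus the write pointer."""
    get_pixel = make_get_pixel()
    pointer = base
    for y in range(HEIGHT):
        for x in range(WIDTH):
            buffer[pointer] = get_pixel()
            pointer += 1
    return pointer


def fill_with_pointer_bits(buffer: bytearray, base: int) -> int:
    """Pointer-only version: the low bits of the pointer act as the counters."""
    assert base & 0xFFF == 0, "frame buffer must start on a 4096-byte boundary"
    get_pixel = make_get_pixel()
    pointer = base
    while True:                      # one pass per 64-pixel row
        while True:                  # one pass per pixel
            buffer[pointer] = get_pixel()
            pointer += 1
            if pointer & 0x3F == 0:  # low 6 bits wrapped: row finished
                break
        if pointer & 0xFFF == 0:     # low 12 bits wrapped: 64 rows finished
            break
    return pointer
```

Both functions write the same 4,096 bytes; the second one simply never touches an x or y variable, which is the point on a CPU with so few registers.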
Surprisingly, the routine they used is not 100% optimized. The code doesn't use the direct page feature, which normally saves 1-2 cycles per operation, nor does it use the virtual memory mapping feature of the SA-1, which maps a portion of the BW-RAM or bitmap memory directly into the local memory map and also saves 1 cycle per operation. Since 64x64 graphics manipulation involves at least 40,960 operations per generated frame, saving some cycles is always welcome, and maybe we could even reach 60 FPS graphics manipulation.
The Donkey & Diddy Kong and Bowser graphics are stored internally as 256x256 8BPP linear graphics. It's a format that is easy for the SA-1 chip to manipulate directly, and the images are quite large by SNES standards. I had to manually copy the bytes from the ROM and build a BMP image in a hexadecimal editor to export them into an editable format.
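The manual BMP step could also be scripted. The sketch below wraps raw one-byte-per-pixel data in an uncompressed 8-bit BMP container; it assumes the pixels are stored top row first, uses a placeholder grayscale palette instead of the real colors, and does not cover where the images sit in the ROM. The function name is illustrative.

```python
# Sketch: wrap raw 8BPP linear pixels in an 8-bit BMP container.
# The grayscale palette is a placeholder; the real colors would have to be
# taken from the ROM. Where the pixel data sits in the ROM is not covered here.
import struct


def linear_8bpp_to_bmp(pixels: bytes, width: int = 256, height: int = 256) -> bytes:
    """Build an uncompressed 8-bit BMP from top-down, one-byte-per-pixel data."""
    if len(pixels) != width * height:
        raise ValueError("pixel data does not match the given dimensions")

    stride = (width + 3) & ~3                      # BMP rows are 4-byte aligned
    padding = bytes(stride - width)
    rows = [pixels[y * width:(y + 1) * width] + padding for y in range(height)]
    image = b"".join(reversed(rows))               # BMP stores the bottom row first

    palette = b"".join(bytes((i, i, i, 0)) for i in range(256))  # B, G, R, reserved
    data_offset = 14 + 40 + len(palette)

    file_header = struct.pack("<2sIHHI", b"BM", data_offset + len(image), 0, 0, data_offset)
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40, width, height,   # header size, dimensions (positive height = bottom-up)
        1, 8,                # color planes, bits per pixel
        0, len(image),       # BI_RGB (no compression), image size in bytes
        2835, 2835,          # about 72 DPI, expressed in pixels per meter
        256, 0,              # palette entries, "important" colors (0 = all)
    )
    return file_header + info_header + palette + image
```

Passing a 65,536-byte slice of the ROM to `linear_8bpp_to_bmp()` and writing the result to a `.bmp` file gives an image that regular editors can open, ready for the real palette to be swapped in.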
See how they look:
Both look dithered; possibly they had their colors reduced before being converted to the SA-1 format.
The Mario background is actually Mode 7. Here is how it looks when the Mode 7 tilemap is rendered:
The internal tilemap (128x128) looks like this:
And that's not all. If you hold the X + Y buttons before entering SA-1 Demo 2, the Mode 7 background changes to this:
To be honest, I have no idea what exactly this background is. For some reason it reminds me of Sonic 3's Carnival Night Zone.
UPDATE: this is one of the backgrounds used in the BSX-BIOS.
The character data (128x128) looks like this:
BONUS: The Mysteries Behind the Demo
There are a lot of things I could not understand or that look completely weird to me. Although they are not exactly within the scope of this post, I'll share them so you can all think along with me and, who knows, figure out these mysteries. Here they are.
1) At boot, the SNES CPU clears $2262 and the SA-1 CPU clears $2261. Both are unknown registers. I have no idea what they might do. Nothing is documented. Is it a secret SA-1 feature? How many of these unknown registers exist? Why $2261 and $2262 and not $2260?
2) Why does the SNES CPU run in the FastROM area in an SA-1 game? And why doesn't it enable FastROM? Were they planning to use FastROM and SA-1 at the same time?
3) Why are there so many seemingly random bytes in the banks that appear to follow some sort of static pattern and are not used for anything? Was it to hinder the reverse engineering process? Did they build the demo on top of another SNES ROM? Are they "uninitialized" EPROM bytes?
4) Why is there a graphic of Kirby holding a knife in the ROM? Or is it a Kirby with horns? I have no idea. There are other unused graphics too, and they are loaded in SA-1 Demo 1.
5) Why does the SA-1 CPU wait 13 cycles (5x NOP + 3 cycles until the register fetch at $2307) instead of 5 cycles? Why does the other multiplication wait 11 cycles? Was the multiplication circuit originally slower (even slower than the SNES CPU's)? Did the SA-1 CPU originally run faster, so that more wait cycles were needed (this is what happens when you use the Super FX 21.5 MHz mode)?
BONUS: The 2nd ROM chip
UPDATE: the 2nd ROM chip has been confirmed to be part of Marvelous's assets!!!
There are two 4 Mbit ROM chips on the cartridge, and apparently the 2nd one is completely unused and unrelated to the SA-1 demo. However, the ROM chip is full, I mean, FULL of graphics, palette and tilemap assets. They are from a game I could not identify. Here are some screenshots:
It appears to have 3-4 different characters. I could not find the associated palette, so, guessing from the hats and the hair/face design, I made one that at least... makes it possible to identify them, in case they are part of a released game.
It also comes with foreground and background graphics:
The perspective is not quite clear, but judging from a few palettes I found in the ROM, it seems to be some sort of forest.
The graphics also include a quite large Japanese font:
There are some HUD-related graphics too:
Personally, I think that if a considerable amount of time is invested, it might be possible to assemble the background and some of the graphics into an actually viewable map. I'm not sure I'm the best person for such a job, since I don't have the graphics skills for reassembling images, but if someone is interested and would like to give it a try, you are more than welcome to ask for my help. Everything starts at 0x80000 in the SA-1 demo ROM.
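For anyone who wants to dig into those assets, the second chip's contents can be separated from the demo with a few lines of Python. This is a sketch that assumes the dump is the plain concatenation of the two 4 Mbit chips, which is consistent with the assets starting at 0x80000; the function name is made up.

```python
# Sketch: separate the dump into its two 4 Mbit (512 KB) halves.
# Assumption: the file is the plain concatenation of both chips, which matches
# the statement that the second chip's data starts at 0x80000.
CHIP_SIZE = 4 * 1024 * 1024 // 8   # 4 Mbit = 0x80000 bytes


def split_chips(rom: bytes) -> tuple:
    """Return (first_chip, second_chip) from a two-chip dump."""
    if len(rom) < 2 * CHIP_SIZE:
        raise ValueError("dump is smaller than two 4 Mbit chips")
    return rom[:CHIP_SIZE], rom[CHIP_SIZE:2 * CHIP_SIZE]
```

The second element of the returned tuple can then be opened on its own in a tile viewer without the demo code getting in the way.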
This is the longest and most amazing article I've written. I am pretty happy with all the findings made so far on the SA-1 demo, and I think we have a good opportunity to figure out even more secrets, especially about the 2nd ROM chip and the cartridge itself, which was recently shipped to Brazil and will hopefully arrive in 2-3 months.
If you have read this far, thank you for your time! I hope you have enjoyed these important findings; I'm sure they will contribute to my future SA-1 work, especially the projects that involve graphics manipulation. More important than that is the preservation of a rare SNES cartridge that shows a bit of what went on behind the scenes of the SA-1 chip.
Once again, my sincere thanks to the patron who got the cartridge and made everything possible, to Bob who dumped the cartridge and shipped it to me, and to Forest of Illusion for discovering the cartridge on eBay and for sharing pictures on Twitter.