KHAN.NES screaming demo

Oct 13, 2019

I made a demo based on a strange GIF of a stack of Captain Kirk's face(s) doing his famous "Khan!" scream.

I did this after fellow developer Gradual Games posted the GIF and dared anybody to make it happen on the NES. After thinking about it, I came up with an idea that I knew could work, so I accepted the challenge.

Maybe this is also an oblique tribute to Kevin Hanley, who has been making NES games under the KHAN name for many years.

Vertical Parallax

On retro systems there were a lot of games with horizontal parallax effects, like the layered backgrounds of Shadow of the Beast, or Shatterhand, or the battle against Zeromus, or countless other examples. This was easier to do on 16-bit consoles, but many games managed to pull it off on the NES.

Vertical parallax, on the other hand, was exceedingly rare. Horizontal split effects work because the screen is drawn line by line, and by waiting for the right time, you just change the scroll position at the right line, creating a split between what's above and below. The equivalent vertical operation would have to make a series of scroll splits between pixels on every line. The NES just isn't capable of this, as the CPU isn't fast enough to slip in between individual pixels.

So vertical splits were very rare, but not impossible. I listed a few examples in my article on NES scrolling. The most impressive is probably Crisis Force, though it was a Japan-only release. The Wookie Hole of Battletoads might be more familiar. I used a similar technique in Lizard for Giant Frog Boss.

Prior Techniques

At this point the description is going to require some more technical background on the NES, so I may have to apologize if I lose a few readers here. The scrolling article I mentioned above covers some of the prerequisites.

The main problem here is that updating the background layer of the NES has low bandwidth. You can put sprites anywhere you want, but they only cover a small part of the screen. The background tiles have to do most of the work here, but you can only change part of the screen for each frame.

The examples I mentioned above all get around this by animating the tiles. You can call this CHR animation, if you're familiar with that term. The basic idea is that because a background is made up of repeated tiles, if you can change the tile itself, that change will appear in many places across the background at once.

There are two variations of that: CHR banking and CHR RAM.

With banking, you make many copies of the tile in your graphics ROM, and use cartridge mapper hardware to switch between them like a flipbook. However, this has the big drawback that it takes up a lot of ROM space.

With RAM, instead of taking up ROM space, you rewrite shifted versions of the needed tiles to RAM every frame. This doesn't require special mapper hardware, but it shares the bandwidth problem with the background. The big drawback here is that it's hard to update very many tiles at once. The other drawback, which is much less relevant today, is that RAM used to be more expensive than ROM for this purpose.

However, neither of these variations can create many independently scrolling columns like this demo. You only get one version of a tile at a time, so they all have to move in lock-step with each other.
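To make the lock-step limitation concrete, here is a minimal sketch in Python (not the demo's own code). It models a background as a grid of tile indices plus a table of tile patterns. The names render and shift_down, and the 4x4 tile size, are assumptions chosen to keep the example small; real NES tiles are 8x8.

```python
def render(nametable, patterns):
    """Expand a grid of tile indices into rows of pixel characters."""
    out = []
    for tile_row in nametable:
        tile_height = len(patterns[tile_row[0]])
        for y in range(tile_height):
            out.append("".join(patterns[t][y] for t in tile_row))
    return out


def shift_down(pattern, n):
    """Rotate a tile's pixel rows downward by n, wrapping at the bottom."""
    n %= len(pattern)
    cut = len(pattern) - n
    return pattern[cut:] + pattern[:cut]


# Hypothetical 4x4 tile (real NES tiles are 8x8) used in three columns.
patterns = {0: ["#...", "....", "....", "...."]}
nametable = [[0, 0, 0]]

before = render(nametable, patterns)
patterns[0] = shift_down(patterns[0], 1)  # only one tile is rewritten ...
after = render(nametable, patterns)       # ... yet every column moves together
```

Because all three columns point at the same tile, rewriting that one tile moves them all by the same amount. Giving each column its own speed would need a separate tile per column, which is where the cost starts to add up.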

KHAN.NES technique

Instead of trying to use mapper banking hardware, or rewriting tiles, I decided to attack a different part of the problem.

You can't update much of the background each frame, which makes it hard to change much of the screen at the same time. However, if the image you want is vertically periodic (i.e. it repeats itself) then you can use scroll splits to create that repetition after the correct number of lines!

If your vertical period is small enough, you can restrict it to a size that fits your background update bandwidth, and this smaller portion can be repeated to fill the whole screen.
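As a rough illustration of the repetition, the sketch below (Python, not the demo's code) lists the scanlines where the scroll would have to be reset so that a block of the given period repeats down the picture. The function name split_lines is hypothetical, and the 192-line figure is an assumed letterboxed height, not a value measured from the demo.

```python
def split_lines(visible_lines, period):
    """Scanlines (counted from the top of the visible area) where the
    vertical scroll must jump back to the start of the periodic block."""
    return list(range(period, visible_lines, period))


# A 32-line period across a full 240-line picture needs several splits.
full_screen = split_lines(240, 32)

# A 96-line period across an assumed 192-line letterboxed area needs one.
letterboxed = split_lines(192, 96)
```

The shorter the period, the less background has to be rewritten each frame, but the more splits have to be timed.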

Finally, now that the bandwidth problem is solved, you need a set of tiles that can represent the image you want to scroll in all possible rotations. It's not that bad, because in each tile column you only need one tile for each pixel row of the image that a tile can start on. For a 16 x 16 image, which is 2 tile columns with 16 starting rows each, representing all 16 possible rotations takes 32 tiles (8 x 8 each).
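The tile count can be checked with a short sketch (Python, not the demo's code). It rotates an image through every vertical offset, cuts each result into 8 x 8 tiles, and collects the distinct ones. The function name rotation_tiles is hypothetical, and the test image uses a unique value per pixel purely so that no two tiles coincide by accident; real NES pixels only have 4 values, and an image with repeated content could need fewer tiles.

```python
def rotation_tiles(image, tile=8):
    """Return the set of distinct tiles needed to show `image`
    at every vertical rotation (wrapping top to bottom)."""
    height = len(image)
    width = len(image[0])
    tiles = set()
    for shift in range(height):
        rotated = image[shift:] + image[:shift]
        for ty in range(0, height, tile):
            for tx in range(0, width, tile):
                tiles.add(tuple(
                    tuple(row[tx:tx + tile]) for row in rotated[ty:ty + tile]
                ))
    return tiles


# Hypothetical 16x16 image where every pixel has a unique value.
image = [[y * 16 + x for x in range(16)] for y in range(16)]
tile_count = len(rotation_tiles(image))
print(tile_count)  # 32
```

Rotations that differ by exactly 8 pixels reuse the same tiles in a different order, which is why 16 rotations cost only 32 tiles rather than 64.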

The end result is that you can have independent columns. Using an IRQ technique to handle the scroll split repetitions, this could be a pretty reasonable technique to use in a game. Carefully applied, it doesn't have to take much CPU, and isn't too heavy on the ROM budget or hardware requirements. This demo uses the smallest possible NES ROM size, and no hardware extensions.


Now a few extra details about this specific demo...

If this weren't just a demo, it would have been more reasonable to restrict the column sizes a little more. Here I have 8, 16, 24, and 32 pixel columns all at once. Under normal circumstances, you can update about 4 rows of background in a single frame, and if I'd left out the 24 pixel variation this would be just enough to fit them all. However, including it gave me a much larger period of 96 pixels (12 rows). This is because the period ends up being the least common multiple of the column sizes.
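The period arithmetic is easy to verify with a small sketch (Python, not the demo's code). The function name period_rows is hypothetical; the column sizes are the ones listed above.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def period_rows(column_sizes, tile=8):
    """Return (period in pixels, period in background rows)."""
    period = reduce(lcm, column_sizes)
    return period, period // tile


with_24 = period_rows([8, 16, 24, 32])
without_24 = period_rows([8, 16, 32])
print(with_24)     # (96, 12)
print(without_24)  # (32, 4)
```

Without the 24 pixel column every size divides 32, so the period stays at 4 rows. The 24 introduces a factor of 3 that none of the other sizes share, which triples it.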

A compromise was needed to handle 12 rows at once. One possibility would have been to cut the framerate from 60 Hz to 20 Hz and spread the update over 3 frames, but I wanted to keep it smooth. Instead, I added those letterbox style black bars at the top and bottom. Here I disabled the NES's PPU for a few extra lines in a technique known as forced blanking, giving myself more time to upload changes during the blank. If you've ever noticed that the Snake Pit level of Battletoads has a lowered status bar compared to other levels, this is why: more bandwidth for updating those snakes.
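The rejected 20 Hz option follows directly from the numbers above. A minimal sketch (Python, not the demo's code), where update_rate is a hypothetical name and 4 rows per frame is the rough budget quoted earlier rather than an exact hardware limit:

```python
import math


def update_rate(rows_needed, rows_per_frame=4, refresh_hz=60):
    """Return (frames per full update, resulting update rate in Hz)
    when the update is spread across whole frames."""
    frames = math.ceil(rows_needed / rows_per_frame)
    return frames, refresh_hz / frames


spread_out = update_rate(12)  # the 96 pixel period
single = update_rate(4)       # the 32 pixel period
print(spread_out)  # (3, 20.0)
print(single)      # (1, 60.0)
```

Forced blanking attacks the other term instead: it raises the number of rows that fit in one frame, so the full 12 rows can still land at 60 Hz.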

The second black bar on the bottom is just for symmetry, though. The PPU isn't disabled there. Between the long period and the letterboxing, the demo only actually has to do one scroll split for repetition, but I think a more "normal" version of this would have to split several times.

This demo also uses CPU cycle timed code to time the horizontal scroll splits. That wouldn't work so well in a game situation, because timing things with cycle counting instead of with an IRQ timer tends to eat up processing power, but for the purpose of this demo it was sufficient. (As an aside: the Mesen emulator has a fantastic "event viewer" feature that makes figuring out these timings a lot easier than it had ever been before.)

Finally, the DPCM scream sample throws a little bit of a wrench into things. Playing a sound sample uses a few CPU cycles periodically, and that has to be accounted for in the timing. For this reason, it's actually playing a "silent" sample whenever the scream isn't happening, just so it keeps having a consistent effect on timing. It also means that the DPCM IRQ wasn't really available here. Without a mapper device in the cartridge to provide one, that is really the only IRQ timer available on the NES.

So... while the demo here isn't quite doing the same thing I would want to do if using this in a real game situation, I hope these digressions don't get in the way of understanding the example too much.

Source Code

Full source code of this demo is available on GitHub: bbbradsmith/khanes

If you're curious about the technique, I'd also recommend using a debugging emulator like Mesen or FCEUX and taking a peek behind the scenes with their PPU viewer capabilities.
