Most commonly a sprite is represented as a 2D array of pixels that you loop over (for X, for Y) and use math or branching to blend onto the screen. But that's a lot of reading and writing, and a lot of the math ends up doing nothing, because many of the pixels are intentionally invisible.
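Roughly what that naive loop looks like, as a minimal C sketch (the names, the 8-bit indexed color, and the "0 means transparent" convention are just assumptions for illustration):

    #include <stdint.h>

    /* Naive blit: one read plus one compare per source pixel, even for
       the transparent ones.  Assumes 8-bit indexed color and that color
       0 means "invisible" (both just conventions for this sketch). */
    void blit_naive(uint8_t *screen, int screen_pitch,
                    const uint8_t *sprite, int w, int h,
                    int dst_x, int dst_y)
    {
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                uint8_t c = sprite[y * w + x];      /* read every pixel       */
                if (c != 0)                         /* branch on transparency */
                    screen[(dst_y + y) * screen_pitch + (dst_x + x)] = c;
            }
        }
    }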
So, you could do some sort of visible/invisible RLE to skip over the invisible pixels. That's better, but it's still a complicated loop and still a lot of pixel reads.
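Something along these lines, with a made-up run encoding just to show the idea (skip/count pairs per scanline; the point is that transparent pixels are jumped over instead of read and tested):

    #include <stdint.h>

    /* RLE blit sketch: each scanline is stored as (skip, count) pairs
       followed by `count` opaque pixel values, terminated by skip == 255.
       The encoding is invented for this example. */
    void blit_rle(uint8_t *screen, int screen_pitch,
                  const uint8_t *rle, int h, int dst_x, int dst_y)
    {
        for (int y = 0; y < h; y++) {
            uint8_t *dst = screen + (dst_y + y) * screen_pitch + dst_x;
            for (;;) {
                uint8_t skip = *rle++;
                if (skip == 255)            /* end-of-line marker */
                    break;
                uint8_t count = *rle++;
                dst += skip;                /* jump over invisible pixels */
                for (int i = 0; i < count; i++)
                    *dst++ = *rle++;        /* still reading every visible pixel */
            }
        }
    }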
So, many crazy democoders have decided to write "sprite compilers" that read the 2D color array of a sprite and spit out the exact assembly instructions needed to write each visible pixel, one at a time, as a linear instruction sequence with no branching. The sprites are then assembled and linked into the program code as individual functions. I believe they can even exclusively use immediate values encoded inside the instructions rather than reading the colors from a separate memory address. So rather than read instruction, read data, write data, it becomes read instruction, write data: two straight lines through memory.
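In C terms, the output of such a compiler for one particular sprite is equivalent to something like this (a hand-written sketch; a real tool would emit assembly with the colors baked into the store instructions as immediates, and the offsets and colors here are made up):

    #include <stdint.h>

    /* The "compiled" form of one small sprite: no loop, no transparency
       test, no reads from a pixel array.  Every visible pixel becomes
       one store of an immediate value at a fixed offset from the base
       address; transparent pixels simply don't appear. */
    void draw_sprite_42(uint8_t *dst, int pitch)
    {
        dst[0 * pitch + 3] = 0x1F;
        dst[0 * pitch + 4] = 0x1F;
        dst[1 * pitch + 2] = 0x2A;
        dst[1 * pitch + 3] = 0xFF;
        dst[1 * pitch + 4] = 0xFF;
        dst[1 * pitch + 5] = 0x2A;
        dst[2 * pitch + 3] = 0x07;
        dst[2 * pitch + 4] = 0x07;
        /* ...one line per visible pixel, generated from the sprite's
           2D color array... */
    }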
IIRC, because of relative-address store instructions, the destination address does not have to be hard-coded. So, the sprites can still move around dynamically.
What's harder is clipping against the sides of the screen. With no branching, there's no way to prevent the sprite from writing past the end of a line/screen (wrapping/mem-stomping). So, there does need to be a test per sprite to detect that case and fall back on a more complicated blitter.
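The per-sprite test can be as simple as a bounding-box check before choosing a path (a sketch reusing the names from the earlier examples; the sprite record and blit_clipped are hypothetical):

    #include <stdint.h>

    /* Hypothetical sprite record: dimensions plus a pointer to its
       compiled drawing routine (like draw_sprite_42 above). */
    struct sprite {
        int w, h;
        void (*compiled)(uint8_t *dst, int pitch);
    };

    /* Declared elsewhere: a conventional blitter that clips per pixel/run. */
    void blit_clipped(uint8_t *screen, int screen_w, int screen_h, int pitch,
                      const struct sprite *s, int x, int y);

    /* Take the fast compiled path only when the whole bounding box lies
       on screen; otherwise fall back to the slower clipping blitter. */
    void draw_sprite(uint8_t *screen, int screen_w, int screen_h, int pitch,
                     const struct sprite *s, int x, int y)
    {
        if (x >= 0 && y >= 0 && x + s->w <= screen_w && y + s->h <= screen_h)
            s->compiled(screen + y * pitch + x, pitch);
        else
            blit_clipped(screen, screen_w, screen_h, pitch, s, x, y);
    }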
In fullscreen modes, you could also just make your screen buffer bigger than the actual screen by the width and height of your largest sprite.
In a game engine I wrote for the Apple IIgs a long time ago, I used compiled sprites and maintained a 1-scanline-wide mask that I used to clip the compiled sprite to the screen edge.
This only cost one extra AND instruction and allowed the sprites to be clipped to any size rectangular playfield while still maintaining almost all of the speed benefits.
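My guess at how that works out, rendered in C: if the compiled write already merges into the screen with a per-pixel mask (to handle transparency within a byte), then clipping is just ANDing that mask with a per-column clip mask first. The names, sizes, and exact merge here are assumptions, not the original IIgs code:

    #include <stdint.h>

    #define SCREEN_W 320   /* assumed playfield width for this sketch */

    /* 1-scanline-wide clip mask: 0xFF inside the playfield, 0x00 outside. */
    static uint8_t clip_mask[SCREEN_W];

    void set_playfield(int left, int right)   /* columns [left, right) visible */
    {
        for (int x = 0; x < SCREEN_W; x++)
            clip_mask[x] = (x >= left && x < right) ? 0xFF : 0x00;
    }

    /* One compiled pixel write with the extra AND folded in: outside the
       playfield the combined mask is 0, so the screen byte is untouched. */
    static inline void put_pixel_clipped(uint8_t *dst, int x,
                                         uint8_t color, uint8_t sprite_mask)
    {
        uint8_t m = (uint8_t)(sprite_mask & clip_mask[x]);   /* the extra AND */
        *dst = (uint8_t)((*dst & (uint8_t)~m) | (color & m));
    }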
That's actually cool, as it would also allow clipping against a "foreground" by varying the address of the scanline mask. E.g. imagine foreground trees in a jungle scene.
I did extend it to use a full-screen foreground mask that implemented this sort of clipping. I was able to make the mask scrollable which allowed the compiled sprites to appear "behind" fences and other complex shapes with per-pixel accuracy.
It could even be used to mask out individual pixel bits that allowed for fake "lighting" changes with a carefully chosen palette.
I keep wanting to do a "retro-game" and make use of what I've now learned about these types of effects. Despite how far machines like the C64 were pushed, I don't think they were pushed nearly as far in games as in demos, and it'd be fascinating to try to push the limits.
It certainly can be (and has been) used for games where a limited set of sprites are drawn - each frame of animation is just a separate compiled sprite routine. Compiled sprite routines write at video memory addresses relative to a specified position, so they can be moved around at will.
If you create code for each sprite and only change the base address to write to, you can use it for games alright. Jazz Jackrabbit is one that I've seen mentioned using compiled sprites. Lots of DOS games basically had to.
It could probably be used for games as long as you keep within the bounds of your pixel data.
However, modern games use GPU acceleration instead of plotting pixels with the CPU, and most higher-level languages don't expose the sort of functionality you need to use this trick in the first place.
Even the very first digital video game (Spacewar!) used something very much like this: Dan Edwards' outline compiler. The shapes were movable and rotatable (think of advancing by unit vectors), compiled just in time from directional encodings.
Read more at: http://www.masswerk.at/spacewar/inside/insidespacewar-pt4-oc...
The comments in the Wolf3D source implied that the self-compiling raycasting wasn't faster on an 80286 than a conventional BSP tree would have been, and was in fact slower on an 80486 thanks to invalidating the code cache over and over.
On a modern pipelined CPU with separate instruction/data caches self-modifying code does have a rather large penalty since it has to flush the pipeline and the caches, but on the original 8088 PC which has no cache and no pipeline (there's only a 4-byte prefetch queue), the penalty is much smaller.