published by Daniel Ludwig on 2018/07/24 15:59
In a previous job, one of my co-workers coined the phrase "pain-driven development". One description I found online says:
> "... waiting to apply [...] practices to your code
> until there is some pain the current approach is
> causing that must be addressed."
>
> _source: https://deviq.com/pain-driven-development_

I had never heard this term before that day, so I don't know if he came up with it by himself, but it surely was fitting at the time and place.

Interestingly, over my years in game development I have met many people who are notably resistant to this kind of pain. But sometimes, it's worth pushing in that direction even if the pain seems ... tolerable.
## "Gestatten, Heinrich!"

This is Heinrich. He's a German soldier in our upcoming game, [Pathway][pathway]. He's also a smoker, and a sprite.

As a matter of fact, in Pathway, unlike in many other pixel art games, Heinrich has two siblings, Heinrich#Normal and Heinrich#Depth. Weird names, I know. They _look_ weird, too. They follow each of Heinrich's steps. One is responsible for making Heinrich look darn handsome when lit. The other one helps to cast Heinrich's shadow in a plausible manner.
## Pixels! Millions of them!
In Pathway, there are many sprites like Heinrich. Many thousands, in fact. For use in the game, they are all gathered in large sprite sheets, which is a common technique to ensure good render performance, as well as to reduce load times.
In [libGDX][libgdx] terms, sprite sheets are grouped into what's called a _texture atlas_. The tool used to build this atlas is the [_TexturePacker_][texturepacker].

[Robotality][robotality]'s previous game, [Halfway][halfway], shipped with an atlas of eleven 2048x2048 pages, which is equivalent to roughly 2 3/4 sheets of size 4096x4096. As I write this article, Pathway already _quadruples_ this size.

## Development and runtime costs

Building texture atlases of this size is a time-consuming process, and so is loading them. Early in development of the game, we took some measures to only rebuild assets when needed. But nowadays, _if_ a rebuild is required, it surely takes its time. This is how it looks today, on an i7-4790K with a decent SSD:

~~~ txt
packing sprites.atlas ...
packing anims.atlas ...
asset builder executed in 36.162198 seconds
~~~

With a cold file cache, this can take twice as long. Run it on an HDD, and you can go enjoy your coffee break.

Now, _loading_ sprite sheets is a lot faster of course. This is done during startup of the game. After we applied some optimizations to the game loading process during the past few months, it's one of our biggest load-time performance hogs right now.

~~~ txt
load sprites/anims atlas - action executed in 1887 ms
~~~

## Profiling mode: enabled

Recently, I sat down again to figure out how to speed up both updating and loading our texture atlases. Surprisingly, my initial assumptions - for example, that the _TexturePacker_ code by itself is slow, because it does _a ton_ of stuff we don't need - were proven wrong.

- Searching directory trees and collecting all source files is slow, especially on a Windows file system. Also, it doesn't help that we do this twice: first to check whether something has changed and the atlas needs to be rebuilt, then again within _TexturePacker_ itself. Still, this was far from the biggest offender.
- _Reading and decoding_ all the tiny PNG files to _fill_ the atlas with pixel data __is slow__. Like, really __sloooooooow__. In fact, the profiler showed that _TexturePacker_ spends almost all of its time in image I/O. Which leads to ...
- ... encoding and writing atlas pages. They are saved as PNG files, which compress super nicely, but __devour__ CPU time. This is the single most expensive _TexturePacker_ operation.
- To some extent, the same applies to _loading_ the texture atlas. According to the profiler, most time here is spent reading and decoding PNG data.

A few weeks ago, _Stupid Past Me_ went ahead and replaced the libGDX PNG decoder with [stb_image][stb] via LWJGL bindings, only to find out that this change didn't improve anything. Then, _Disappointed But Less Stupid Me_ went into the platform code of libGDX to discover that it _already uses_ stb_image for loading PNG files. __Doh!__

Some days later, _Now Slightly More Educated Me_ had a closer look at stb_image itself. I decided that not only is it unlikely that I can get PNG encoding/decoding done any faster than stb_image and stb_image_write, but also that the PNG format leans heavily on trading speed (which we want) for compression ratio (which we don't care _that much_ about), at the extra cost of some pretty hefty memory overhead - stb_image roughly uses [size-of-compressed-data + size-of-uncompressed-data + some more] as temporary buffers while decoding.

## Full speed ahead!

I'll spare you some details now. In a nutshell, I sat down to write a replacement for the libGDX _TexturePacker_, as well as a custom pixmap/texture loader:

- It's written in C99, mostly. There are a few reasons for doing that, but foremost it's been a fun experience. Well, minus string operations for working with file paths, maybe.
- I didn't bother to study how _TexturePacker_ builds the atlas layout. My implementation is _relatively_ straightforward: it builds a binary tree, splitting regions left/right or top/bottom, using the width or height of the input image as the split distance. Images are sorted by size beforehand, so large images are inserted first. It lacks many optional features _TexturePacker_ ships with. There's some space wasted, so the result isn't optimal, but it works well enough for our needs.
- An .atlas file is written, in the text format libGDX understands.
- Atlas pages are compressed and saved concurrently. This had a huge impact when writing them in PNG format, though this performance gain mattered less shortly after, because ...
- ... atlas pages are now written in a custom, _very simple_ binary format, and compressed using [LZ4][lz4]. I found LZ4 to have some pretty cool characteristics:
    - It compresses very fast, and decompresses _blazingly fast_.
    - Compression ratio is pretty nice. The images are only about twice the size of their PNG counterparts, give or take.
    - LZ4 itself is free of heap allocations. It just uses a few kilobytes of stack memory.
- Everything is wrapped as a native shared library, with a small JNI interface on top to run an atlas build, and to load images into existing _Pixmap_ buffers. This means the caller needs to know the image size upfront, which is trivial - the size is written down in the .atlas file.

## No pain, no gain

Now, let's study the numbers! This is on the same PC, packing the same set of assets, with a hot file cache, and some statistics added to the mix:

~~~ txt
packing sprites ...
1290 input files to sort
- search and collect files: 0.124 seconds
- build atlas layout: 0.020 seconds
- fill atlas with image data: 0.204 seconds
- write atlas pages to disk: 0.075 seconds
- write libGDX .atlas format: 0.006 seconds
packing anims ...
3523 input files to sort
- search and collect files: 0.353 seconds
- build atlas layout: 0.236 seconds
- fill atlas with image data: 1.087 seconds
- write atlas pages to disk: 0.293 seconds
- write libGDX .atlas format: 0.014 seconds
'asset builder (mode: SPRITES)' executed in 5.640639 seconds
~~~

[Simon][simon], who suffers the most from slow atlas build times, because he needs to rebuild them all the time, almost fell off his chair when I sent him these numbers.
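
As a quick sanity check on the log above, here is a small Python tally of the reported phase timings against the old 36.16 second build. It is plain arithmetic on the numbers already shown; the variable names are made up for this sketch.

```python
# Phase timings in seconds, copied from the build log above.
phases = {
    "sprites": [0.124, 0.020, 0.204, 0.075, 0.006],
    "anims": [0.353, 0.236, 1.087, 0.293, 0.014],
}
old_total = 36.162198  # TexturePacker-based build
new_total = 5.640639   # native build, as reported by the asset builder

packing = sum(sum(times) for times in phases.values())
speedup = old_total / new_total
unaccounted = new_total - packing

print(f"packing phases: {packing:.3f} s")          # packing phases: 2.412 s
print(f"overall speedup: {speedup:.1f}x")          # overall speedup: 6.4x
print(f"outside the phases: {unaccounted:.3f} s")  # outside the phases: 3.229 s
```

So the build is roughly six times faster overall, and the listed phases account for less than half of the reported total.
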

Math whiz kids may note that the numbers don't add up. That's because the 5.64 seconds still include our pre-pass to check for file modifications.

_Loading_ atlas pages now takes much less time, too:

~~~ txt
load sprites/anims atlas - action executed in 800 ms
~~~

It's purely I/O bound now - to gain any more speed, I'd probably have to process our source assets differently, and condense the actual _pixel data_.

## That's all for now?

Of course, I've got some ideas already for further improvements:

- As said above, condense the actual pixel data. Right now, we are pretty wasteful here. Pixel maps are saved as 32-bit RGBA and color-indexed at load time - for a couple of reasons. Normal and depth maps are lavishly saved as 32-bit images, too. Changing this would increase atlas build time again to some extent, but result in fewer (or smaller) images, which would reduce load time.
- For the atlas creation part, we should be able to get rid of our Java-side file modification checks. I would need to profile this part specifically, but my gut feeling is that the _"search and collect files"_ pass above is already fast enough to take over. (Another gut feeling of mine is that directory and file operations are a lot faster in native code, but again, I don't have profile data yet to confirm.)

[halfway]: https://halfwaygame.com/
[libgdx]: https://libgdx.badlogicgames.com/
[lz4]: https://lz4.github.io/lz4/
[pathway]: http://pathway-game.com/
[robotality]: http://robotality.com/
[simon]: https://twitter.com/sibachmann
[stb]: https://github.com/nothings/stb
[texturepacker]: https://github.com/libgdx/libgdx/wiki/Texture-packer
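
As a postscript, the binary-tree layout strategy described under "Full speed ahead!" can be sketched in a few lines. This is a minimal Python illustration and not the actual C99 code: the names `Node`, `insert` and `pack` are hypothetical, sorting by area is an assumption (the text only says "by size"), and the sketch handles a single page only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A rectangular region of the page; leaves are either free or used."""
    x: int
    y: int
    w: int
    h: int
    used: bool = False
    first: Optional["Node"] = None   # left or top child after a split
    second: Optional["Node"] = None  # right or bottom child after a split


def insert(node, w, h):
    """Place a w*h image; return its (x, y), or None if it does not fit."""
    if node.first is not None:  # inner node: try both children
        pos = insert(node.first, w, h)
        return pos if pos is not None else insert(node.second, w, h)
    if node.used or w > node.w or h > node.h:
        return None
    if w == node.w and h == node.h:  # exact fit: claim this leaf
        node.used = True
        return (node.x, node.y)
    if node.w - w > node.h - h:
        # Split left/right, using the image width as split distance.
        node.first = Node(node.x, node.y, w, node.h)
        node.second = Node(node.x + w, node.y, node.w - w, node.h)
    else:
        # Split top/bottom, using the image height as split distance.
        node.first = Node(node.x, node.y, node.w, h)
        node.second = Node(node.x, node.y + h, node.w, node.h - h)
    return insert(node.first, w, h)


def pack(sizes, page_w, page_h):
    """Insert images largest-first (by area, an assumption) into one page."""
    root = Node(0, 0, page_w, page_h)
    order = sorted(sizes, key=lambda n: sizes[n][0] * sizes[n][1], reverse=True)
    return {name: insert(root, *sizes[name]) for name in order}


sizes = {"a": (64, 64), "b": (32, 32), "c": (32, 16)}
placements = pack(sizes, 128, 128)
print(placements)  # {'a': (0, 0), 'b': (64, 0), 'c': (96, 0)}
```

An image that does not fit is reported as `None`; a real packer would open a new page at that point. Inserting large images first keeps the early splits coarse, which is presumably why the sort matters.
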