Rebuilding Windows Live Messenger, Part 1: This is not RLE
Some formats lie.
But this one… This one lied to me in the first three bytes.
52 4C 45 → "RLE"
It was not RLE.
Not byte-aligned runs, not (count, value), not even something you could
generously call RLE-like. What sits behind that header is a bit-packed
predictive codec with interleaved channels, delta predictors, a meta-decoder
layered above colour decoding, and a rendering pipeline buried under more code
than anyone should have to read for a small icon.
This post is about how I went from opaque binary blobs to a bit-exact decoder, validated against the original renderer. No Microsoft code was used. Only observation, instrumentation, patience, and stubbornness.
This is the first step toward a side project I had been thinking about for way too long: recreating the Windows Live Messenger 2009 look-and-feel. Same visuals, entirely new engine, open, secure, cross-platform. Not nostalgia, not preservation. Reconstruction.
The most important part of this endeavour would be recreating the assets, or at least extracting them. WLM2009 had stunning visuals that made it look great, both on PCs and Macs. So the first task was set: Let’s get those assets. That couldn’t be hard, right? Right?
Where are the assets?
The UI clearly contained intricate elements: icons, frames, gradients, control visuals, but there were no images anywhere. One may expect to at least find PNGs, BMPs, and such. But no PNG, no BMP, no JPEG, nothing that looked like pixel data, nothing in the resources that resembled bitmaps. It was madness! How could that amount of transparency and effects be nowhere?
Well, let’s make a small detour to a quick history lesson: Microsoft’s internal UI frameworks.
Everything in WLM2009, and in fact in all Windows Live™ products of the same era, used an internal framework called DirectUI. After some research, it seems that while people were suffering with MFC and WPF, Microsoft was using a custom UI framework to build Windows Live, Office, and, as far as I understand, parts of Windows as well. That was DirectUI. People seem to think they gave up on it, but I haven’t looked deeper, sorry.
So that meant all that fancy stuff was being drawn by something else; and since WLM also worked on Windows XP, Windows Vista, and Windows 7, the binaries themselves had to provide those facilities. Awesome, but where are the assets?
Interestingly, what those binaries lack in PNGs and other image formats, they make up for with lots of blobs sharing the same three-byte magic header: RLE. Great! Everyone (okay, not everyone) knows RLE stands for Run-Length Encoding, so decoding that should be easy: extract the RLE blob, strip the header, run the algorithm, and we should get some kind of image.
No. Not this time. Doing this yielded nothing but garbage. Wrong assumption, again. Now it felt like even Occam was lying to me; a big conspiracy.
Jokes aside, one thing is clear: that was no RLE, but something else! Something internal that only Microsoft knew, and that poked that small piece of my brain that really, really likes to dig into this kind of thing – so it took the wheel. Entropy analysis, heuristics, pattern matching, anything that could give any kind of direction on what those blobs were about. Nothing.
That was the moment the situation became clear. These were not compressed images. These could only be the images. There was no fallback, no original bitmap stored somewhere else. The blob itself was the canonical representation of the artwork. But if we cannot decode it, we cannot reproduce the UI. There is no other source of truth.
Finding the real decoder
Searching for RLE in the binaries leads to a forest of misleading symbols.
Helpers, wrappers, palette code, rendering glue, dead paths, UI plumbing. Most
of those never touch pixels. They prepare, route, or post-process, but they do
not decode.
Ghidra helped map the topology, but the generated C was often fiction. Registers reused in unexpected ways confused the decompiler, calling conventions were occasionally guessed wrong, stack frames collapsed, and control flow flattened into something that looked plausible but was not what the CPU executed. The assembly was the only reliable source of truth.
So the process became mechanical: Ghidra to understand structure, x32dbg to observe reality; follow registers, watch the process’s memory, confirm assumptions against execution, and treat decompiled C as a hint rather than documentation. The only source of truth was the assembly generated from the binaries. And that hit hard.
But no problem, nothing we can’t solve, right? Eventually one function stood out: RLEDrawToDIB. That’s the one responsible for converting the mysterious RLE into something observable, and even better, into a DIB (Device-Independent Bitmap), which can be exported.
Ok, we have the entrypoint, but what now?
Building an Oracle
The main executable helped a lot in finding where and what was responsible for drawing, but the amount of noise beyond that was just too much; after all, the main binary did far more than rendering. x32dbg kept pausing execution in way too many places, and analysing the stack at each of them became tiring.
The plan now was to devise an oracle. Its job is to exercise only the hot path
concerning parsing and drawing. For that, one must first find the
entrypoint (already done: we have RLEDrawToDIB) and make it work in
isolation from the host binary. As you may know, dear reader, I am no Windows
developer; most of my life I used Linux, Unix, OS X, and now macOS, so this is a
whole new world. But as I usually say, in the end it’s just code, so let’s jump
in and understand how to make this work.
The idea is dumb in the best way: load uxcore.dll into a process we control,
resolve internal functions by raw offset – no exports, no symbols, just hard
RVAs pulled from Ghidra. Reconstruct just enough of the calling sequence to
drive the real renderer into a DIB section we own. Then dump every pixel.
The shim is small: a macro computes function pointers as base + offset into
the loaded binary. A handful of typedefs describe the calling conventions
Ghidra helped identify; a thin wrapper mirrors the original dispatch
logic, including the nine-slice path, and routes execution through the same
code paths the real UI would take. The renderer writes into a top-down 32bpp
DIB section, BGRA in memory, which gets converted to RGBA and dumped as raw
bytes alongside a JSON sidecar with dimensions, pixel format, and colour-source
metadata.
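The actual shim is C++, but the trick it plays can be sketched in Python with ctypes. Everything below is illustrative: the RVA is made up, and the real RLEDrawToDIB signature is far more involved than the placeholder prototype shown here.

```python
import ctypes

# Hypothetical RVA pulled from Ghidra -- the real offset differs per build.
RLE_DRAW_TO_DIB_RVA = 0x0001A2B0

def resolve_by_rva(module_base: int, rva: int) -> int:
    """uxcore.dll exports nothing useful, so internal functions are
    resolved as raw virtual addresses: module base + RVA."""
    return module_base + rva

def load_renderer():
    """Windows-only: load uxcore.dll and build a callable for the
    internal RLEDrawToDIB (the signature here is a placeholder)."""
    uxcore = ctypes.WinDLL("uxcore.dll")
    base = uxcore._handle  # on Windows, HMODULE is the module base address
    proto = ctypes.WINFUNCTYPE(ctypes.c_int)  # guessed prototype
    return proto(resolve_by_rva(base, RLE_DRAW_TO_DIB_RVA))
```

The C++ version does the same thing with a macro and a pile of typedefs; the arithmetic is identical.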
That raw RGBA dump is the oracle. Not the PNG: PNGs go through GDI+ encoding and are useful for eyeballing, but byte-level comparison needs raw pixels. Every single byte my clean-room decoder produces can now be diffed against what the original renderer actually drew. No ambiguity, no interpretation, no room for “close enough.”
But the oracle was not just a validation tool; it was an exploration tool. With the real renderer running under our control, we can instrument it, set breakpoints, watch which functions get called and in what order, and trace the actual execution path through them. Instead of staring at a forest of dead code in Ghidra trying to guess what matters, we can observe what matters. The oracle narrowed the search space from everything to exactly this.
The process ran across 516 files exported from Messenger. For each one: load the
blob, ask uxcore.dll for the frame size, allocate a DIB, clear to transparent,
render, flush, dump. A system-colour table gets serialised at the end so
palette-mode files can be validated deterministically. The whole thing is a few
hundred lines of C++ and the ugliest #define you have ever seen, but it worked.
A side note: outputting PNG using GDI+ is as ugly as ugly gets, not to mention
the Windows ABI. If there’s a hell, I’ve found it. I’m also skipping a whole
part where the oracle crashed, memory was corrupted, NULL pointers were
dereferenced, and everything that could go wrong, did. Just for context, here
is what I tried once it refused to work:
- Do we need to initialize COM for this process? Maybe. Would it hurt if we did it? Hardly. So COM initialization was done. As the oracle was a “Console Application”, in Windows lingo, COINIT_APARTMENTTHREADED seemed enough. That didn’t change anything.
- Maybe Common Controls were missing? Did the DLL expect this to be ready? InitCommonControlsEx to the rescue. Guess what? Nothing changed.
- Maybe GDI+ needed to be initialized? GdiplusStartup was called, and guess what? That wasn’t it either.
Only then did I take the time to look at the host binary again, only to find
two interesting functions being called on uxcore.dll: UXCoreInitProcess and
UXCoreInitThread. And behold! No crash.
This exposed the whole hot path the decoder used: every function could now be isolated, inspected, and observed. “But how many?” you may be asking yourself, and I’m glad you asked. From there the hot decoding path expanded into one hundred and twenty-one internal functions, excluding Win32 calls. That was the actual renderer. To get the gist of the format, we just needed to extract all of them as both C code and assembly. The result? ~20k SLOC of C code, ~41k of assembly. I hoped there was enough coffee in the kitchen for that, and prayed for the blessing of Our Lord Turing, because I knew it would hurt. Not that assembly is a problem, but after optimisations and the weird stuff we know Microsoft compilers do, I knew I was signing up for headache and frustration.
Ghidra vs reality
One may wonder why export both C and assembly. Well, at several points the decompiler confidently produced code that looked reasonable yet was completely wrong. And not wrong in subtle ways, but wrong in ways that obliterate decoding from the first byte. Some examples (we will get to each of these):
- Bit order: The decompiled logic strongly suggested MSB-first reading. The stream is LSB-first. That single inversion produces garbage immediately, and it is the kind of mistake that looks correct on paper until you watch actual execution and realise every bit is landing in the wrong position.
- The predictor: Decompiled output implied frequent resets. In reality, it carries across runs within a row. That persistence is exactly why gradients compress well, and why breaking it turns smooth ramps into staircases.
- Row navigation: The decompiled view made it look absolute. It is forward-relative, with row zero starting at the last index entry. Miss that, and the decoder walks into the middle of someone else’s bitstream.
- Alpha: The reconstructed C placed it inline with colour. In execution, it sits above colour decoding, as a meta-layer that controls when colour tokens are consumed. Without modelling that correctly, colour buffers desynchronise and pixels drift. This one took the longest to untangle.
- Token alignment: Loops in the pseudocode looked byte-aligned. They are not. Everything is bit-aligned. Every. Single. Field.
For those starting in reverse engineering: The lesson is simple, but important. Decompiled code is not truth. Execution is.
The Trampoline
The format remained opaque until a small piece of assembly changed everything. There was a dispatch trampoline routing execution into different meta-decoders depending on header flags. That explained why streams behaved differently even when they looked similar.
Some used palette. Some used alpha runs. Some consumed colour continuously, others stalled. Some interleaved channels, others did not. The structure finally emerged. This was a bitstream, not a byte stream. Channels decoded independently. Predictive delta replaced repetition. Alpha was not inline; it existed in a meta-layer. Rows were not sequential but navigated via a pointer table. Bits were read least-significant-bit first.
This was not RLE. This was a “small” image codec wearing an RLE badge.
The Format
With the oracle, the trampoline, and 41k lines of assembly in front of me, the format started to take shape. What follows is a high-level overview, enough to understand why calling this “RLE” is an insult to the codec. A full specification with worked examples exists, but that deserves its own post.
The Header
Every blob opens with six bytes. Three magic bytes (RLE), an image count, a
flags byte encoding the tile topology and payload size width, and a DPI byte.
Six bytes. That is the entire file header.
The tile topology is a 4-bit nibble packed into the flags byte. Each bit controls whether a fixed border exists on that edge: left, top, right, bottom. The result is a nine-patch system: a 1x1 grid when no bits are set, a full 3x3 when all four are. Sixteen possible configurations, all encoded in half a byte. After some research, I noticed Android also has those nine-patch PNGs! Same idea, different decade. Microsoft got there first. (Is that a first?)
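The six-byte header and the topology nibble can be sketched in a few lines of Python. The layout follows the description above, but which nibble bit maps to which edge, and the meaning of the upper flag bits (the payload size width), are my assumptions here.

```python
import struct

# Assumed bit-to-edge mapping within the topology nibble -- illustrative only.
EDGE_LEFT, EDGE_TOP, EDGE_RIGHT, EDGE_BOTTOM = 1, 2, 4, 8

def parse_header(blob: bytes) -> dict:
    """Six-byte file header: 'RLE' magic, image count, flags, DPI.
    The upper nibble of flags (payload size width) is ignored in this sketch."""
    magic, count, flags, dpi = struct.unpack_from("<3sBBB", blob, 0)
    if magic != b"RLE":
        raise ValueError("not an 'RLE' blob")
    topology = flags & 0x0F  # 4-bit nine-patch nibble
    cols = 1 + bool(topology & EDGE_LEFT) + bool(topology & EDGE_RIGHT)
    rows = 1 + bool(topology & EDGE_TOP) + bool(topology & EDGE_BOTTOM)
    return {"count": count, "dpi": dpi, "grid": (cols, rows)}
```

With no topology bits set you get a plain 1x1 image; with all four set, the full 3x3 nine-patch grid.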
Sub-Images
Each sub-image carries a 4-byte header dword packed with geometry and flags. Width and height, yes, but also: palette mode, 5-bit or 8-bit precision, alpha presence, and the width of the row-navigation index entries. Thirteen bits for width, twelve for height, and the rest is flags. Every bit accounted for.
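The 13/12 split comes straight from the binary; where exactly each flag lands inside the remaining seven bits is an assumption in this sketch, so treat the bit positions as placeholders.

```python
def parse_subimage_dword(dword: int):
    """4-byte sub-image header: 13-bit width, 12-bit height, and seven
    flag bits (palette mode, precision, alpha presence, index-entry
    width). The ordering shown here -- width in the low bits -- is my
    assumption, not confirmed layout."""
    width  = dword & 0x1FFF           # bits 0..12
    height = (dword >> 13) & 0xFFF    # bits 13..24
    flags  = (dword >> 25) & 0x7F     # bits 25..31
    return width, height, flags
```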
The row-navigation index is where things get even weirder. Rows are not stored sequentially. A table of forward-relative offsets sits at the start of the pixel data, and row zero begins at the position of the last index entry. It is elegant once you see it, and completely opaque until you do.
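My reading of the forward-relative scheme, sketched below; how the deltas chain together is an interpretation of the description above, not a verified layout.

```python
def resolve_row_starts(offsets, index_end):
    """Turn the forward-relative row offsets into absolute positions.
    Row 0 begins where the index ends (the position of the last
    entry); each later row starts `offset` further along from the
    previous row's start. Purely a sketch of the idea."""
    starts = [index_end]              # row 0: right after the index
    for off in offsets[:-1]:          # assumed: entry i advances to row i+1
        starts.append(starts[-1] + off)
    return starts
```

The point is that a decoder which treats these as absolute offsets (as the decompiled view suggested) lands in the middle of someone else’s bitstream.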
The Bit-Cursor
Everything past the row index is a bitstream. Not bytes. Bits. Read LSB-first within each octet, shared across all channels in a row. The decoder maintains a bit-cursor (a byte pointer plus a bit offset), and every field, from token types to run counts to delta widths, is read through it. Nothing is byte-aligned unless it happens to be by coincidence.
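A minimal LSB-first bit-cursor looks like this; the bit order matches the format, while the class shape is just my own convenience:

```python
class BitCursor:
    """LSB-first bit reader: bits come out of each byte starting at
    bit 0, and one cursor is shared by every field in the stream."""
    def __init__(self, data: bytes):
        self.data = data
        self.byte = 0   # byte index into the stream
        self.bit = 0    # bit offset within the current byte, 0..7

    def read(self, n: int) -> int:
        """Read n bits; the first bit read becomes the least
        significant bit of the result."""
        value = 0
        for i in range(n):
            bit = (self.data[self.byte] >> self.bit) & 1
            value |= bit << i
            self.bit += 1
            if self.bit == 8:
                self.bit = 0
                self.byte += 1
        return value
```

Reading 3 bits then 5 bits from the byte 0b10110100 yields 4 and 22: the low three bits first, then the high five. Get the direction wrong (MSB-first, as Ghidra implied) and every field lands in the wrong position.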
Tokens
Each colour token starts with a 2-bit type and a B-bit run count, where B is
derived from the image width. We have four token types:
- Solid fill: one value, repeated.
- Delta-positive: each pixel is the previous value plus a small positive offset.
- Delta-negative: same, but subtracted.
- Literal: raw values, one per pixel.
The deltas are the reason this format compresses well. Gradients, anti-aliased edges, smooth colour transitions: they all become tiny increments instead of full pixel values. The predictor carries across tokens within a row, which is what makes it work. And what makes it break if your decompiler tells you it resets. I’m looking at you, Ghidra.
The Alpha Meta-Decoder
And then there is alpha. It does not sit alongside colour: it wraps it.
A 2-bit alpha type is read before each group of pixels: transparent run, opaque run, or explicit alpha. Transparent runs emit zeros and skip colour decoding entirely. Opaque runs consume colour normally. Explicit alpha decodes an alpha token first, then pulls the corresponding colour values from the channel buffers.
This meta-layer is why the format was so hard to reverse. Colour and alpha share the same bitstream, the same cursor, but alpha controls when colour is consumed. Get that wrong and everything desynchronises. Ghidra placed it inline, and guess what? It is not inline. It is a state machine sitting above the channel interleaver.
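The meta-layer is easiest to see with the bit-level reads stripped away. In this sketch the alpha tokens arrive pre-parsed, `next_colour()` stands in for the colour channel decoders, and the type values are assumptions; what matters is that alpha alone decides when colour is consumed.

```python
TRANSPARENT, OPAQUE, EXPLICIT = 0, 1, 2  # assumed 2-bit alpha type values

def decode_alpha_row(alpha_tokens, next_colour):
    """Alpha as a state machine above colour decoding. Each token is a
    pre-parsed (type, payload) pair: a run length for TRANSPARENT and
    OPAQUE, a list of explicit alpha values for EXPLICIT. `next_colour`
    pulls one pixel from the colour decoders -- note that TRANSPARENT
    runs never touch it, which is why modelling alpha inline
    desynchronises the colour buffers."""
    pixels = []
    for atype, payload in alpha_tokens:
        if atype == TRANSPARENT:
            pixels += [(0, 0)] * payload                     # no colour consumed
        elif atype == OPAQUE:
            pixels += [(255, next_colour()) for _ in range(payload)]
        else:                                                # EXPLICIT
            pixels += [(a, next_colour()) for a in payload]
    return pixels
```

Two transparent pixels, two opaque ones, and one explicit-alpha pixel consume only three colour values, not five. That bookkeeping is the whole trick.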
Validation
With the format understood and a clean-room decoder written in Python (no dependencies beyond the standard library), it was time to answer the only question that matters: does it match?
The oracle produced raw RGBA dumps for 516 files extracted from Messenger’s binaries. The decoder ran against the same inputs. Byte-level comparison. No thresholds, no visual inspection, no “close enough.”
516 files. 380 single-tile, 136 multi-tile nine-slice. 516 exact matches. Not a single byte off.
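The comparison itself is deliberately dumb; something along these lines (names are mine, not from the actual harness):

```python
def first_mismatch(reference: bytes, decoded: bytes) -> int:
    """Byte-exact comparison against the oracle's raw RGBA dump:
    returns -1 on an exact match, otherwise the offset of the first
    differing byte (or the shorter length, if one dump is truncated)."""
    if reference == decoded:
        return -1
    n = min(len(reference), len(decoded))
    return next((i for i in range(n) if reference[i] != decoded[i]), n)
```

No tolerance, no fuzzy matching: either every byte is identical or the file fails.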
What’s Next
This post covered the archaeology: finding the format, building the tooling, understanding the structure. The next one will go deeper: a full worked decode of a real file, bit by bit, token by token, with annotated hex dumps and the complete specification. If you enjoy reading binary formats the way some people enjoy crossword puzzles, you will like it.
There is also the matter of the DPI scaling pipeline and the palette system, both of which deserve their own attention. And eventually, the actual goal: using all of this to reconstruct the Windows Live Messenger 2009 interface, from scratch, on modern platforms.
But that is for another day.