Lightmap Compression in The Witness

In my initial implementation of our lightmapping technology I simply stored lightmap textures in RGBA16F format. This produced excellent results, but at a very high memory cost. I later switched to the R10G10B10A2 fixed point format to reduce the memory footprint of our lightmaps, but that introduced some quantization artifacts. At first glance it seemed that we would need more than 10 bits per component in order to have smooth gradients!

At the time the RGBM color transform seemed to be a popular way to encode lightmaps. I gave that a try and the results weren’t perfect, but it was a clear improvement and I could already think of several ways of improving the encoder. Over time I tested some of these ideas and managed to improve the quality significantly and also reduce the size of the lightmap data. In this post I’ll describe some of these ideas and support them with examples showing my results.

I believe the RGBM transform was first proposed by Capcom in these CEDEC 2006 slides. While Capcom employs it for diffuse textures, it has become a popular way to encode lightmaps. RGBM or some of its variations are used in Unity, Bioshock Infinite, and the Unreal Engine, among others. Its use for standard color textures is not as widespread, but Shane Calimlim found it to be a good fit for the stylized artwork of Duck Tales and suggests it could be a good format in general. However, with so many precedents, I was surprised it had not been analyzed in more detail.

The main challenge of compressing lightmaps is that often they have a wider range than regular diffuse textures. This range is not as large as in typical HDR textures, but it’s large enough that using regular LDR formats results in obvious quantization artifacts. Lightmaps don’t usually have high frequency details, they are often close to greyscale, and only have smooth variations in the chrominance.

In our case, most of our lightmap values are within the [0, 16] range, and on the rare occasions when they fall outside of that range, we constrain them by clamping the colors while preserving the hue to avoid saturation artifacts. Brian Karis also suggests tone mapping the upper section of the range to avoid sharp discontinuities, but I only found this to be a problem when light sources had unreasonably high intensity values.
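
As a rough sketch of what a hue-preserving clamp can look like (this is an illustration, not the code used in the game; the `Color3` type and the `clampPreservingHue` name are made up for the example): instead of clamping each channel independently, all three channels are scaled by the same factor so that their ratios stay intact.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical color type used only for this example.
struct Color3 { float r, g, b; };

// Scales the color uniformly so that its largest component does not exceed
// maxValue. All three components are multiplied by the same factor, so
// their ratios (and therefore the hue) are unchanged.
inline Color3 clampPreservingHue(Color3 c, float maxValue = 16.0f) {
    assert(maxValue > 0.0f);
    float m = std::max(c.r, std::max(c.g, c.b));
    if (m <= maxValue) return c;   // already in range, nothing to do
    float s = maxValue / m;        // uniform scale factor, s < 1
    return { c.r * s, c.g * s, c.b * s };
}
```

For example, (32, 16, 8) becomes (16, 8, 4), whereas a per-channel clamp would produce (16, 16, 8) and shift the hue.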

The shape of the lightmap color distribution varies considerably. Interior lightmaps are predominantly dark with a long tail of brighter highlights:

tunnel.histogram

while outdoor lightmaps have a more Gaussian distribution with a bell-like shape. This particular lightmap is under the shade of some colored fall trees, which give it an orange tone:

hut.histogram

Not all lightmaps use all the available range, so after tone mapping the next thing we do is to scale the range to [0, 1].

So, why is RGBM a good choice for data like this? The distribution of distinct values that can be represented with RGBM looks as follows:

rgbm.histogram

It provides much more precision toward 0 than toward 1. This is beneficial for images that are intended to be visualized at multiple exposures. We want to obtain smooth lightmaps without quantization artifacts independently of the camera exposure. However, as we will see later, this provides much more precision around 0 than is actually necessary.

Naive RGBM Encoding

In my initial implementation I simply used RGBA8 textures, squaring the colors to perform gamma correction in the shader. The standard rgb -> RGBM transform is as follows:

m = max(r,g,b)
R = r/m
G = g/m
B = b/m
M = m
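
A minimal sketch of this transform with 8-bit quantization might look as follows. The names (`Rgbm8`, `encodeRgbmNaive`, `decodeRgbm`) are invented for the example, the inputs are assumed to be already scaled to [0, 1], and the gamma step (squaring in the shader) is left out. Rounding M up rather than to nearest is a choice made here so that the normalized components never exceed 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct Rgbm8 { uint8_t R, G, B, M; };

// Quantizes a value to 8 bits with rounding, clamping it to [0, 1] first.
inline uint8_t quantize8(float x) {
    x = std::min(std::max(x, 0.0f), 1.0f);
    return static_cast<uint8_t>(std::lround(x * 255.0f));
}

// Naive encoder: M is the largest component, RGB is the color divided by M.
inline Rgbm8 encodeRgbmNaive(float r, float g, float b) {
    assert(r >= 0.0f && g >= 0.0f && b >= 0.0f);
    float m = std::min(std::max(r, std::max(g, b)), 1.0f);
    // Round M up so that r/M, g/M and b/M stay at or below 1.
    int iM = static_cast<int>(std::ceil(m * 255.0f));
    if (iM == 0) return { 0, 0, 0, 0 };   // pure black, avoid dividing by 0
    float M = iM / 255.0f;                 // the value the decoder will see
    return { quantize8(r / M), quantize8(g / M), quantize8(b / M),
             static_cast<uint8_t>(iM) };
}

// Decoder: multiply the stored color by the stored multiplier.
inline void decodeRgbm(Rgbm8 c, float& r, float& g, float& b) {
    float M = c.M / 255.0f;
    r = c.R / 255.0f * M;
    g = c.G / 255.0f * M;
    b = c.B / 255.0f * M;
}
```

Note that a grey input ends up with R, G and B all close to 255, so the M channel alone carries the luminance.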

A simple improvement I made early on is to divide the quantization interval in two. This is a variation of the idea presented in Microsoft’s LUVW HDR texture paper, but instead of using an extra texture, I simply rely on the RGB and alpha (M) channels.

A similar observation is made by Shane Calimlim:

Gray is encoded as pure white in the color map, which may not always be optimal. Gray is an edge case most of the time, but a smarter encoding algorithm could make vast improvements in its handling. In the simple version of the algorithm the entire burden of representing gray lies with the multiply map; this could be split between both maps, improving precision greatly in scenarios where the color map can accommodate extra data without loss.

But in our case grey is not really an edge case! Lightmaps are mostly grey with slight smooth color variations.

The way I implemented this is by choosing a certain threshold t. For values of m that are lower than t the color is fully encoded using only the RGB components as follows:

R = r/t
G = g/t
B = b/t
M = 0

and for values of m greater than the threshold t, the normalized color is encoded in the RGB components, and the normalization factor m is biased and scaled to store it at a higher precision:

R = r/m
G = g/m
B = b/m
M = (m-t) / (1-t)

That’s equivalent to just doing:

m = max(r,g,b,t)
R = r/m
G = g/m
B = b/m
M = (m-t) / (1-t)
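
To make the equivalence concrete, here is a small unquantized sketch of the biased transform and its inverse. The function and type names are made up for the example; the decode step assumes the multiplier is reconstructed as M * (1-t) + t, which is simply the inverse of the formula above.

```cpp
#include <algorithm>
#include <cassert>

struct Rgbm { float R, G, B, M; };

// Biased encoder. When max(r,g,b) < t, m equals t, so M becomes 0 and the
// color is carried entirely by RGB (R = r/t, ...). Otherwise m is the
// largest component and M stores it remapped from [t, 1] to [0, 1].
inline Rgbm encodeBiased(float r, float g, float b, float t) {
    assert(t > 0.0f && t < 1.0f);
    float m = std::max(std::max(r, g), std::max(b, t));
    return { r / m, g / m, b / m, (m - t) / (1.0f - t) };
}

// Inverse transform: undo the bias and scale on M, then multiply.
inline void decodeBiased(Rgbm c, float t, float& r, float& g, float& b) {
    float m = c.M * (1.0f - t) + t;
    r = c.R * m;
    g = c.G * m;
    b = c.B * m;
}
```

With t = 0.25, the dark grey (0.1, 0.1, 0.1) encodes to RGB = (0.4, 0.4, 0.4) with M = 0, while (0.5, 0.25, 0.125) encodes to RGB = (1, 0.5, 0.25) with M = 1/3.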

This is useful for several reasons. As Shane notes, by splitting the burden of representing the luminance between the RGB and M maps we can obtain more precision and reduce the size of the quantization interval.

It’s important to note that this actually reduces precision around zero, where we don’t actually need so much, because the game camera never has long enough exposures. If we look at the distribution of grey levels that biased RGBM can represent it now looks as follows:

rgbmt.histogram

Picking different values of t allows us to use different quantization intervals for different parts of the color range. The optimal choice of t depends on the distribution of colors in the lightmap and the number of bits used to represent each of the components. We chose this value experimentally. For our lightmaps, values around 0.3 seemed to work best when encoding them in RGBA8 format.

Optimized RGBM Encoding

With these improvements RGBM was already producing very good results. Visually I could not see any difference between the RGBM lightmaps and the raw half floating point lightmaps. However, I had not reduced the size of the lightmaps by much and ideally I wanted to compress them further.

The next thing that I tried to do was to choose M in a way that minimizes the quantization error. I did that by brute force, trying all possible values of M, computing the corresponding RGB values for that choice of M, and selecting the one that minimized the MSE:

// r, g, b are the input color components and threshold is t.
float bestError = FLT_MAX;
float bestM = threshold;

for (int m = 0; m < 256; m++) {
    // Decode M.
    float M = float(m) / 255.0f * (1 - threshold) + threshold;

    // Encode RGB.
    int R = ftoi_round(255.0f * saturate(r / M));
    int G = ftoi_round(255.0f * saturate(g / M));
    int B = ftoi_round(255.0f * saturate(b / M));

    // Decode RGB.
    float dr = (float(R) / 255.0f) * M;
    float dg = (float(G) / 255.0f) * M;
    float db = (float(B) / 255.0f) * M;

    // Measure error.
    float error = square(r-dr) + square(g-dg) + square(b-db);

    // Keep the M with the lowest squared error so far.
    if (error < bestError) {
        bestError = error;
        bestM = M;
    }
}

This improved the error substantially, but it introduced interpolation artifacts. The RGBM encoding is not linear, so interpolation of RGBM colors is not correct. With the naive method this was not a big deal, because adjacent texels usually had similar values of M, but the M values resulting from this optimization procedure were not necessarily similar anymore.

However, it was easy to solve this problem by constraining the search to a small range around the M value selected with the naive method:

float M = max(max(r, g), max(b, threshold));
int iM = ftoi_ceil((M - threshold) / (1 - threshold) * 255.0f);

for (int m = max(iM-16, 0); m < min(iM+16, 256); m++) {
    ...
}
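
Putting the two snippets together, a self-contained version of the constrained search could look roughly like this. It is only a sketch: the helpers stand in for `ftoi_round`, `saturate` and `square`, the `Rgbm8` struct and the function name are invented for the example, and the input is assumed to be already scaled to [0, 1].

```cpp
#include <algorithm>
#include <cassert>
#include <cfloat>
#include <cmath>
#include <cstdint>

struct Rgbm8 { uint8_t R, G, B, M; };

inline float saturate(float x) { return std::min(std::max(x, 0.0f), 1.0f); }
inline float square(float x) { return x * x; }
inline int roundToInt(float x) { return static_cast<int>(std::lround(x)); }

// Searches the M values within `window` steps of the naive choice and keeps
// the one with the lowest squared error after quantizing RGB to 8 bits.
inline Rgbm8 encodeRgbmConstrained(float r, float g, float b,
                                   float threshold, int window = 16) {
    assert(threshold > 0.0f && threshold < 1.0f);

    // Naive M: largest component, never below the threshold or above 1.
    float m0 = std::min(std::max(std::max(r, g), std::max(b, threshold)), 1.0f);
    int iM = static_cast<int>(
        std::ceil((m0 - threshold) / (1.0f - threshold) * 255.0f));
    iM = std::min(std::max(iM, 0), 255);

    Rgbm8 best = { 0, 0, 0, 0 };
    float bestError = FLT_MAX;

    for (int m = std::max(iM - window, 0); m < std::min(iM + window, 256); m++) {
        // Decode the candidate M.
        float M = float(m) / 255.0f * (1.0f - threshold) + threshold;

        // Encode RGB relative to this M.
        int R = roundToInt(255.0f * saturate(r / M));
        int G = roundToInt(255.0f * saturate(g / M));
        int B = roundToInt(255.0f * saturate(b / M));

        // Decode again and measure the squared error.
        float error = square(r - R / 255.0f * M) +
                      square(g - G / 255.0f * M) +
                      square(b - B / 255.0f * M);

        if (error < bestError) {
            bestError = error;
            best = { static_cast<uint8_t>(R), static_cast<uint8_t>(G),
                     static_cast<uint8_t>(B), static_cast<uint8_t>(m) };
        }
    }
    return best;
}
```

Since the naive M is always part of the search window, the result can never be worse than the naive encoding for the same texel.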

This constraint did not reduce the quality noticeably, but eliminated the interpolation artifacts entirely.

While this idea showed that there's a significant optimization potential over the naive approach, it did not get us any closer to our stated goal: to reduce the size of the lightmaps. I tried to use a packed pixel format such as RGBA4, but even with the optimized encoding, it did not produce sufficiently high quality results. To reduce the size further we would have to use DXT block compression.

RGBM-DXT5

Simply compressing the RGBM data with DXT5 produced poor results, and compressing the optimized RGBM data did not help; it only degraded the results further.

A brute force compressor is not practical in this case, because when processing blocks of 4x4 colors simultaneously the search space is much larger.

A better approach is to first compress the RGB values obtained through the naive procedure using a standard DXT1 compressor and then choose the M values to compensate for the quantization and compression errors of the DXT1 component.

That is, we want to compute M so that:

M * (R, G, B) == (r, g, b)

This gives us three equations that we can minimize in the least squares sense. The M that minimizes the error is:

M = dot(rgb, RGB) / dot(RGB, RGB)
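
For completeness, this follows from setting the derivative of the squared error to zero. The bold symbols are introduced here only for brevity: C is the decoded (R, G, B) color and c is the original (r, g, b) color.

```latex
\begin{align*}
E(M) &= \lVert M\,\mathbf{C} - \mathbf{c} \rVert^2,
  \qquad \mathbf{C} = (R, G, B),\quad \mathbf{c} = (r, g, b) \\
\frac{dE}{dM} &= 2\,\mathbf{C} \cdot \left( M\,\mathbf{C} - \mathbf{c} \right) = 0 \\
M &= \frac{\mathbf{c} \cdot \mathbf{C}}{\mathbf{C} \cdot \mathbf{C}}
\end{align*}
```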

In my tests, the resulting M values compressed very well in the alpha map and reduced the error significantly.
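
A per-texel sketch of this fitting step is shown below. The names are hypothetical, and the mapping to the alpha channel assumes the biased representation described earlier, where the stored value is (m-t) / (1-t) clamped to [0, 1].

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical color type used only for this example.
struct Color3 { float r, g, b; };

inline float dot(Color3 a, Color3 b) {
    return a.r * b.r + a.g * b.g + a.b * b.b;
}

// Least squares multiplier: the m that minimizes |m * decodedRgb - original|^2.
// `decodedRgb` is the color read back from the compressed DXT1 block.
inline float fitMultiplier(Color3 original, Color3 decodedRgb) {
    float denom = dot(decodedRgb, decodedRgb);
    // The RGB part decoded to black, so the product is black regardless of m.
    if (denom <= 0.0f) return 0.0f;
    return dot(original, decodedRgb) / denom;
}

// Maps the fitted multiplier to the value stored in the alpha channel,
// assuming the biased representation with threshold t.
inline float multiplierToAlpha(float m, float t) {
    assert(t >= 0.0f && t < 1.0f);
    float a = (m - t) / (1.0f - t);
    return std::min(std::max(a, 0.0f), 1.0f);   // fitted values can fall outside [0, 1]
}
```

The resulting alpha values for the 16 texels of a block would then be handed to the DXT5 alpha compressor.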

I also tried to encode RGB again with the newly obtained M, and compress them afterward, but in most cases that did not improve the error. Something that worked well was to simply weight the RGB error by M in the initial compression step.

The number of bits allocated for the RGB and M components is very different than in our initial RGBA8 texture, so the choice of t had to be reviewed. In this case values of t around 0.15 produced the best results. I attribute this to the reduced number of bits per pixel used to encode the RGB channels.

Results

In addition to the described formats I also compared the proposed method against BC6. BC6 is specifically designed to encode HDR textures, but it's not available on all hardware. Our optimized RGBM-DXT5 scheme provides nearly the same quality as BC6:

The above chart shows the RMSE values of the final images after color space conversion and range rescaling.

To study the effectiveness of the encoders it's more useful to look at the errors before rescaling. These look a lot more uniform, but cannot be compared against BC6 anymore, since in that case adjusting the range of the input values does not usually reduce the compression error.

Finally, I thought it would be interesting to use RGBM-DXT5 to compress standard images and compare it against YCoCg-DXT5. The following chart shows the results for the first 8 images of the kodim image set:

YCoCg-DXT5 is clearly a much better choice for LDR color textures.

Conclusions and Future Work

Our proposed RGBM encoder was good enough for our lightmaps, but I'm convinced there's more room for improvement.

One idea would be to pick a different threshold t for each texture. Finding the best t for a given texture to be encoded using the plain RGBM linear format would be easy, but it's not so obvious when using block compression.

The RGB components are encoded with a standard weighted DXT1 compressor. It would be interesting to use a specialized compressor that favored RGB values with errors that the M component could correct. For example, the M values resulting from the least squares minimization are sometimes above 1 and need to be clamped to the [0, 1] range; it should be possible to constrain the RGB endpoints to prevent that. It may also be possible to choose RGB endpoints such that the error of the least squares fitted M is as small as possible.

Finally, DXT5 is not available on most mobile GPUs. I haven't tried this yet, but it seems the ETC2 EAC_RGBA8 format is widely available and would be a good fit for the techniques presented here. It would also be interesting to compare our method against packed floating point formats (such as R11G11B10_FLOAT and R9G9B9E5_SHAREDEXP) and ASTC's HDR mode.

Tables

In all cases I measured the error using the RMSE metric, which is the same metric used to guide the block compressors. It may make more sense to use a metric that takes into account how the lightmaps are visualized in the game. I did exactly that: I tone mapped the lightmaps at different exposures and computed the error in post-tone-mapping space. The tables below show the resulting values, which roughly correlate with the plain RMSE metric.
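
The measurement can be sketched as follows. Note that the tone mapping operator below is a placeholder: the article does not say which operator the game uses, so a simple exponential curve is assumed purely for illustration, and all names are invented for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// ASSUMPTION: placeholder operator, not the one used in the game.
inline float toneMap(float x, float exposure) {
    return 1.0f - std::exp(-exposure * x);
}

// Root mean squared error between two equally sized arrays of values.
inline float rmse(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size() && !a.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); i++) {
        double d = double(a[i]) - double(b[i]);
        sum += d * d;
    }
    return float(std::sqrt(sum / double(a.size())));
}

// RMSE measured after tone mapping both images at the given exposure.
inline float toneMappedRmse(const std::vector<float>& reference,
                            const std::vector<float>& decoded,
                            float exposure) {
    assert(reference.size() == decoded.size());
    std::vector<float> ta(reference.size()), tb(decoded.size());
    for (std::size_t i = 0; i < reference.size(); i++) {
        ta[i] = toneMap(reference[i], exposure);
        tb[i] = toneMap(decoded[i], exposure);
    }
    return rmse(ta, tb);
}
```

The "average" column in the tables is the mean of the three tone mapped errors for each lightmap.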

               Tone mapped error
            e=2.2   e=1.0   e=0.22     average   rmse

RGBM8 naive:

hallway:    0.00026 0.00045 0.00089 -> 0.00053 | 0.00007
hut:        0.00100 0.00102 0.00082 -> 0.00095 | 0.00609
archway:    0.00114 0.00141 0.00190 -> 0.00148 | 0.00818
windmill:   0.00102 0.00133 0.00185 -> 0.00140 | 0.00083
shaft:      0.00201 0.00228 0.00214 -> 0.00214 | 0.00798
hub:        0.00151 0.00182 0.00191 -> 0.00175 | 0.00267
tower:      0.00153 0.00200 0.00299 -> 0.00217 | 0.00160
tunnel:     0.00094 0.00123 0.00171 -> 0.00129 | 0.00093
mine:       0.00105 0.00120 0.00141 -> 0.00122 | 0.00640
theater:    0.00099 0.00126 0.00160 -> 0.00128 | 0.00129

RGBM8 optimized:

hallway:    0.00010 0.00015 0.00030 -> 0.00018 | 0.00004
hut:        0.00049 0.00043 0.00031 -> 0.00041 | 0.00543
archway:    0.00044 0.00060 0.00122 -> 0.00075 | 0.00595
windmill:   0.00020 0.00026 0.00036 -> 0.00027 | 0.00024
shaft:      0.00059 0.00066 0.00102 -> 0.00076 | 0.00501
hub:        0.00038 0.00051 0.00085 -> 0.00058 | 0.00099
tower:      0.00060 0.00072 0.00082 -> 0.00072 | 0.00112
tunnel:     0.00025 0.00031 0.00042 -> 0.00033 | 0.00048
mine:       0.00044 0.00049 0.00083 -> 0.00058 | 0.00467
theater:    0.00061 0.00076 0.00087 -> 0.00075 | 0.00095

RGBM4 optimized:

hallway:    0.00169 0.00259 0.00562 -> 0.00330 | 0.00063
hut:        0.00932 0.00899 0.00773 -> 0.00868 | 0.08317
archway:    0.00906 0.01287 0.02616 -> 0.01603 | 0.09614
windmill:   0.00424 0.00562 0.00830 -> 0.00606 | 0.00402
shaft:      0.01103 0.01314 0.01978 -> 0.01465 | 0.08204
hub:        0.00868 0.01160 0.01848 -> 0.01292 | 0.01722
tower:      0.01004 0.01217 0.01466 -> 0.01229 | 0.01835
tunnel:     0.00516 0.00687 0.01066 -> 0.00757 | 0.00764
mine:       0.00871 0.01044 0.01742 -> 0.01219 | 0.07510
theater:    0.00683 0.00840 0.00963 -> 0.00829 | 0.01057

DXT5 naive:

hallway:    0.00155 0.00249 0.00570 -> 0.00325 | 0.00048
hut:        0.00487 0.00536 0.00564 -> 0.00529 | 0.02119
archway:    0.00500 0.00656 0.01039 -> 0.00731 | 0.01949
windmill:   0.00214 0.00287 0.00444 -> 0.00315 | 0.00177
shaft:      0.01062 0.01339 0.01977 -> 0.01459 | 0.03412
hub:        0.00616 0.00796 0.01130 -> 0.00848 | 0.01481
tower:      0.00551 0.00712 0.01019 -> 0.00761 | 0.00735
tunnel:     0.00235 0.00308 0.00451 -> 0.00331 | 0.00285
mine:       0.00471 0.00589 0.00877 -> 0.00646 | 0.01809
theater:    0.00332 0.00412 0.00496 -> 0.00413 | 0.00498

DXT5 optimized:

hallway:    0.00125 0.00199 0.00456 -> 0.00260 | 0.00041
hut:        0.00336 0.00373 0.00408 -> 0.00372 | 0.01529
archway:    0.00353 0.00460 0.00719 -> 0.00511 | 0.01285
windmill:   0.00134 0.00180 0.00280 -> 0.00198 | 0.00116
shaft:      0.00801 0.01016 0.01507 -> 0.01108 | 0.02437
hub:        0.00469 0.00602 0.00846 -> 0.00639 | 0.01241
tower:      0.00421 0.00544 0.00781 -> 0.00582 | 0.00599
tunnel:     0.00157 0.00206 0.00306 -> 0.00223 | 0.00193
mine:       0.00338 0.00428 0.00646 -> 0.00471 | 0.01178
theater:    0.00245 0.00302 0.00357 -> 0.00301 | 0.00382

DXT5 optimized with M-weighted RGB:

hallway:    0.00114 0.00184 0.00430 -> 0.00243 | 0.00038
hut:        0.00338 0.00382 0.00443 -> 0.00388 | 0.01478
archway:    0.00356 0.00464 0.00725 -> 0.00515 | 0.01271
windmill:   0.00134 0.00180 0.00281 -> 0.00198 | 0.00113
shaft:      0.00804 0.01023 0.01522 -> 0.01116 | 0.02382
hub:        0.00472 0.00611 0.00868 -> 0.00650 | 0.01088
tower:      0.00421 0.00544 0.00787 -> 0.00584 | 0.00597
tunnel:     0.00157 0.00206 0.00306 -> 0.00223 | 0.00193
mine:       0.00337 0.00428 0.00648 -> 0.00471 | 0.01170
theater:    0.00245 0.00302 0.00356 -> 0.00301 | 0.00382

7 Comments

Krzysztof Narkowicz says:

October 1, 2016 at 1:00 pm

Very interesting article. Did you try to go further this path and store RGB in a half res BC1 texture and luminance in a full res BC4 texture? BTW a classic trick, which I guess you are familiar with, is to encode some lightmap range per object. Not sure if it’s applicable here, as it heavily depends on the content.

1. Ignacio Castaño says:
  
  October 3, 2016 at 8:36 am
  
  Thanks for the suggestions! I already use a scale factor per object, and it does help a lot (that’s what I meant by scaling the range to [0, 1] in the article). It would be interesting to try the optimized M-map technique with down-sampled chrominance. In our case it wasn’t very practical to increase the number of texture samplers, but I think the main problem would be that the alignment of the charts would have to be doubled and that would increase the amount of wasted space.
  
Galen Alderson says:

October 6, 2016 at 2:01 am

I don’t understand the section which states:

“That’s is equivalent to just doing:

m = max(r,g,b,t)
R = r/m
G = g/m
B = b/m
M = (m-t) / (1-t)”

Wouldn’t this just result in any color value less than the threshold being encoded with an M value of 0, thus appearing as full black, rather than being encoded with the expected M value of 1 from the previous section? In that case it seems the math wouldn’t be equivalent to the two blocks above it.

1. Ignacio Castaño says:
  
  October 7, 2016 at 9:10 am
  
  With the biased representation RGBM colors are decoded as:
```
m = M * (1-t) + t
r = R * m
g = G * m
b = B * m
```
  rgb values lower than the threshold result in m set to t, and therefore M is set to 0. I’ve just corrected the typo in the previous section. The idea is that the M factor is fixed and the color is fully represented with the rgb components.
  
Robin Green says:

October 14, 2016 at 10:47 am

Have you seen the Rec.2020 PQ curve? It converts an HDR image in the range 0.005 to 10,000 nits into a 10-bit code that is provably below the threshold of banding. It’s got perceptual and color science behind its design and it’s simple to code.

1. Ignacio Castaño says:
  
  October 17, 2016 at 9:53 am
  
  I have used the PQ curve in our color grading pipeline for HDRTV, but I haven’t studied it in detail. It’s certainly useful to do some range transforms for lightmap encoding, in fact, we do not rely on the non-uniformity of the RGBM distribution only, but also square the colors, as in a cheap sRGB approximation. I’m sure it should be possible to find a better curve than that, but I doubt it would be exactly PQ, because we have a lower range and the RGBM quantization is not uniform.
  