My first product is a real-time ASTC encoder that is orders of magnitude faster than existing offline compressors. It targets a small subset of the available encoding space, but achieves competitive quality through carefully crafted algorithms and creative optimizations.

In addition to that I’m exploring middleware products and applications that advance the state of the art in the areas of RDO texture compression, mesh processing algorithms such as simplification and parameterization, and alternative representations for rendering and physical simulation.

For inquiries, contact me at: castano@ludicon.com

This compelled me to package the BC1 compressor independently as a single header library:

https://github.com/castano/icbc

While doing that I also took the opportunity to revisit the encoder and change the way it was vectorized. I wanted to write about that, but before getting into the details let’s overview how a BC1 compressor works.

A BC1 block is a 4×4 group of texels represented with two 16-bit colors and sixteen 2-bit indices. The colors are encoded in R5G6B5 format and define a 4- or 3-color palette. The entries of the palette are convex combinations of the two colors; for this reason they are usually referred to as endpoints. The indices select one of the colors of the palette.
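To make the format concrete, here is a small sketch of how a decoder expands the two endpoints into the 4-color palette using the ideal 1/3, 2/3 weights. The helper names are mine, and actual GPUs use slightly different interpolation approximations:

```cpp
#include <cstdint>

struct RGB { int r, g, b; };

// Expand an R5G6B5 endpoint to 8-bit RGB by bit replication.
RGB expand565(uint16_t c) {
    int r = (c >> 11) & 31, g = (c >> 5) & 63, b = c & 31;
    return { (r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2) };
}

// Build the 4-color palette as convex combinations of the two endpoints
// using the ideal 1/3, 2/3 weights.
void build_palette(uint16_t c0, uint16_t c1, RGB palette[4]) {
    RGB e0 = expand565(c0), e1 = expand565(c1);
    palette[0] = e0;
    palette[1] = e1;
    palette[2] = { (2*e0.r + e1.r) / 3, (2*e0.g + e1.g) / 3, (2*e0.b + e1.b) / 3 };
    palette[3] = { (e0.r + 2*e1.r) / 3, (e0.g + 2*e1.g) / 3, (e0.b + 2*e1.b) / 3 };
}
```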

Nathan Reed provides a good overview of this and other block compression formats, and Fabian Giesen has more low-level details about how the endpoints are interpolated on different GPU architectures.

In order to encode a BC1 block we need to choose the two endpoints that minimize the error. We can think of this problem as solving a system of 16 equations, one for each texel:

$$\alpha_i a + \beta_i b = c_i, \qquad i = 1, \ldots, 16$$

where $c_i$ is the color of texel $i$.

In the 4 color block mode the values of $\alpha_i$ and $\beta_i$ are constrained to one of $\{1, \frac{2}{3}, \frac{1}{3}, 0\}$, with $\beta_i = 1 - \alpha_i$, depending on the index (or selector) assigned to each texel.

If we know the index assigned to each texel, then we can solve the above equation system in the least squares sense and obtain the optimal values for the endpoints. That is, we just need to solve the following equation:

$$\begin{pmatrix} \sum_i \alpha_i^2 & \sum_i \alpha_i \beta_i \\ \sum_i \alpha_i \beta_i & \sum_i \beta_i^2 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum_i \alpha_i c_i \\ \sum_i \beta_i c_i \end{pmatrix}$$

That is, given a set of indices we need to compute two weighted sums of the colors, three weighted sums of the weights, and a 2×2 matrix inverse. An important observation is that if the indices are known in advance, the weighted sums of the weights are fixed and the matrix inverse can be precomputed.
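As a sketch of that solve, on a single channel and with hypothetical names (the real solver works on RGB vectors), the normal-equation sums and the 2×2 inverse look like this:

```cpp
// Given per-texel weights alpha[i] (with beta = 1 - alpha) and scalar values
// c[i] for one color channel, solve the 2x2 normal equations for the
// endpoints a, b. Illustrative sketch, not the ICBC implementation.
bool solve_endpoints(const float* alpha, const float* c, int n,
                     float* a, float* b) {
    float aa = 0, ab = 0, bb = 0, ax = 0, bx = 0;
    for (int i = 0; i < n; i++) {
        float al = alpha[i], be = 1.0f - al;
        aa += al * al; ab += al * be; bb += be * be;  // weighted sums of weights
        ax += al * c[i]; bx += be * c[i];             // weighted sums of colors
    }
    float det = aa * bb - ab * ab;
    if (det == 0.0f) return false;   // degenerate cluster assignment
    *a = (bb * ax - ab * bx) / det;  // 2x2 inverse applied to the right side
    *b = (aa * bx - ab * ax) / det;
    return true;
}
```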

A simple BC1 encoding strategy is to compute the initial indices using a simple heuristic, then recompute the endpoints by solving the above equation, and finally update the indices based on the most recent endpoints. This process can be repeated until the error stops decreasing. This is the strategy employed by the stb_dxt.h encoder, and just as I was writing this article Fabian Giesen wrote another blog post describing that implementation in more detail.

Another approach would be to solve the optimization problem for all possible index combinations. In a 4×4 block with 2 bit indices per texel the total number of possible index combinations is $4^{16}$, and solving this equation for each one of them would be overkill. To make the problem tractable, Simon Brown’s clever idea was to only consider the index combinations that preserve the “total order”. That is, we project the colors onto a line, sort them, and cluster them into 4 groups, assigning an index to each group. The number of ways in which we can do that is much smaller, as the index assignment is now constrained by the order of the colors on the line.

To count and enumerate the clusters that are possible with 16 indices and 4 colors we could use code like this:

```
cluster_count = 0
for c0: 0..16 {
    for c1: 0..16-c0 {
        for c2: 0..16-(c0 + c1) {
            cluster_count += 1
            c3 = 16 - (c0 + c1 + c2)
            println c0, c1, c2, c3
        }
    }
}
```

The number of resulting cluster combinations is just 969. Solving the corresponding 969 equation systems gives us a nearly optimal solution. I say nearly, because the resulting endpoints $a, b$ are not stored exactly, but clamped and quantized. That is, in reality we have a discrete optimization problem (box-constrained integer least squares) and we are approximating it by solving a continuous optimization problem and clamping and quantizing the solution.
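For illustration, a minimal round-to-nearest quantizer to R5G6B5 might look like this. This is my own helper, not the one used in ICBC; an exhaustive encoder would also search neighboring quantized values, since rounding each channel independently is not always optimal:

```cpp
#include <algorithm>
#include <cstdint>

// Clamp and quantize a floating point RGB endpoint (0..255 scale) to R5G6B5.
uint16_t quantize565(float r, float g, float b) {
    int ri = std::clamp((int)(r * 31.0f / 255.0f + 0.5f), 0, 31);
    int gi = std::clamp((int)(g * 63.0f / 255.0f + 0.5f), 0, 63);
    int bi = std::clamp((int)(b * 31.0f / 255.0f + 0.5f), 0, 31);
    return (uint16_t)((ri << 11) | (gi << 5) | bi);
}
```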

The line that is used to determine the sort order of the colors is fairly important. The most obvious choice is the best fit line, and that seems to work best in practice. It’s important to compute it with enough precision. I proposed using power iterations to compute the direction of the first eigenvector of the covariance matrix, but this is very sensitive to the initial approximation. In an early implementation I simply seeded it with a vector along the luminance axis, which failed miserably when the best fit line was perpendicular to the luminance axis. I have a neat strategy to avoid this that I’ll document another time.

Simon Brown tried to refine the result by repeating the optimization using the direction of the segment connecting the endpoints found in the previous iteration, but in practice this often produced lower quality results than the solution found in the first iteration, so it’s not recommended.

This same process works with any number of clusters. For example, in the 3 color mode we have 3 clusters and the weights take the values $\{1, \frac{1}{2}, 0\}$. We can use the same procedure to enumerate the equations:

```
cluster_count = 0
for c0: 0..16 {
    for c1: 0..16-c0 {
        cluster_count += 1
        c2 = 16 - (c0 + c1)
        println c0, c1, c2
    }
}
```

And in this case the result is much smaller: just 153. Even though the optimization cost is lower, the 3 color mode rarely produces higher quality results than the 4 color mode, so the additional cost is rarely justified unless you want to squeeze as much quality as possible out of the format.

That said, the 3 color mode shines when the block has texels close to black. The 4th palette entry in this mode is used to represent transparency. The alpha is expected to be premultiplied, and for this reason the corresponding RGB components are black. By ignoring all the texels that are near black, we can compute a best fit line that approximates the remaining colors much more accurately, which results in a much lower error. For this strategy to work, though, we need to be aware that the resulting texture may have unexpected alpha values. Some platforms allow us to swizzle the alpha to 255, but otherwise we have to be careful that shaders don’t assume opaque alpha.

For other formats with a higher number of palette entries this optimization strategy is not particularly useful, as the number of cluster combinations grows very quickly:

```
2 -> 17
3 -> 153
4 -> 969
5 -> 4,845
6 -> 20,349
7 -> 74,613
8 -> 245,157
```
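These counts are the number of ways to write 16 as an ordered sum of k non-negative cluster sizes, which by stars and bars is $\binom{16+k-1}{k-1}$. A quick way to verify the table above (my own helpers, not part of the encoder):

```cpp
#include <cstdint>

// Binomial coefficient computed incrementally; the intermediate product of i
// consecutive integers is always divisible by i, so integer division is exact.
uint64_t binomial(uint64_t n, uint64_t k) {
    uint64_t r = 1;
    for (uint64_t i = 1; i <= k; i++)
        r = r * (n - k + i) / i;
    return r;
}

// Number of ordered cluster combinations for 16 texels and a palette of
// the given size: C(16 + k - 1, k - 1).
uint64_t cluster_combinations(int palette_size) {
    return binomial(16 + palette_size - 1, palette_size - 1);
}
```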

One of the optimization strategies in Rich’s BC1 compressor (rgbcx) is to reduce the number of cluster combinations to be considered. If you compute a histogram to visualize the distribution of clusters, you will notice that some of them are much more likely to occur than others. By pruning the list of clusters it’s possible to trade quality for increased speed in the encoder. I do not employ this strategy for reasons that I’ll describe later.

So far I have assumed the hardware interpolates the colors using the ideal weights $\{1, \frac{2}{3}, \frac{1}{3}, 0\}$, but in practice, as Fabian describes, each GPU uses a different approximation. If you are targeting a specific platform you can obtain higher quality results by using the corresponding weights when solving the least squares problem.

In the original cluster fit implementation Simon and I used SSE2 and VMX instructions to speed up the solver. These vector instructions operate on 4 floats at a time. We used them to operate on RGB colors and increased their utilization by storing the weights in the alpha component and operating on them simultaneously. Readability of the code suffered, and performance was only 2.3 times faster than the scalar implementation.

In that implementation we iterated over all the possible cluster combinations using three nested loops and incrementally computed the color sums that are necessary to solve the least squares system:

```
vec4 x0 = 0;
for c0: 0..15 {
    vec4 x1 = 0;
    for c1: 0..15-c0 {
        vec4 x2 = 0;
        for c2: 0..15-c1-c0 {
            alphax_sum = x0 + x1 * (2.0f / 3.0f) + x2 * (1.0f / 3.0f);
            ...
            x2 += color[c0+c1+c2];
        }
        x1 += color[c0+c1];
    }
    x0 += color[c0];
}
```

This worked well, but the approach didn’t scale to the larger vector sizes that are increasingly common in current CPUs, and the incremental nature of the computations created many pipeline dependencies.

In my CUDA implementation I used a different approach: I solved a least squares system on each thread. To do that I precomputed the indices of every cluster combination and loaded each one in a separate thread.

The incremental approach I used before was not possible in this setting, so instead I looped over the 16 colors in order to compute the corresponding sums:

```
parallel for i: 0..968 {
    indices = total_indices[i];
    alphax_sum = {0, 0, 0};
    for j: 0..15 {
        index = (indices >> (2*j)) & 0x3;
        alpha = weight(index);
        alphax_sum += alpha * colors[j];
        ...
    }
    ...
}
```

I was planning to do something along these lines when I revisited my CPU implementation, but instead I borrowed a trick from Rich Geldreich: to use summation tables to quickly compute partial color sums.

If we compute the sums of the sorted colors as follows:

```
color_sum[0] = { .r = 0, .g = 0, .b = 0 };
for (int i = 1; i <= color_count; i++) {
    color_sum[i].r = color_sum[i - 1].r + colors[i - 1].r;
    color_sum[i].g = color_sum[i - 1].g + colors[i - 1].g;
    color_sum[i].b = color_sum[i - 1].b + colors[i - 1].b;
}
```

Then we can easily compute the partial sums of the colors in any of the clusters by subtracting two entries from that table:

```
parallel for i: 0..968 {
    c0, c1, c2 = total_clusters[i];
    x0 = color_sum[c0] - 0;
    x1 = color_sum[c1+c0] - color_sum[c0];
    x2 = color_sum[c2+c1+c0] - color_sum[c1+c0];
    alphax_sum = x0 + x1 * (2.0f / 3.0f) + x2 * (1.0f / 3.0f);
    ...
}
```

In practice we load the prefix sums of the cluster counts and compute the color sums with just a couple of subtractions:

```
parallel for i: 0..968 {
    c0, c1, c2 = total_clusters[i];
    x0 = color_sum[c0];
    x1 = color_sum[c1];
    x2 = color_sum[c2];
    x2 -= x1;
    x1 -= x0;
    alphax_sum = x0 + x1 * (2.0f / 3.0f) + x2 * (1.0f / 3.0f);
    ...
}
```

Note also that we don’t need to compute `x3`, the sum of the colors in the 4th cluster, because it’s only necessary in order to compute `betax_sum`, and since we know the weights are symmetric, we can simply compute it as follows:

`betax_sum = color_sum[16] - alphax_sum`
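Putting the pieces together, a scalar sketch of the whole computation on a single channel (illustrative names; the real code is vectorized and works on RGB) might look like this:

```cpp
struct Sums { float alphax, betax; };

// prefix[i] holds the sum of the first i sorted colors, with prefix[0] == 0
// and prefix[16] equal to the total. c0, c01, c012 are the prefix sums of the
// cluster counts: c0, c0+c1, c0+c1+c2.
Sums weighted_sums(const float prefix[17], int c0, int c01, int c012) {
    float x0 = prefix[c0];
    float x1 = prefix[c01] - prefix[c0];
    float x2 = prefix[c012] - prefix[c01];
    float alphax = x0 + x1 * (2.0f / 3.0f) + x2 * (1.0f / 3.0f);
    float betax = prefix[16] - alphax;  // weights are symmetric
    return { alphax, betax };
}
```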

In order to do this efficiently with vector instructions it’s necessary to look up the values from the summation tables without falling back to scalar loads.

This can be done very easily with the AVX512 instruction set. The summation tables have 17 entries, but we know the first entry is always zero, so the remaining 16 entries can be loaded in a 512 bit register, the indices decreased by 1, and the loaded value zeroed when an index is negative. To perform the lookup we can use `_mm512_maskz_permutexvar_ps` and take advantage of the mask argument to zero the first index.

```
__m512 vfltmax = _mm512_set1_ps(FLT_MAX);
__m512 vr_sum = _mm512_mask_load_ps(vfltmax, loadmask, r_sum);
__m512 vg_sum = _mm512_mask_load_ps(vfltmax, loadmask, g_sum);
__m512 vb_sum = _mm512_mask_load_ps(vfltmax, loadmask, b_sum);
vx0.x = _mm512_maskz_permutexvar_ps(c0 >= 0, c0, vr_sum);
vx0.y = _mm512_maskz_permutexvar_ps(c0 >= 0, c0, vg_sum);
vx0.z = _mm512_maskz_permutexvar_ps(c0 >= 0, c0, vb_sum);
vx1.x = _mm512_maskz_permutexvar_ps(c1 >= 0, c1, vr_sum);
vx1.y = _mm512_maskz_permutexvar_ps(c1 >= 0, c1, vg_sum);
vx1.z = _mm512_maskz_permutexvar_ps(c1 >= 0, c1, vb_sum);
vx2.x = _mm512_maskz_permutexvar_ps(c2 >= 0, c2, vr_sum);
vx2.y = _mm512_maskz_permutexvar_ps(c2 >= 0, c2, vg_sum);
vx2.z = _mm512_maskz_permutexvar_ps(c2 >= 0, c2, vb_sum);
```

NEON does not have packed scalar permutes, nor 512 bit registers, but supports vqtbl4q_u8, a form of the TBL instruction that performs a vector lookup from 4 source registers. This is not as convenient as the AVX512 instruction, because it operates at the byte level. But that only means we need to do a bit more precomputation and transform the prefix sums of the cluster counts into byte offsets. Conveniently, if an index is out of range, the result for that lookup is 0, so we don’t have to worry about masking negative indices.

```
vx0.x = vreinterpretq_f32_u8(vqtbl4q_u8(r_sum, idx0));
vx0.y = vreinterpretq_f32_u8(vqtbl4q_u8(g_sum, idx0));
vx0.z = vreinterpretq_f32_u8(vqtbl4q_u8(b_sum, idx0));
vx1.x = vreinterpretq_f32_u8(vqtbl4q_u8(r_sum, idx1));
vx1.y = vreinterpretq_f32_u8(vqtbl4q_u8(g_sum, idx1));
vx1.z = vreinterpretq_f32_u8(vqtbl4q_u8(b_sum, idx1));
vx2.x = vreinterpretq_f32_u8(vqtbl4q_u8(r_sum, idx2));
vx2.y = vreinterpretq_f32_u8(vqtbl4q_u8(g_sum, idx2));
vx2.z = vreinterpretq_f32_u8(vqtbl4q_u8(b_sum, idx2));
```

At first I struggled to implement these lookups with AVX2 and SSE2 instructions.

My first attempt at a masked lookup with AVX2 was to load the table in two registers, use `_mm256_permutevar8x32_ps` on each one, combine the results with `_mm256_blendv_ps`, and mask the zero elements:

```
vlo = _mm256_permutevar8x32_ps(vlo, idx);
vhi = _mm256_permutevar8x32_ps(vhi, idx);
v = _mm256_blendv_ps(vlo, vhi, idx > 7);
v = _mm256_and_ps(v, mask);
```

This worked well, but Fabian Giesen pointed out that the permutes and blends compete for the same execution port, and suggested to first `xor` the upper part of the table:

`vhi = _mm256_xor_ps(vhi, vlo);`

so that the `blend` can be emulated with a sequence of `cmp`, `and` and `xor`:

```
vlo = _mm256_permutevar8x32_ps(vlo, idx);
vhi = _mm256_permutevar8x32_ps(vhi, idx);
v = _mm256_xor_ps(vlo, _mm256_and_ps(vhi, idx > 7));
v = _mm256_and_ps(v, mask);
```
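The trick relies on the identity `blend(lo, hi, m) == lo ^ ((hi ^ lo) & m)` for all-zeros/all-ones lane masks. A scalar illustration with my own helper names:

```cpp
#include <cstdint>

// The straightforward blend: pick bits from hi where the mask is set.
uint32_t blend(uint32_t lo, uint32_t hi, uint32_t mask) {
    return (lo & ~mask) | (hi & mask);
}

// The same result without a blend, assuming the caller pre-xored the upper
// value: hi_xored == hi ^ lo. Only an and plus an xor at runtime.
uint32_t xor_blend(uint32_t lo, uint32_t hi_xored, uint32_t mask) {
    return lo ^ (hi_xored & mask);
}
```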

This resulted in a significant speedup, but the cool thing is that the same idea extends to larger tables and to architectures with smaller registers. For example, with SSSE3 we can use the `pshufb` instruction with the same tables we used in the NEON TBL code path, but first we `xor` the upper parts of the table as follows:

```
tab3 = _mm_xor_si128(tab3, tab2);
tab2 = _mm_xor_si128(tab2, tab1);
tab1 = _mm_xor_si128(tab1, tab0);
```

and then at runtime we can use `pshufb` to emulate our lookup, combining multiple permutations:

```
v = _mm_shuffle_epi8(tab0, idx);
idx = _mm_sub_epi8(idx, _mm_set1_epi8(16));
v = _mm_xor_si128(v, _mm_shuffle_epi8(tab1, idx));
idx = _mm_sub_epi8(idx, _mm_set1_epi8(16));
v = _mm_xor_si128(v, _mm_shuffle_epi8(tab2, idx));
idx = _mm_sub_epi8(idx, _mm_set1_epi8(16));
v = _mm_xor_si128(v, _mm_shuffle_epi8(tab3, idx));
```

Note that `pshufb` sets the destination to zero when the second argument is negative. This eliminates the need to perform an `and` and a comparison as we did in the AVX2 case, and also automatically handles the indexing of the zero element of the summation table.
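To convince myself that the cascaded lookup telescopes correctly, here is a scalar model of one lane (my own names; `pshufb1` mimics the instruction's per-byte behavior of using the low 4 index bits and zeroing negative indices):

```cpp
#include <cstdint>

// Model of one pshufb lane: uses the low 4 bits of the index, returns 0 when
// the index byte is negative, like the hardware instruction.
uint8_t pshufb1(const uint8_t tab[16], int8_t idx) {
    return idx < 0 ? 0 : tab[idx & 15];
}

// Pre-xor each table with its predecessor, mirroring the SSSE3 setup.
void prexor_tables(uint8_t tab[4][16]) {
    for (int i = 0; i < 16; i++) {
        tab[3][i] ^= tab[2][i];
        tab[2][i] ^= tab[1][i];
        tab[1][i] ^= tab[0][i];
    }
}

// Cascaded lookup over a 64-entry table split across four 16-entry banks.
// For an index in bank j, banks 0..j all hit the same low nibble and the
// pre-xored entries telescope back to the original value.
uint8_t lookup64(const uint8_t xored[4][16], uint8_t idx) {
    uint8_t v = pshufb1(xored[0], (int8_t)idx);
    v ^= pshufb1(xored[1], (int8_t)(idx - 16));
    v ^= pshufb1(xored[2], (int8_t)(idx - 32));
    v ^= pshufb1(xored[3], (int8_t)(idx - 48));
    return v;
}
```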

I’ve glossed over many of the implementation details to focus on the aspects that I think are most interesting, but if you are curious here’s the whole implementation.

An important difference between the ICBC compressor and most other compressors out there is that it supports per-texel weights. What this means is that rather than solving the least squares problem in the ordinary way, I scale the equation corresponding to each texel by its associated weight.
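As a sketch of what this means for the solver (single channel, hypothetical names): the per-texel weight simply scales each texel's contribution to the normal-equation sums, so zero-weight texels drop out of the fit entirely:

```cpp
// Weighted least squares endpoint solve for one channel. Each texel's
// contribution is scaled by its weight w[i]; illustrative only, not the
// ICBC implementation.
bool solve_endpoints_weighted(const float* alpha, const float* c,
                              const float* w, int n, float* a, float* b) {
    float aa = 0, ab = 0, bb = 0, ax = 0, bx = 0;
    for (int i = 0; i < n; i++) {
        float al = alpha[i], be = 1.0f - al;
        aa += w[i] * al * al; ab += w[i] * al * be; bb += w[i] * be * be;
        ax += w[i] * al * c[i]; bx += w[i] * be * c[i];
    }
    float det = aa * bb - ab * ab;
    if (det == 0.0f) return false;  // not enough weighted constraints
    *a = (bb * ax - ab * bx) / det;
    *b = (aa * bx - ab * ax) / det;
    return true;
}
```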

This improves the quality of alpha maps significantly, allowing the compressor to approximate opaque texels much more accurately than transparent or semi-transparent ones. It’s also important when compressing lightmaps: texels outside of the chart footprint do not need to be considered by the compressor and can be ignored. This also offers a convenient way of supporting block sizes that are not exactly 4×4, and it’s also useful when compressing textures encoded in RGBM. Since the M multiplies the RGB components, the errors associated with texels with low M values are less important than those associated with texels with high M values. Another application is to incorporate perceptual metrics and adjust these weight maps based on the smoothness or other features of the image.

On the other hand, this introduces some additional cost. That is not just because you have to scale the $\alpha$ and $\beta$ terms when computing the sums of the least squares matrix, but because without the texel weights the inverse of the least squares matrix corresponding to each cluster combination is known in advance and can be precomputed. With fixed weights, to solve the equation system you just need to load the precomputed inverse and perform a vector/matrix multiply.

This is so much faster than evaluating the inverse, that for a while I maintained two versions of the code. One for use when the input was weighted and another when it wasn’t.

However, an advantage of the weighted cluster fit method that I initially overlooked is that color blocks often have some texels with the same value. In those cases, the number of colors in the block can be reduced and their weights adjusted accordingly. This improves performance significantly, because the number of cluster combinations depends on the number of input colors and grows faster than linearly:

| Unique Colors | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cluster Combinations | 3 | 9 | 19 | 34 | 55 | 83 | 119 | 164 | 219 | 285 | 363 | 454 | 559 | 679 | 815 | 968 |

For example, with 16 colors there are 969 combinations, but with 12 colors there are 454 combinations (roughly half), and with 8 colors there are only 164 combinations (17% of the total). The precomputation method is only faster when the number of colors is exactly 16. With 15 colors the weighted method already runs at about the same speed and color blocks with 16 unique colors are actually very rare.

Finally, another interesting advantage of the weighted cluster fit method is that it also provides a simple way to adjust the quality of the output and reduce the time it takes to encode a block. The only thing that we need to do is to cluster or snap together some of the input colors to reduce the total number of colors in the block.

The following graph shows how the quality and compression time changes as we adjust the clustering threshold:

It may be possible to combine this strategy with Rich’s, but his requires precomputing subsets of cluster combinations that are specific to the number of colors in the block. With a variable number of colors the amount of data to tune and precompute would go up significantly.

There are many aspects of my implementation that are a bit sloppy and I’m sure it’s possible to squeeze a bit more performance out of it. Significant parts of the code are not vectorized, and this becomes very noticeable at low quality settings (look at all that empty space to the left of the curve in the graph above!). Algorithmic improvements are still possible, and I’ll be writing about some of them in the future.

The BC1 format is not particularly relevant today, but many of the techniques employed here can be useful in other settings. Even though in modern formats such as BC7 and ASTC the search space is so much larger that most encoders don’t put a lot of effort into optimizing the solution for a specific partition or mode, some of these techniques can still be relevant.

This is a common mistake that Americans make, but I was unhappy about it and hoped it could be fixed easily, so I made an appointment at the DMV to correct it. I brought my previous license as proof of identification. It had not expired yet, so it seemed to me that it would be a valid proof of identification. However, I was informed that in order to correct the error I would also have to bring my birth certificate.

I then made another appointment and went with my birth certificate. I was born in Spain, so it was a Spanish birth certificate with a certified English translation. It turns out, however, that being a foreigner, I had to bring my passport instead.

Well, one more appointment, and as you may be guessing already, my passport was not enough. I am a permanent resident, so the DMV worker actually needed my green card.

For my last appointment I brought all the requested documents and more, and finally things went smoothly. There was no wait! Employees were polite and helpful! They scanned my documents, and I was able to request a new license with my last name corrected. I was given a temporary license that didn’t have the error, and was told my new license would be mailed in a few weeks. I was impressed!

A few days ago my new license arrived and my last name still had the same error.

This Winter I bought Nacho his first pair of mountaineering boots and crampons. He has been practicing self belay and self arrest over the last couple of years, but it was time to take his climbing to the next level. One of our goals is to climb Shasta. This not only requires competence using crampons and ice axe, but also the fitness and endurance to ascend 7000 feet in a couple of days.

With 4000 ft of elevation gain, climbing Pyramid in a day would be a good preparation for that challenge, so last Sunday we woke up at 3:30, loaded up the car and drove to the trailhead to start climbing before sunrise.

The first obstacle was the Pyramid Creek headwall. It’s an easy third class scramble that we have climbed several times during the summer, but climbing it in early Spring, with mountaineering boots, and during the first hours of the day is a different story.

Hands get cold at the touch of the rock. The melting snow makes the rock wet and slippery. Sometimes it even blocks the route entirely; one of the gullies that we usually climb had become a small waterfall so we had to find an alternate route that was more exposed.

To my delight, Nacho seemed extremely comfortable climbing in boots. He was able to move fast, intuitively, and with confidence. I’m super happy with the Trango Tower boots I purchased (he’s using the women’s version, because I couldn’t get the men’s in his size).

The upper part of the headwall was completely covered in snow. It was still cold and the snow was frozen, so we switched to crampons, but by the time we topped off the headwall the sun was already up and the snow was starting to get soft, so we had to take them off and put snow shoes on.

When we arrived at the southeast ridge it became clear it was going to pose an interesting challenge. The entire ridge was lined with a giant cornice. It appeared to have broken off at several points, and the entire slope showed signs of wet avalanches, in particular over the East Face couloir, the route we were planning to take.

The gentler slopes north of it didn’t look much better either, so we decided to skirt the ridge to the south until the end of the cornice, probe the snow there, and go back down if the conditions didn’t seem safe. I may have overestimated the risk of avalanche, but the tragic death of David Lama in an avalanche a few days earlier still resonated in my mind, so I decided to act conservatively.

We approached the ridge following what seemed a tree lined ledge and at the end of it we went straight up. The snow was solid and stable enough, but steeper than I had expected; at some point it was nearly 70 degrees. Nacho had never climbed snow so steep, so naturally I was a bit nervous, but he was able to follow me without any problem. I even let him break trail for a while.

When he was near the top he stopped and said he could not get a good purchase with his ice axe. I caught up to him to see what was going on and suddenly slipped down a hidden moat that was covered by snow. I fell all the way down to my chest and couldn’t touch the bottom with my feet! I held onto my ice axe, kicked into the sides of the moat, and pulled myself up. I wonder how he would have reacted in my situation! Luckily the snow bridge was strong enough to hold his weight.

The cornice over the moat was hard and almost vertical, but it was only a few feet tall. I kicked some good steps and built a ladder with our ice axes and we went over it without much trouble.

Getting over that last obstacle was exhilarating. We were both ecstatic we had made it this far. However, as we continued along the ridge the excitement faded and his energy dwindled. We had to take numerous breaks and it became clear that Nacho was showing signs of altitude sickness. At 9000 feet we took a long break, lay on our backpacks, and replenished our energy, but the symptoms didn’t fade and it was getting late (we had been climbing for more than 8 hours), so we decided to call it a day and head back down.

**Lessons learned:** It’s clear that Nacho needs more time for altitude adaptation. If I were to do this again I would travel the night before and sleep at the trailhead. This would also have allowed us to start climbing earlier. Moving in the cold while the snow is hard is faster and safer than over soft snow, and we could have avoided the use of snow shoes. I still have to help Nacho put on his crampons and tighten his snowshoes. These transitions take a long time. He needs to get more practice so that we can do this quicker and in parallel.

Ladybugs are migratory insects. Here in California, during the winter months, they travel from the valley to the foothills and clump together at specific spots, typically sunny areas near water, covering rocks and vegetation in a living red carpet.

The reason why the ladybugs clump together is not very well understood. Some studies suggest that clumping allows them to retain heat better and that the aggregated odor and color of the clump is a more effective predator deterrent, others argue that it’s also part of their mating ritual. One way or the other it’s a sight to behold.

A couple of weekends ago we went to Feather Falls trail in Plumas National Forest hoping to see this curious behavior. While we had seen clumps of ladybugs before, I had heard that their concentration near Frey Creek was remarkably high, so we had some high expectations.

My previous visit to this area was at the beginning of April and the ladybugs were nowhere to be seen. I assumed it was too late into the spring and that the lady-beetles had left the area already, but surely late February, with fresh snow on the ground, should be wintery enough!

I was a bit disappointed. We did find some ladybugs, but the numbers were not particularly impressive and nowhere near the reports I had heard of. What was going on?

Another popular clumping behavior is that of the Monarch Butterfly. A good spot to see them here in Northern California is in Santa Cruz, at Natural Bridges preserve. However, I’ve heard that lately their populations are dwindling and that this winter the number of butterflies was disturbingly low. Could the same be happening to the ladybugs?

After checking with some experts, it turns out that’s not the case. The most likely explanation is human harvesting! Ladybugs are used for pest control in organic gardening and the great majority of these ladybugs are harvested from the wild, from places like this.

I’m horrified! Not just because many of the ladybugs die during this process, but also, because removing them from the wild and not letting them disperse naturally, makes non-organic farmers even more dependent on pesticides.

This weekend the ladybugs were not on my mind anymore. Instead I was hoping to show the kids another curious animal phenomenon: the mating of the California newts. In early Spring the newts come back to the creeks in which they were born in order to mate and subsequently lay eggs. When a female is in heat, she attracts a large number of males, who clump around her in the hopes of copulating in what appears to be an amphibian orgy.

A few years ago in early March I encountered several groups of newts mating while I hiked along the South Yuba River, and this weekend we were hoping to find them again.

Unfortunately, with the recent rains the water in the creeks was much higher than the last time I was here, and it was flowing too fast for the newts to be returning to it already. We checked several creeks along a stretch of 4 miles with the same outcome. I was starting to think the trip was going to be another disappointment. However, when we reached our final destination, a particularly dramatic meander in the river where the valley opens up and gets a bit more sunlight, we encountered an unexpected surprise: thousands of ladybugs carpeting the ground!

Do you know of any other good spot to find overwintering ladybugs? Or of any other insect with a similar behavior? Please, share in the comments!

I have to admit that back then I had very little experience camping in the winter, but had done it enough times to feel confident taking him along. That said, I didn’t know what to expect. We didn’t have any specialized winter equipment, just our 3 season tent and regular camping gear; we even had to rent our snowshoes! Thankfully winters in California are fairly mild and weather forecasts are pretty accurate, so it was easy to pick a day with good weather and warm temperatures.

Despite that I wasn’t sure Nacho would be able to do this. He was only 8 at the time and he had never been at that altitude. He was a good hiker, but as we ascended the weight of the snowshoes and the thinning air was taking a toll on him. He had trouble breathing and was about to give up several times. We took many breaks, I encouraged him, and little by little the summit kept getting closer until we finally made it.

We were both ecstatic. I couldn’t be more proud and he felt like he had conquered the impossible.

Fast forward to 2019. Nacho and I have been on many more winter adventures, but Maia has never enjoyed the snow very much, in fact, she would complain at the mere mention of it. However, she’s always known that when she turns 8 she would come with me to climb Castle Peak like her brother did.

And today, despite her fears, she did remarkably well. She carried a heavy pack, we burrowed in our tent for the night, and she climbed all the way up with practically no complaint. I have to admit we had much better equipment this time, but more importantly, I felt much more confident and had high expectations. Having an older brother to keep up with is turning her into a really strong climber. She probably felt that if he could do it, why wouldn’t she?

I thought 2018 was a slow year compared to the previous ones, but now that I sit down and look at everything we have done, I think it’s probably about average. I went on a total of 36 trips, totaling 63 days and 16 nights outdoors. The kids joined me on 24 of those trips (for 41 days and 10 nights). Initially I was thinking I could write a summary about each one, but that would be a very long post! Instead I’m just going to highlight the ones that I enjoyed the most.

None of the trips were particularly challenging. The year was punctuated by several injuries and I’ve been feeling out of shape and more tired than usual. I don’t know if this is a sign of me getting old, or just that I need to slow down and give myself more time to recover. The kids are also growing up, it’s getting easier to take them along, and they both enjoy hiking and climbing, so I’ve been exploring with them more instead of going on personal trips.

Nacho and I try to do two major ascents every year, one in Winter and another in Summer. Last year we tried to climb Round Top, but we didn’t make it. The wind on the ridge was so strong that we could barely stand up and we had to turn around. This time however, the weather was perfect and we made it all the way to the summit. While last year we camped by Winnemucca Lake and tried to summit in the morning, this time we drove up before dawn, arrived at sunrise, and climbed it in a day. Car to car, it took us around 8 hours, a pretty long day for an 11-year-old!

One of the most exciting events of the year was our Spring break trip. I had committed to climb Mt Hood with MAA and had to drive up there anyway, so I made a family vacation out of it and went with Cristina and the kids. We spent a few days driving north, stopping by the Redwood National Parks, Crater Lake, Smith Rock; camping and hiking along the way, until we arrived in Portland. We rested there a couple of days and then Cristina flew back with the kids, while I stayed behind and tried my luck on an early season ascent of Mt Hood.

The plan was to camp at Illumination Saddle the first day and then attempt to reach the summit on the next. We had a promising start, but as we ascended the weather worsened and a whiteout set in over the mountain. Not only did we have strong winds and zero visibility, but the mist was getting everyone wet. Most of us were not ready for such humid conditions and we had to set up an emergency camp next to the Palmer lift.

I had a terrible night. My wet clothes were soaking my down bag. I could not get my body to warm up, let alone catch any sleep. Morale was low and everyone expected to head back down the next morning. However, when I got out of the tent at 3 AM the sky was clear and the wind was gone! Excited, I woke everyone up, headed up with 6 other climbers, and made it to the summit.

This was the kids’ favorite trip of 2018 and is certainly one of our greatest discoveries. The Enchanted Pools are a series of swimming holes, shallow pools, cascades, and water slides along the drainage of Twin Lakes in Desolation Wilderness. To get there you have to hike off trail and occasionally wade, swim, and scramble. You also have to time it right: too early and the water will be freezing and the currents too strong; too late and the creek will be dry and you will only find pools of stagnant water. On July 1st of 2018 conditions were perfect. I’d like to write more about this some day, but for now I’ll just tease you with a couple of pictures.

I’m lucky to have a wife that tolerates (and sometimes enjoys) my adventures. We got married in 2017, but this summer we invited our family and friends to celebrate with us doing what we both enjoy the most: playing outdoors and soaking in water.

It was amazing to have so many of our family and friends join us and to share with them some of the places we love. I don’t really like how weddings are celebrated in the US and we wanted to do something more intimate and personal, and a little more adventurous. I’m really happy with how things turned out!

We rented a large vacation house near Nevada City, camped around the property, went on excursions to Lake Tahoe, the Yuba and its surroundings, and had a lovely ceremony by the river. We didn’t hire any wedding or catering service, but instead did everything on our own; it was a team effort. Our only regret is not hiring a photographer to capture it in more detail.

I was lucky to get backpacking permits in Tuolumne Meadows for the last weekend before school started; I had to reserve these 6 months in advance! The year before, Nacho and I had a blast climbing Unicorn Peak and we had set our eyes on Echo Peak, a taller peak on the same ridge composed of multiple class 4 summits. Our goal was to climb as many of them as we could. To save weight I didn’t carry a rope, but seeing him solo that terrain was a bit unnerving. He didn’t think it was a big deal (and it probably wasn’t), but I was freaking out a little.

We have been going to Joshua Tree every Fall for the last few years, but this trip was special for two reasons. First, Maia is becoming a confident climber and I was excited to have her practice her new skills outdoors. Second, I rented a pretty cool camper van and we got to experience the “van” life for a few days.

Days are short in the Fall and I try to spend them outdoors, which means that I do most of the driving during the night. The hardest part of these trips is getting to our destination, cooking dinner, and setting up camp in the cold dark with the kids half asleep. In this respect the van made things a lot easier. The bed was super comfortable; we still had to sleep in our sleeping bags, but the luxurious memory foam mattress made a big difference. On the other hand, I feel like it disconnects you from the place and detracts from some of the adventure. While in the past the kids would get out and start scrambling and exploring as soon as the sun came out, in the van it was much harder to get things moving in the morning, and I have no stories to share about waking up in the middle of a sand storm to anchor the tent so it wouldn’t fly away. All in all, I think it was a positive experience, and I’m already dreaming about the day I can afford my own van.

The trip itself was great. On the way there we spent a day at Pinnacles National Park, hiked the High Peaks trail, explored the Bear Gulch cave system, and were lucky to see some condors from really close. In Joshua Tree we revisited some of our favorite locations and explored some new climbing areas, and on the way back we stopped by some of the Redwood Groves in Sequoia National Park and hiked to the fire lookout on top of The Needles.

I’ve decided I’m going to stop leading trips for MAA. I’m enormously grateful for the opportunities the club has given me, the friendships I’ve made, and everything I’ve learned from them. However, organizing and leading trips is stressful, distracting, and time consuming. While I’m not going to stop climbing, I’m going to focus on doing that with my friends and my family.

I think Maia is ready for her first winter ascent; I’m planning to take her to Castle Peak in a few weeks. Nacho should be able to climb Pyramid Peak this Winter, and maybe Tenaya this Summer. For Spring break I’d like to do a family trip to the Grand Canyon. As for me, I also need goals to keep training and stay motivated. I’d like to get back to the Tetons, climb Mt Moran, Teewinot, and revisit the Grand through a different route. It would also be great to do some backpacking over there with the kids and take them to Yellowstone. Closer to home in California I have such a large ‘to do’ list that it would require another post!

One of the main challenges of porting The Witness to iOS was reducing the app’s memory footprint. The lightmaps that we used in the PC version simply did not fit in the memory budget that we had for iOS.

As described in my previous article, on PC we compress our lightmaps using DXT5-RGBM. The DXT5 texture compression format is not available in iOS, so the first problem was to find a suitable alternative.

PVR is the most popular texture compression format on iOS, but it was not a particularly good choice. It has some annoying size constraints and doesn’t support an independent alpha channel; using an RGBM scheme would require sampling multiple textures.

Since we require Metal for rendering, all the devices we care about also support ETC2, which has a mode analogous to DXT5 with one block representing the RGB colors and another block representing the alpha. This was a much better fit for our RGBM lightmap representation.

For the RGB components I simply used Rich Geldreich’s ETC compressor (rg-etc1). This only supports ETC1 block mode, without the ETC2 extensions. Lightmaps are often fairly smooth, and I was hoping ETC2’s planar mode would be very effective at representing them. I wrote a nearly optimal planar compressor using least squares fitting, but interestingly the standard ETC1 mode was able to achieve lower errors in most cases, only 4% of the blocks ended up using the planar mode.
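To illustrate what least squares fitting the planar mode looks like, here is a hedged per-channel sketch (hypothetical names, not the actual compressor). It fits continuous corner values O, H, V for the ETC2 planar model C(x, y) = O + (x/4)(H - O) + (y/4)(V - O); a real encoder would also quantize them to the format's bit depths.

```cpp
#include <cassert>
#include <cmath>

// Least squares fit of the ETC2 planar model to one channel of a 4x4 block.
// Sketch only: fits continuous O, H, V; a real compressor would quantize them.
void fitPlanar(const float c[16], float &O, float &H, float &V) {
    // Basis functions: b0 = 1 - x/4 - y/4 (weight of O), b1 = x/4 (H), b2 = y/4 (V).
    double AtA[3][3] = {}, Atc[3] = {};
    for (int y = 0; y < 4; y++)
    for (int x = 0; x < 4; x++) {
        double b[3] = { 1.0 - x / 4.0 - y / 4.0, x / 4.0, y / 4.0 };
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) AtA[i][j] += b[i] * b[j];
            Atc[i] += b[i] * c[y * 4 + x];
        }
    }
    // Solve the 3x3 normal equations AtA * (O,H,V) = Atc with Cramer's rule.
    auto det3 = [](double m[3][3]) {
        return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
             - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
             + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
    };
    double d = det3(AtA);
    double r[3];
    for (int k = 0; k < 3; k++) {
        double Mk[3][3];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                Mk[i][j] = (j == k) ? Atc[i] : AtA[i][j];
        r[k] = det3(Mk) / d;
    }
    O = (float)r[0]; H = (float)r[1]; V = (float)r[2];
}
```

On an exactly planar block this recovers the corner values exactly, which makes it easy to sanity-check.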

As you can see in this graph, the quality of ETC2-RGBM is slightly lower than DXT5-RGBM, but in practice it’s almost impossible to appreciate any difference despite the higher error. Adding support for T and H modes may help reduce the gap between DXT5-RGBM and ETC2-RGBM, but it did not seem worth the effort and I had more important issues to address.

With these changes the size of the lightmaps on mobile was now the same as on PC, but our goal was to reduce it further. The simplest solution is to reduce the resolution of the lightmaps. This works to some extent, but due to inefficiencies in lightmap packing, the size of the lightmaps is not directly proportional to the resolution. The lower the resolution, the higher these inefficiencies, so this approach alone provides diminishing returns. For example, in the image below the lightmap area on the right side is one quarter of the left, but the number of texels is only halved.

To overcome that we used a slightly more aggressive packing scheme, but the most effective approach was to use per-vertex lightmaps computed using least squares vertex baking.

Least squares vertex baking was especially effective on small meshes that are instanced numerous times, such as pebbles and small rocks. However, it also introduced some new problems.

In one of my first articles about the lightmap tech of The Witness I described the method used to identify invalid samples: when rendering hemicubes I counted the number of back facing texels. When that number was above a certain threshold the sample was considered invalid and its color would be extrapolated from the surrounding samples instead. This worked well, but the threshold we used was somewhat conservative, because many of our meshes were not perfectly watertight or not correctly closed. This resulted in some false positives, that is, some invalid samples were considered valid. This was generally not a problem, because these samples were usually not visible from the player’s point of view, but hidden behind other geometry (for example, under the ground). However, when using least squares vertex baking, the contribution of these samples would influence the vertices of the triangle they belonged to, and often those vertices were visible to the player.

The only solution that I could come up with was to tighten the validation threshold, but this resulted in many new artifacts caused by sloppy geometry that then had to be cleaned up, consuming valuable artist time.

Despite the extra work, I think the results justified the effort. At the most expensive location, the lightmap memory use was reduced to 25% of the original:

PC: 234 MB iOS: 60 MB

And in the redistributable package, the lightmap assets were reduced to a mere 17%:

PC: 1781 MB iOS: 304 MB

In my initial implementation of our lightmapping technology I simply stored lightmap textures in RGBA16F format. This produced excellent results, but at a very high memory cost. I later switched to the R10G10B10A2 fixed point format to reduce the memory footprint of our lightmaps, but that introduced some quantization artifacts. At first glance it seemed that we would need more than 10 bits per component in order to have smooth gradients!

At the time the RGBM color transform seemed to be a popular way to encode lightmaps. I gave that a try and the results weren’t perfect, but it was a clear improvement and I could already think of several ways of improving the encoder. Over time I tested some of these ideas and managed to improve the quality significantly and also reduce the size of the lightmap data. In this post I’ll describe some of these ideas and support them with examples showing my results.

I believe the RGBM transform was first proposed by Capcom in these CEDEC 2006 slides. While Capcom employs it for diffuse textures, it has become a popular way to encode lightmaps. RGBM or some of its variations are used in Unity, Bioshock Infinite, and the Unreal Engine, among others. Its use for standard color textures is not as widespread, but Shane Calimlim found it to be a good fit for the stylized artwork of Duck Tales and suggests it could be a good format in general. However, with so many precedents, I was surprised it had not been analyzed in more detail.

The main challenge of compressing lightmaps is that they often have a wider range than regular diffuse textures. This range is not as large as in typical HDR textures, but it’s large enough that using regular LDR formats results in obvious quantization artifacts. Lightmaps don’t usually have high frequency details; they are often close to greyscale and only have smooth variations in the chrominance.

In our case, most of our lightmap values are within the [0, 16] range, and on the rare occasions when they are outside of that range, we constrain them by clamping the colors while preserving the hue to avoid saturation artifacts. Brian Karis also suggests tone mapping the upper section of the range to avoid sharp discontinuities, but I only found this to be a problem when light sources had unreasonably high intensity values.
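As a sketch of what such a hue-preserving clamp might look like (my own function name, assuming the [0, 16] range mentioned above; not necessarily the actual implementation), the idea is to scale all three channels uniformly instead of clamping them independently:

```cpp
#include <algorithm>
#include <cassert>

// Clamp a lightmap color to [0, maxRange] while preserving hue: clamping each
// channel independently would shift bright colors toward the clamp color and
// cause saturation artifacts, so instead we scale all three channels by the
// overflow of the largest one, which keeps the channel ratios (the hue) intact.
void clampPreservingHue(float &r, float &g, float &b, float maxRange = 16.0f) {
    float m = std::max(r, std::max(g, b));
    if (m > maxRange) {
        float s = maxRange / m;  // uniform scale factor
        r *= s; g *= s; b *= s;
    }
}
```

Colors already inside the range pass through unchanged.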

The shape of the lightmap color distribution varies considerably. Interior lightmaps are predominantly dark with a long tail of brighter highlights:

while outdoor lightmaps have a more Gaussian distribution with a bell-like shape. This particular lightmap is under the shade of some colored fall trees, which give it an orange tone:

Not all lightmaps use all the available range, so after tone mapping the next thing we do is to scale the range to [0, 1].

So, why is RGBM a good choice for data like this? The distribution of distinct values that can be represented with RGBM looks as follows:

It provides much more precision toward 0 than toward 1. This is beneficial for images that are intended to be visualized at multiple exposures. We want to obtain smooth lightmaps without quantization artifacts independently of the camera exposure. However, as we will see later, this provides much more precision around 0 than is actually necessary.

In my initial implementation I simply used RGBA8 textures, squaring the colors to perform gamma correction in the shader. The standard `rgb -> RGBM` transform is as follows:

    m = max(r, g, b)
    R = r / m
    G = g / m
    B = b / m
    M = m
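A minimal sketch of this transform and its inverse (my own helper names; assuming colors already scaled to [0, 1] and stored at 8 bits per channel):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct RGBM { uint8_t R, G, B, M; };

// Quantize a [0, 1] float to 8 bits with rounding.
static uint8_t q8(float x) {
    return (uint8_t)(std::min(std::max(x, 0.0f), 1.0f) * 255.0f + 0.5f);
}

// Standard RGBM encode: normalize by the max component, store the max in M.
RGBM encodeRGBM(float r, float g, float b) {
    float m = std::max({r, g, b, 1.0f / 255.0f});  // floor avoids division by zero
    return { q8(r / m), q8(g / m), q8(b / m), q8(m) };
}

// Decode: scale the normalized color back by M.
void decodeRGBM(RGBM c, float &r, float &g, float &b) {
    float M = c.M / 255.0f;
    r = (c.R / 255.0f) * M;
    g = (c.G / 255.0f) * M;
    b = (c.B / 255.0f) * M;
}
```

The round trip is lossy only through the 8-bit quantization of each channel.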

A simple improvement I made early on was to divide the quantization interval in two. This is a variation of the idea presented in Microsoft’s LUVW HDR texture paper, but instead of using an extra texture, I simply rely on the RGB and alpha (M) channels.

A similar observation is done by Shane Calimlim:

Gray is encoded as pure white in the color map, which may not always be optimal. Gray is an edge case most of the time, but a smarter encoding algorithm could make vast improvements in its handling. In the simple version of the algorithm the entire burden of representing gray lies with the multiply map; this could be split between both maps, improving precision greatly in scenarios where the color map can accommodate extra data without loss.

But in our case grey is not really an edge case! Lightmaps are mostly grey with slight smooth color variations.

The way I implemented this is by choosing a certain threshold `t`. For values of `m` that are lower than `t` the color is fully encoded using only the RGB components as follows:

    R = r / t
    G = g / t
    B = b / t
    M = 0

and for values of `m` greater than the threshold `t`, the normalized color is encoded in the RGB components, and the normalization factor `m` is biased and scaled to store it at a higher precision:

    R = r / m
    G = g / m
    B = b / m
    M = (m - t) / (1 - t)

That’s equivalent to just doing:

    m = max(r, g, b, t)
    R = r / m
    G = g / m
    B = b / m
    M = (m - t) / (1 - t)
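A self-contained sketch of this biased encode and its matching decode (hypothetical helper names; the threshold of 0.3 is just the example value the text arrives at for RGBA8):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Example threshold; the text suggests values around 0.3 for RGBA8 lightmaps.
static const float t = 0.3f;

static uint8_t q8(float x) {
    return (uint8_t)(std::min(std::max(x, 0.0f), 1.0f) * 255.0f + 0.5f);
}

// Biased RGBM: colors below t are encoded entirely in RGB (M = 0); above t,
// the normalization factor is rescaled from [t, 1] to [0, 1] for extra precision.
void encodeBiasedRGBM(float r, float g, float b, uint8_t out[4]) {
    float m = std::max(std::max(r, g), std::max(b, t));  // m >= t by construction
    out[0] = q8(r / m);
    out[1] = q8(g / m);
    out[2] = q8(b / m);
    out[3] = q8((m - t) / (1 - t));
}

void decodeBiasedRGBM(const uint8_t in[4], float &r, float &g, float &b) {
    float M = (in[3] / 255.0f) * (1 - t) + t;  // undo the bias and scale
    r = (in[0] / 255.0f) * M;
    g = (in[1] / 255.0f) * M;
    b = (in[2] / 255.0f) * M;
}
```

Note that both branches of the piecewise scheme collapse into this single code path, exactly as the `max(r, g, b, t)` formulation above implies.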

This is useful for several reasons. As Shane notes, by splitting the burden of representing the luminance between the RGB and M channels we can obtain more precision and reduce the size of the quantization interval.

It’s important to note that this actually reduces precision around zero, where we don’t need so much, because the game camera never uses long enough exposures. If we look at the distribution of grey levels that biased RGBM can represent, it now looks as follows:

Picking different values of `t` allows us to use different quantization intervals for different parts of the color range. The optimal choice of `t` depends on the distribution of colors in the lightmap and the number of bits used to represent each of the components. We chose this value experimentally. For our lightmaps, values around 0.3 seemed to work best when encoding them in RGBA8 format.

With these improvements RGBM was already producing very good results. Visually I could not see any difference between the RGBM lightmaps and the raw half floating point lightmaps. However, I had not reduced the size of the lightmaps by much and ideally I wanted to compress them further.

The next thing that I tried was to choose `M` in a way that minimizes the quantization error. I did that by brute force, trying all possible values of `M`, computing the corresponding `RGB` values for that choice of `M`, and selecting the one that minimized the MSE:

    for (int m = 0; m < 256; m++) {
        // Decode M.
        float M = float(m) / 255.0f * (1 - threshold) + threshold;

        // Encode RGB.
        int R = ftoi_round(255.0f * saturate(r / M));
        int G = ftoi_round(255.0f * saturate(g / M));
        int B = ftoi_round(255.0f * saturate(b / M));

        // Decode RGB.
        float dr = (float(R) / 255.0f) * M;
        float dg = (float(G) / 255.0f) * M;
        float db = (float(B) / 255.0f) * M;

        // Measure error.
        float error = square(r - dr) + square(g - dg) + square(b - db);

        if (error < bestError) {
            bestError = error;
            bestM = M;
        }
    }

This improved the error substantially, but it introduced interpolation artifacts. The RGBM encoding is not linear, so interpolation of RGBM colors is not correct. With the naive method this was not a big deal, because adjacent texels usually had similar values of `M`, but the `M` values resulting from this optimization procedure were not necessarily similar anymore.

However, it was easy to solve this problem by constraining the search to a small range around the `M` value selected with the naive method:

    float M = max(max(R, G), max(B, threshold));
    int iM = ftoi_ceil((M - threshold) / (1 - threshold) * 255.0f);

    for (int m = max(iM - 16, 0); m < min(iM + 16, 256); m++) {
        ...
    }

This constraint did not reduce the quality noticeably, but eliminated the interpolation artifacts entirely.

While this idea showed that there's significant optimization potential over the naive approach, it did not get us any closer to our stated goal: to reduce the size of the lightmaps. I tried to use a packed pixel format such as RGBA4, but even with the optimized encoding, it did not produce sufficiently high quality results. To reduce the size further we would have to use DXT block compression.

Simply compressing the RGBM data produced poor results and compressing the optimized RGBM data did not help, but instead only degraded the results even more.

A brute force compressor is not practical in this case, because when processing blocks of 4x4 colors simultaneously the search space is much larger.

A better approach is to first compress the `RGB` values obtained through the naive procedure using a standard DXT1 compressor, and then choose the `M` values to compensate for the quantization and compression errors of the DXT1 component.

That is, we want to compute `M` so that:

    M * (R, G, B) == (r, g, b)

This gives us three equations that we can minimize in the least squares sense. The `M` that minimizes the error is:

    M = dot(rgb, RGB) / dot(RGB, RGB)
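For a single texel, this least squares fit is essentially a one-liner. A small sketch (hypothetical name; the clamp to [0, 1] is required by the storage format):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Least squares M for one texel: minimize |M*(R,G,B) - (r,g,b)|^2.
// Setting the derivative with respect to M to zero gives
// M = dot(rgb, RGB) / dot(RGB, RGB), where RGB is the DXT1-decoded
// color and rgb the original. Clamping to [0, 1] is needed for storage,
// although it can pull M away from the unconstrained optimum.
float fitM(const float rgb[3], const float RGB[3]) {
    float num = rgb[0] * RGB[0] + rgb[1] * RGB[1] + rgb[2] * RGB[2];
    float den = RGB[0] * RGB[0] + RGB[1] * RGB[1] + RGB[2] * RGB[2];
    float M = (den > 0.0f) ? num / den : 0.0f;
    return std::min(std::max(M, 0.0f), 1.0f);
}
```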

In my tests, the resulting `M`’s compress very well in the alpha map and reduced the error significantly.

I also tried to encode `RGB` again with the newly obtained `M` and compress them afterward, but in most cases that did not improve the error. Something that worked well was to simply weight the `RGB` error by `M` in the initial compression step.

The number of bits allocated for the `RGB` and `M` components is very different than in our initial RGBA8 texture, so the choice of `t` had to be reviewed. In this case values of `t` around 0.15 produced best results. I attribute this to the reduced number of bits per pixel used to encode the `RGB` channels.

In addition to the described formats I also compared the proposed method against BC6. BC6 is specifically designed to encode HDR textures, but it's not available on all hardware. Our optimized RGBM-DXT5 scheme provides nearly the same quality as BC6:

The above chart is displaying RMSE values of the final images after color space conversion and range rescaling.

To study the effectiveness of the encoders it's more useful to look at the errors before rescaling. These look a lot more uniform, but cannot be compared against BC6 anymore, since in that case adjusting the range of the input values does not usually reduce the compression error.

Finally, I thought it would be interesting to use RGBM-DXT5 to compress standard images and compare it against YCoCg-DXT5. The following chart shows the results for the first 8 images of the kodim image set:

YCoCg-DXT5 is clearly a much better choice for LDR color textures.

Our proposed RGBM encoder was good enough for our lightmaps, but I'm convinced there's more room for improvement.

One idea would be to pick a different threshold `t` for each texture. Finding the best `t` for a given texture to be encoded using the plain linear RGBM format would be easy, but it's not so obvious when using block compression.

The RGB components are encoded with a standard weighted DXT1 compressor. It would be interesting to use a specialized compressor that favored `RGB` values with errors that the `M` component could correct. For example, the `M` values resulting from the least squares minimization are sometimes above 1, but need to be clamped to the `[0, 1]` range; it should be possible to constrain the `RGB` endpoints to prevent that. It may also be possible to choose `RGB` endpoints such that the errors of the least squares fitted `M` are as small as possible.

Finally, DXT5 is not available on most mobile GPUs. I haven't tried this yet, but it seems the ETC2 EAC_RGBA8 format is widely available and would be a good fit for the techniques presented here. It would also be interesting to compare our method against packed floating point formats such as R11G11B10_FLOAT and R9G9B9E5_SHAREDEXP, and against ASTC's HDR mode.

In all cases I measured the error using the RMSE metric, which is the same metric used to guide the block compressors. It may make more sense to use a metric that takes into account how the lightmaps are visualized in the game. I did exactly that: I tone mapped the lightmaps at different exposures and computed the error in post-tone-mapping space. The tables below show the resulting values, and they roughly correlate with the plain RMSE metric.

Tone mapped error       e=2.2    e=1.0    e=0.22     average |   rmse

RGBM8 naive:
  hallway:   0.00026  0.00045  0.00089  ->  0.00053 | 0.00007
  hut:       0.00100  0.00102  0.00082  ->  0.00095 | 0.00609
  archway:   0.00114  0.00141  0.00190  ->  0.00148 | 0.00818
  windmill:  0.00102  0.00133  0.00185  ->  0.00140 | 0.00083
  shaft:     0.00201  0.00228  0.00214  ->  0.00214 | 0.00798
  hub:       0.00151  0.00182  0.00191  ->  0.00175 | 0.00267
  tower:     0.00153  0.00200  0.00299  ->  0.00217 | 0.00160
  tunnel:    0.00094  0.00123  0.00171  ->  0.00129 | 0.00093
  mine:      0.00105  0.00120  0.00141  ->  0.00122 | 0.00640
  theater:   0.00099  0.00126  0.00160  ->  0.00128 | 0.00129

RGBM8 optimized:
  hallway:   0.00010  0.00015  0.00030  ->  0.00018 | 0.00004
  hut:       0.00049  0.00043  0.00031  ->  0.00041 | 0.00543
  archway:   0.00044  0.00060  0.00122  ->  0.00075 | 0.00595
  windmill:  0.00020  0.00026  0.00036  ->  0.00027 | 0.00024
  shaft:     0.00059  0.00066  0.00102  ->  0.00076 | 0.00501
  hub:       0.00038  0.00051  0.00085  ->  0.00058 | 0.00099
  tower:     0.00060  0.00072  0.00082  ->  0.00072 | 0.00112
  tunnel:    0.00025  0.00031  0.00042  ->  0.00033 | 0.00048
  mine:      0.00044  0.00049  0.00083  ->  0.00058 | 0.00467
  theater:   0.00061  0.00076  0.00087  ->  0.00075 | 0.00095

RGBM4 optimized:
  hallway:   0.00169  0.00259  0.00562  ->  0.00330 | 0.00063
  hut:       0.00932  0.00899  0.00773  ->  0.00868 | 0.08317
  archway:   0.00906  0.01287  0.02616  ->  0.01603 | 0.09614
  windmill:  0.00424  0.00562  0.00830  ->  0.00606 | 0.00402
  shaft:     0.01103  0.01314  0.01978  ->  0.01465 | 0.08204
  hub:       0.00868  0.01160  0.01848  ->  0.01292 | 0.01722
  tower:     0.01004  0.01217  0.01466  ->  0.01229 | 0.01835
  tunnel:    0.00516  0.00687  0.01066  ->  0.00757 | 0.00764
  mine:      0.00871  0.01044  0.01742  ->  0.01219 | 0.07510
  theater:   0.00683  0.00840  0.00963  ->  0.00829 | 0.01057

DXT5 naive:
  hallway:   0.00155  0.00249  0.00570  ->  0.00325 | 0.00048
  hut:       0.00487  0.00536  0.00564  ->  0.00529 | 0.02119
  archway:   0.00500  0.00656  0.01039  ->  0.00731 | 0.01949
  windmill:  0.00214  0.00287  0.00444  ->  0.00315 | 0.00177
  shaft:     0.01062  0.01339  0.01977  ->  0.01459 | 0.03412
  hub:       0.00616  0.00796  0.01130  ->  0.00848 | 0.01481
  tower:     0.00551  0.00712  0.01019  ->  0.00761 | 0.00735
  tunnel:    0.00235  0.00308  0.00451  ->  0.00331 | 0.00285
  mine:      0.00471  0.00589  0.00877  ->  0.00646 | 0.01809
  theater:   0.00332  0.00412  0.00496  ->  0.00413 | 0.00498

DXT5 optimized:
  hallway:   0.00125  0.00199  0.00456  ->  0.00260 | 0.00041
  hut:       0.00336  0.00373  0.00408  ->  0.00372 | 0.01529
  archway:   0.00353  0.00460  0.00719  ->  0.00511 | 0.01285
  windmill:  0.00134  0.00180  0.00280  ->  0.00198 | 0.00116
  shaft:     0.00801  0.01016  0.01507  ->  0.01108 | 0.02437
  hub:       0.00469  0.00602  0.00846  ->  0.00639 | 0.01241
  tower:     0.00421  0.00544  0.00781  ->  0.00582 | 0.00599
  tunnel:    0.00157  0.00206  0.00306  ->  0.00223 | 0.00193
  mine:      0.00338  0.00428  0.00646  ->  0.00471 | 0.01178
  theater:   0.00245  0.00302  0.00357  ->  0.00301 | 0.00382

DXT5 optimized with M-weighted RGB:
  hallway:   0.00114  0.00184  0.00430  ->  0.00243 | 0.00038
  hut:       0.00338  0.00382  0.00443  ->  0.00388 | 0.01478
  archway:   0.00356  0.00464  0.00725  ->  0.00515 | 0.01271
  windmill:  0.00134  0.00180  0.00281  ->  0.00198 | 0.00113
  shaft:     0.00804  0.01023  0.01522  ->  0.01116 | 0.02382
  hub:       0.00472  0.00611  0.00868  ->  0.00650 | 0.01088
  tower:     0.00421  0.00544  0.00787  ->  0.00584 | 0.00597
  tunnel:    0.00157  0.00206  0.00306  ->  0.00223 | 0.00193
  mine:      0.00337  0.00428  0.00648  ->  0.00471 | 0.01170
  theater:   0.00245  0.00302  0.00356  ->  0.00301 | 0.00382

There’s a lot of good stuff in there, more than in any of the previous D3DX source code releases. I was never too happy with the k-means style clustering that we use in The Witness, top-down spectral clustering seems a much better approach. Also their stretch-minimization parameterization is certainly better than our plain LSCM.

I wrote about our implementation in this article. A few people asked for our code and we released it here. However, today I’d recommend using Xin Huang’s as a better starting point.
