three.js + spark.js

three.js r180 was released last week and among the many improvements, the one that I’m most excited about is a series of changes by Don McCurdy that enable the use of textures encoded with spark.js in three.js:

  • Support ExternalTexture with GPUTexture #31653
  • ExternalTexture: Support copy(), clone(). #31731

Support for ExternalTexture objects in the WebGPU backend allows you to wrap a GPUTexture object, such as the one produced by Spark.encodeTexture, and use it directly in three.js. This makes it straightforward to work with Spark-encoded textures.

Here’s a brief example:

// Load and encode texture using spark:
const gpuTexture = await spark.encodeTexture(textureUrl, { srgb: true, flipY: true });

// Wrap the GPUTexture for three.js
const externalTex = new THREE.ExternalTexture(gpuTexture);

// Use as any other texture:
const material = new THREE.MeshBasicMaterial({ map: externalTex });

With this feature in place, I began testing spark.js in more complex scenarios. The first thing I wanted was a GLTF viewer example. With some help from Don McCurdy I was able to get something running quickly, and after a bit of polish, I’m very happy with the results. It requires very minimal changes to existing code.

To simplify integration, the spark.js node package includes an addon that you import explicitly:

import { Spark } from "@ludicon/spark.js";
import { registerSparkLoader } from "@ludicon/spark.js/three-gltf";

Then use the imported function to register Spark with an existing GLTFLoader instance:

const loader = new GLTFLoader();
registerSparkLoader(loader, spark);

And that’s it! After registration, the loader will automatically encode textures with Spark whenever applicable.

This exercise was very useful in identifying some issues in the initial spark.js implementation, as well as some limitations in three.js.

Concurrent Image Decoding

The initial performance results of the Spark texture loader were not surprising. I expected it to be a bit slower than the default loader, since Spark needs to dispatch its codecs to the GPU, and the first measurements seemed to confirm this.

After profiling, however, I discovered the reason why the Spark texture loader was slower: the input textures were being decoded sequentially in the main thread instead of being offloaded to a separate thread.

Fixing this was surprisingly simple. To get the image data I was using a DOM Image object, and the solution was simply to set its decoding attribute to "async":

img.decoding = "async";

While digging into this, I also learned that the recommended approach is to use the createImageBitmap API. Both approaches worked well, but the latter appeared to be slightly faster in practice, while producing the same results. The only difference I found is that the Image object supported rendering of SVG files while the createImageBitmap function did not, so to maintain feature parity I use Image objects for SVG files, and createImageBitmap for other image types.
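To make the two approaches concrete, here’s a minimal sketch of a decoding helper along the lines described above. The function name decodeImageData and the exact fallback condition are my own assumptions, not the actual spark.js implementation:

```javascript
// Hypothetical helper (not the actual spark.js code): use createImageBitmap
// for most formats, and fall back to an Image element for SVG, which
// createImageBitmap does not rasterize.
async function decodeImageData(blob, mimeType) {
  if (mimeType === "image/svg+xml") {
    const img = new Image();
    img.decoding = "async"; // hint the browser to decode off the main thread
    img.src = URL.createObjectURL(blob);
    try {
      await img.decode(); // resolves once the image data is ready
      return img;
    } finally {
      URL.revokeObjectURL(img.src);
    }
  }
  // createImageBitmap decodes off the main thread in most browsers.
  return createImageBitmap(blob);
}
```

Either return value can then be uploaded to a GPU texture; the ImageBitmap path avoids blocking the main thread during decode.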

With these changes I was pleased to see that the performance of the Spark texture loader was as good as the regular texture loader, and in some cases even faster! Texture encoding happens asynchronously on the GPU, which is usually idle during load time, so the extra work doesn’t add noticeable overhead to the main thread.

Both three.js and spark.js need to generate mipmaps for these textures. Spark does this using compute shaders immediately after decoding, while three.js uses fragment shaders and does so at a later stage when the scene is evaluated. These differences may explain why Spark sometimes has a performance advantage, though I’m not familiar enough with the three.js internals to say for certain.

Edit: Arseny points out that image decoding appears to be sequential in Firefox, and I’ve confirmed that’s indeed the case with both approaches described above. The only information I could find on the matter is a 14-year-old bug report titled “Multithreaded image decoding” which appears to indicate the feature was implemented long ago, so I’m not sure what’s going on. If anyone can shed some light I would appreciate it!

RG Normal Map Support

Another surprise was that three.js did not support normal maps stored in two-channel textures. This is a very common practice: most engines use two-channel compressed textures for normals. In three.js, however, the material shaders expect RGB values representing the packed XYZ coordinates.

In practice, it’s more efficient to store only the XY coordinates and reconstruct the Z by projecting it onto the hemisphere:

normal.z = sqrt(saturate(1.0 - dot(normal.xy, normal.xy)));
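The same reconstruction can be expressed in JavaScript, where the saturate() of the shader line above becomes an explicit clamp to [0, 1]. This guards against XY pairs that fall slightly outside the unit disc due to compression error:

```javascript
// Reconstruct the Z component of a unit normal from its XY components.
// The clamp mirrors GLSL's saturate(), absorbing rounding error in the
// compressed XY values before the square root.
function reconstructNormalZ(x, y) {
  const d = Math.min(Math.max(1.0 - (x * x + y * y), 0.0), 1.0);
  return Math.sqrt(d);
}
```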

The XY normal map coordinates are usually not correlated, so block compression using a single plane doesn’t work very well. Early implementations used DXT5 textures, encoding the X and Y coordinates independently in the alpha block and the green channel of the color block. Jan Paul van Waveren and I analyzed this scheme in our Real-Time Normal Map DXT Compression paper back in 2008.

Today, we have specialized two-channel compression formats that are ideal for encoding normal maps: BC5 on PC and EAC11_RG on mobile. Even with more general formats like BC7 and ASTC, it’s beneficial to use two-channel encodings. For example, the Arm ASTC Encoder guidelines recommend packing normals as XXXY in order to use the Luminance-Alpha endpoint mode.

To support these packing schemes I’ve submitted a PR to three.js, which is currently being reviewed, and I hope will land in the upcoming r181 release.

This feature benefits all three.js developers working with compressed textures, but improving the quality of existing assets with offline-compressed textures requires recompressing them. In contrast, spark.js will leverage this feature as soon as it becomes available, automatically improving the quality of existing normal maps without any additional work.

Performance Results

Measuring performance on web applications is tricky. Execution is very asynchronous, so when timing a piece of code it’s hard to be sure of what you are timing exactly. You could be timing extraneous tasks that happen to be scheduled in between your timing measurements, and code that you think you are measuring may actually be evaluated at a later point.

The Chrome profiler has proven very useful for understanding unexpected timing results. I wish the GPU timeline displayed the GPU debug group annotations, but in my tests the GPU usage is so sparse that it’s easy to infer what’s executing on it.

To keep things simple, I decided to load a single GLTF file and measure the FCP (first contentful paint), that is, the time to the first rendered frame. In this particular test the first frame is not rendered until all the contents have finished loading. To estimate how much of that time can be attributed to the GLTF loading, I also measured the FCP of the same scene without the GLTF file and subtracted that. Results are somewhat noisy, especially on mobile, so I run each test at least 3 times and pick the best. I also let the devices cool down between tests to prevent throttling.
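The FCP reading itself can be captured with the standard paint timing API. This is a minimal sketch of that measurement, not necessarily how my test harness does it; the baseline subtraction happens outside this function:

```javascript
// Resolve with the first-contentful-paint time (ms since navigation start).
// The buffered flag delivers the entry even if it fired before we subscribed.
function measureFCP() {
  return new Promise((resolve) => {
    new PerformanceObserver((list, observer) => {
      const entry = list.getEntriesByName("first-contentful-paint")[0];
      if (entry) {
        observer.disconnect();
        resolve(entry.startTime);
      }
    }).observe({ type: "paint", buffered: true });
  });
}
```

The GLTF cost is then estimated as the FCP with the asset minus the FCP of the same scene without it.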

I’ve tested a bunch of different models, but the numbers here correspond to the SciFiHelmet GLTF example from the Khronos GLTF Sample Models. The original textures are PNGs; to produce the other variants I used gltf-transform with the default texture compression settings, packing the results into a single GLB file.

SciFiHelmet contains a single material with four 2048×2048 textures for albedo, normal, roughness-metalness (RM), and occlusion. When using Spark, these are encoded using a 16-byte-per-block format for the albedo, normal, and RM maps, and an 8-byte-per-block format for the occlusion map. Format selection is done automatically based on the channel usage and the formats available on the device. It is possible to target lower quality (8 bytes per block) formats, but I’m not doing that in this test.

I gathered these results on a MacBook Pro M4:

SciFiHelmet    Time (ms)   Texture Size (MB)   VRAM (MB)
PNG            121         26.64               85.33
PNG+Spark      101         26.64               18.66
WebP           68          1.44                85.33
WebP+Spark     55          1.44                18.66
AVIF           60          0.92                85.33
AVIF+Spark     53          0.92                18.66
Basis UASTC    170         12.96               22.36
Basis ETC1S    113         1.91                11.2

As noted before, on this machine the Spark texture loader appears to be faster than the default texture loader.

I also ran this test on some mobile devices, with the following results (all timings in milliseconds):

SciFiHelmet    Galaxy S23   Pixel 8   Pixel 7
AVIF           673          690       688
AVIF+Spark     736          653       811
Basis UASTC    1607         2920      crash!
Basis ETC1S    1120         1120      1077

I was curious to see how the Spark texture loader compared to the KTX loader using textures encoded with Basis. Basis uses an intermediate representation that can be transcoded into any GPU texture format, but the transcoding runs on the CPU. While it’s much simpler than a full encoder, it appears to add very significant overhead. Another issue is that Basis textures don’t compress nearly as well, and the cost of loading them from disk is much higher than I anticipated.

Here’s the cost of the Basis transcoder on some devices using 4 worker threads with nearly 100% utilization:

               MacBook Pro M4   Galaxy S23   Pixel 8
Basis UASTC    123 ms           358 ms       325 ms
Basis ETC1S    83 ms            170 ms       175 ms

Other Performance Observations

It’s fairly common to embed geometry and textures in a single GLB file. While this is convenient for distribution, and it may be appealing to reduce the number of connections, there’s one major disadvantage: a GLB file needs to be downloaded completely before transcoding can start.

I think it may be beneficial to use GLTF/GLB files that reference external texture assets. Loading these requires additional connections and an indirection that may add latency, but as soon as the first image file is received we can start processing it concurrently with the remaining downloads. Maybe this will be the subject of an upcoming benchmark.
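A minimal sketch of that overlap, assuming a hypothetical encode callback standing in for the Spark codec dispatch (the function name and shape are my own, not a spark.js API):

```javascript
// Hypothetical sketch: with external texture URIs, each image is fetched
// and handed to the encoder as soon as its bytes arrive, so GPU encoding
// overlaps with the remaining downloads instead of waiting for a full GLB.
async function loadTexturesConcurrently(urls, encode) {
  return Promise.all(
    urls.map(async (url) => {
      const blob = await (await fetch(url)).blob();
      return encode(blob); // dispatch per image, without waiting on siblings
    })
  );
}
```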

Compression ratios make a huge difference when downloading files over the network, but I did not expect them to make as big of a difference in my tests, because I’m running them locally and the assets are always cached on disk by the browser. I thought the reduced asset sizes would mainly have a secondary effect: with higher compression ratios, you can hold more assets in a fixed-size cache, making cache hits more likely.

However, the impact of asset sizes is much more direct. Disk loading times on mobile appear to be much longer than I expected, or perhaps the browser is adding some other overheads I’m not aware of.

Here are the disk loading times for two different GLB files on different devices:

GLTF                    Size     MacBook Pro M4   Galaxy S23   Pixel 8
SciFiHelmet-avif.glb    4.4 MB   26.5 ms          191 ms       192 ms
SciFiHelmet-uastc.glb   16 MB    41 ms            795 ms       629 ms

Feedback and Acknowledgments

If you’d like to try this out, the spark.js GitHub repository now includes two three.js examples. They should be easy to run, and I’d really appreciate your feedback.

Getting this up and running so quickly wouldn’t have been possible without the work of Don McCurdy, the patience of my college friend Tulo, and the incredible work of all the volunteers advancing web frameworks and standards. In recognition of this, I’ll be donating 10% of the spark.js sales to developers working on adjacent projects that push 3D graphics forward on the Web.
