NVTT quote

I love this quote from Ivan-Assen Ivanov about the NVIDIA Texture Tools:

We switched to CUDA nvcompress in our pipeline and totally forgot
about it. Until yesterday, when an artist had to work on a 7600 for a
while, and raised hell about how slow the export has become.


Standing Desk 2

So, I finally bought the Ikea standing desk and I’ve been using it full time during this week.

As many predicted my feet hurt, and at the end of the day I feel extremely tired, almost as if I had been hiking, which is actually quite a nice feeling. For once I now get to sleep before midnight and wake up early in the mornings. I’ve realized I have to do stretching exercises to prevent cramps during the night. I suppose my body will end up getting used to it.

My doctor has finally prescribed me some stronger drugs. I’m now taking Prednisone for the inflammation and Vicodin for the pain. During the first few days the Vicadin eliminated the pain entirely; after weeks of constant pain with some acute episodes that was quite a relief. I felt energized and euphoric, suddenly I could walk normally, I could even run, jump, speed up my bike, and get some adrenaline flowing!

Unfortunately the body creates tolerance fairly quickly, so I had to reduce the dose. However, I think the inflammation is finally receding, the pain is still there, but it’s just a mere discomfort. I think the results of the physiotherapy are also starting to show. The S shape of my column has been corrected, it feels much more straight now. My abs are stronger than ever. I’m not sure it has been of any help for the pain, but at least it will hopefully prevent more damage in the future.

NVIDIA Texture Tools 2.0.6

Yesterday I released another revision of the stable branch of the NVIDIA Texture Tools. I recommend everybody to upgrade, since it fixes some bugs and artifacts, and improves compatibility with current and future CUDA drivers. Starting with this release I’m planning to provide a verbose description of the changes in each release. So, here it goes:

In a multi-GPU environment NVTT will now use the fastest GPU available. This is quite useful on systems that have an embedded and a discrete GPU. You certainly don’t want to use the embedded one while the discrete GPU is idle.

Using recent CUDA runtimes with older CUDA drivers usually causes problems. To avoid that, NVTT now determines the version of the available CUDA driver and makes sure it’s compatible with the CUDA runtime that was used to compile it.

The NVTT shutdown code did not destroy the CUDA context properly, which caused problems the second time you tried to create a context in the same thread. The behavior on 2.0 drivers was to reuse the context already created, but CUDA 2.1 produces an error instead. This is fixed now.

Note however, that you are not allowed to create multiple compressors in the same thread simultaneously. This is a limitation in the CUDA runtime API that transparently creates a CUDA context per thread. In the future I’m planning to use the driver API directly, which should remove this restriction.

There have been issues reported with the CUDA compressor not producing correct output when processing 1×1 mipmaps. I was able to reproduce the problem on old drivers, but it seems to be fixed now. In any case, I realized that using the entire GPU to compress a 1×1 mipmap did not make any sense, so now I’m compressing small mipmaps in the CPU instead. That is slightly faster and avoids the aforementioned problem.

Per request of the Wolfire guys, NVTT now scales images with alpha by weighting the color channels by the alpha component. While a simple workaround to obtain the right results was to use premultiplied alpha, there was no reason not to do the right thing in all cases.

As mentioned in my previous post, I’ve updated the single color compression tables to take into account the possible hardware decoding errors.

There was also a bug in the decompression of some RGB DDS files that is now fixed.

Progress on the 2.1 release is slow, but steady. The target features are almost complete, but I’ve been cleaning things up, removing a lot of cruft, and simplifying the code, in order to make it more accessible and encourage more contributions. I’m also working in a lower level API that should be more flexible and easier to maintain. More on that in a future post.

GPU DXT Decompression

Even thought the DXT compression format is widely adopted, most of the hardware implementations use a slightly different decompression algorithm.

The problem is that there never was a specification that described the preferred implementation. The DXT format was first documented by S3 in US patent 5956431. However, in order to cover all possible implementations the description in the patent is very generic. It does not specify when or whether the colors are bit expanded, or how the palette entries are interpolated. In one of the proposed embodiments they approximate the 1/3 and 2/3 terms as 3/8 and 5/8, which would have been a good idea, if everybody had chosen the same approximation. However, different implementations approximated the expensive division in different ways.

Continue reading →

Ham and Beans

I’ve been thinking about writing about cooking in this blog for a while. Not just to share what I’ve learned, because, well, I don’t have much to share, but to get other people’s ideas and suggestions. While I may cook better than the average American and may be able to impress my wife and friends, I’m not really such a good cook. I know how to follow recipes, and I’m starting to get some sense of what works and what does not, but I have a lot to learn, which is actually fascinating; there’s a whole world of flavors out there and exploring it is a very rewarding experience.

It took me a while to get used to cook in the US. Spanish cooking is not very sophisticated, but it’s based on seasonal dishes and the use of high quality ingredients available locally. Not the kind of thing you find in the US at your nearest supermarket.

This is all changing very quickly, but many of the traditions still remain. I for one grew up without paying to much attention to food. I just thought it was fuel for my body. It was only when I moved to the US that I started to appreciate what I used to have and now missed.

The Mediterranean diet is mostly composed of olive oil, bread, fresh fruits, fish and vegetables. However, I grew up in a family that runs a pork farm and a small meat production and distribution business. So, I had a larger share of meat than the average. Despite of that, meat was most often used as a condiment or in small dishes, except on special occasions.

[missing image – Iberian pigs in the Dehesa]

Our pork is unlike any other. Most of it is free-range and acorn-fed, which gives its meat a very distinct flavor. It is generally destined to the production of the finest quality deli meats: cured ham, loins, and sausages.

As far as I know it’s not possible to find anything like that in the US. Exports of Iberian ham are legal now, but the FDA requires installations to be periodically inspected and approved. Small farmers cannot really afford to comply with all the requirements of the FDA, so the only ham that you can usually find in the US is from larger and lower quality producers.

To add more insult to injury, the low quality ham that is available here is sold at outrageous prices, as if it were a high quality product! If you are not discouraged yet, you can find some of it at the Spanish Table, or at online retailers such as La Tienda.

I would not recommend any of those options. Instead, what I do is to bring some of our ham with us every time we travel to Spain. This is actually illegal, but is usually safe. However, once we tried to bring an entire leg and, much to our dismay, it was confiscated by the border officers. Since then, I only bring small quantities that are hardly noticeable on the scanner.

Rather than eating all of it right away, I generally save it for months, waiting for the right opportunity to enjoy it. Last weekend that opportunity presented itself in the form of fresh fava beans:

Fava beans are delicious eaten raw, or simply sautéed in olive oil with red onions. However, a little bit of Iberian Ham turns that delicious dish into a delicacy.

The preparation is trivial:

  • Remove the beans from their pods. It’s possible to also remove the beans from the shells by cooking them in boiling water for a short period of time and cooling them quickly in cold water. I personally don’t bother with that and leave the beans in their shells.
  • Pour a small amount of olive oil on a sauté pan on high heat. Chop the onions and the ham, and cook until flagrant.
  • Then add the beans, season to taste with salt and freshly ground pepper, and cook lightly for a few minutes while stirring often.

Let me know if you like posts like this, and if so, in future posts I’ll continue rambling about my quest to adapt the traditional Spanish recipes to the resources available in the US.


Posted 18/5/2009 at 4:04 am | Permalink
Habas con jamón…. Yummy!

Did I tell you I bought a bread machine? I plan to make sourdough bread as soon as I’m able to grow the culture.

Posted 18/5/2009 at 8:07 am | Permalink
Yeah, fantastic post, more!

Posted 18/5/2009 at 2:54 pm | Permalink
¡Habas con jamón! ¡Rico, rico!
Well done! I have the same problem finding cured ham in England, and I also follow the same procedure to bring some here, although I haven’t tried yet to bring a whole leg :D

Chad Austin
Posted 18/5/2009 at 4:46 pm | Permalink
I love all food, so more!

Posted 18/5/2009 at 7:02 pm | Permalink
Thanks for all the feedback! I think that next I’ll share my technique to make a basic pâté.

Posted 19/5/2009 at 8:47 pm | Permalink
Great post, thanks! BTW, I like your typo “cook until flagrant”, it makes the meal seem much more exotic :) If you ever find a source for good quality hams please post it!

Optimal Grid Rendering

This is a trick that I learned from Matthias Wloka during my job interview at NVIDIA. I thought I had a good understanding of the behavior of the vertex post-transform cache, but embarrassingly it turned out it wasn’t good enough. I’m sure many people don’t know this either, so here it goes.

Rendering regular grids of triangles is common enough to make it worth spending some time thinking about how to do it most efficiently. They are used to render terrains, water effects, curved surfaces, and in general any regularly tessellated object.

It’s possible to simulate the native hardware tessellation by rendering a single grid multiple times, and the fastest way of doing that is using instancing. That idea was first proposed in Generic Mesh Refinement on GPU and at NVIDIA we also have examples that show how to do that in OpenGL and Direct3D.

That’s enough for the motivation. Imagine we have a 4×4 grid. The first two rows would look like this:

* - * - * - * - *
| / | / | / | / |
* - * - * - * - *
| / | / | / | / |
* - * - * - * - *

With a vertex cache with 8 entries, the location of the vertices after rendering the first 6 triangles of the first row should be as follows:

7 - 5 - 3 - 1 - *
| / | / | / | / |
6 - 4 - 2 - 0 - *
| / | / | / | / |
* - * - * - * - *

And after the next two triangles:

* - 7 - 5 - 3 - 1
| / | / | / | / |
* - 6 - 4 - 2 - 0
| / | / | / | / |
* - * - * - * - *

Notice that the first two vertices are no longer in the cache. As we proceed to the next two triangles two of the vertices that were previously in the cache need to be loaded again:

* - * - * - 7 - 5
| / | / | / | / |
3 - 1 - * - 6 - 4
| / | / | / | / |
2 - 0 - * - * - *

Instead of using the straightforward traversal, it’s possible to traverse the triangles in Morton or Hilbert order, which are known to have better cache behavior. Another possibility is to feed the triangles to any of the standard mesh optimization algorithms.

All these options are better than not doing anything, but still produce results that are far from the optimal. In the table below you can see the results obtained for a 16×16 grid and with a FIFO cache with 20 entries:

Method          ACMR    ATVR
Scanline        1.062   1.882
NVTriStrip      0.818   1.450
Morton          0.719   1.273
K-Cache-Reorder	0.711   1.260
Hilbert         0.699   1.239
Forsyth         0.666   1.180
Tipsy           0.658   1.166
Optimal         0.564   1.000

Note that I’m using my own implementation for all of these methods. So, the results with the code from the original author might differ slightly.

The most important observation is that, for every row of triangles, the only vertices that are reused are the vertices that are at the bottom of the triangles, and these are the vertices that we would like to have in the cache when rendering the next row of triangles.

When traversing triangles in scanline order the cache interleaves vertices from the first and second row. However, we can avoid that by prefetching the first row of vertices:

4 - 3 - 2 - 1 - 0
| / | / | / | / |
* - * - * - * - *
|   |   |   |   |
* - * - * - * - *

That can be done issuing degenerate triangles. Once the first row of vertices is in the cache, you can continue adding the triangles in scanline order. The cool thing now is that the vertices that leave the cache are always vertices that are not going to be used anymore:

* - 7 - 6 - 5 - 4
| / | / | / | / |
3 - 2 - 1 - 0 - *
| / | / | / | / |
* - * - * - * - *

In general, the minimum cache size to render a W*W grid without transforming any vertex multiple times is W+2.

The degenerate triangles have a small overhead, so you also want to avoid them when the cache is sufficiently large to store two rows of vertices. When the cache is too small you also have to split the grid into smaller sections and apply this method to each of them. The following code accomplishes that:

void gridGen(int x0, int x1, int y0, int y1, int width, int cacheSize)
    if (x1 - x0 + 1 < cacheSize)
        if (2 * (x1 - x0) + 1 > cacheSize)
            for (int x = x0; x < x1; x++)
                indices.push_back(x + 0);
                indices.push_back(x + 0);
                indices.push_back(x + 1);

        for (int y = y0; y < y1; y++)
            for (int x = x0; x < x1; x++)
                indices.push_back((width + 1) * (y + 0) + (x + 0));
                indices.push_back((width + 1) * (y + 1) + (x + 0));
                indices.push_back((width + 1) * (y + 0) + (x + 1));

                indices.push_back((width + 1) * (y + 0) + (x + 1));
                indices.push_back((width + 1) * (y + 1) + (x + 0));
                indices.push_back((width + 1) * (y + 1) + (x + 1));
        int xm = x0 + cacheSize - 2;
        gridGen(x0, xm, y0, y1, width, cacheSize);
        gridGen(xm, x1, y0, y1, width, cacheSize);

This may not be the most optimal grid partition, but the method still performs pretty well in those cases. Here are the results for a cache with 16 entries:

Method           ACMR   ATVR
Scanline         1.062  1.882
NVTriStrip       0.775  1.374
K-Cache-Reorder  0.766  1.356
Hilbert          0.754  1.336
Morton           0.750  1.329
Tipsy            0.711  1.260
Forsyth          0.699  1.239
Optimal          0.598  1.059

And for a cache with only 12 entries:

Method           ACMR   ATVR
Scanline         1.062  1.882
NVTriStrip       0.875  1.550
Forsyth          0.859  1.522
K-Cache-Reorder  0.807  1.491
Morton           0.812  1.439
Hilbert          0.797  1.412
Tipsy            0.758  1.343
Optimal          0.600  1.062

In all cases, the proposed algorithm is significantly faster than the other approaches. In the future it would interesting to take into account some of these observations in a general mesh optimization algorithm.


I think that the Average Cache Miss Ratio (ACMR) is not the best way of measuring the efficiency of the vertex cache under different mesh optimization algorithms. The ACMR is basically the number of vertex transforms divided by the number of primitives, and greatly depends on the topology of the mesh. For a triangle mesh the theoretical optimal is 0.5, but if the mesh has boundaries, or disconnected triangles, then the optimal value is much higher. So, you have to compare the ACMR against the vertex to triangle ratio of the mesh.

I think that a better way of measuring how well a mesh optimization algorithm or a cache implementation performs is using the average transform to vertex ratio (ATVR), that is, the number of vertex transforms divided by the number of vertices. In the optimal case it’s always 1, independently of the mesh and the primitive type. I think it provides a much more intuitive a much more intuitive idea of the cost of rendering a mesh. For example, if the transform to vertex ratio is 2, it means that each vertex is transformed an average of 2 times.

Here’s another example: In his article about Vertex Cache Optimisation, Tom Forsyth has to clarify that the results for the GrannyRocks mesh (between 0.797 and 0.765) are actually quite good, because the best ACMR possible is 0.732. That clarification would not be necessary when using the metric that I propose. Instead, you would say that the transform to vertex ratio is between 1.09 and 1.05, and it’s instantly obvious that it’s a good result.

Real-Time Creased Approximate Subdivision Surfaces

Denis finally put his I3D paper online!

Denis was my intern a couple of summers ago. We worked together on the implementation of the an emulation framework to implement and prototype various tessellation algorithms. Last summer he wanted to try something different, I recommended him to join a game studio, and he finally went to work at Valve.

Having spent so much time and effort working on approximate subdivision surfaces and talking about them to developers, it’s very pleasing to see these techniques being adopted in real applications.

Adding support for subdivision surfaces in a game engine requires significant changes in the production pipeline. Not only tools need to be updated, but production processes need to change, and artists need to be educated to adopt and learn the new tools.

I think this paper will be very valuable in that regard. It describes Valve’s experience integrating ACC into the Source engine. How the ACC scheme had to be extended to support creases in order to achieve the visuals that they were trying to achieve, the many problems that arose, and more importantly, the problems that could not be solved and that required artist intervention.

10 Fun Things to do with Tessellation

Hardware tessellation is probably the most notable feature of Direct3D11.

Direct3D11 was announced at the last Gamefest and a technical preview was released in the November 2008 DirectX SDK. Hardware implementations are expected to be available this year.

[missing image]

Direct3D11 extends the Direct3D10 pipeline with three new stages: Two programmable shader stages (the Hull and Domain Shaders), and a fixed function stage (the Tessellator). More details can be found here and here.

Rendering of Catmull-Clark subdivision surfaces is often mentioned as the primary application for the tessellation pipeline, but there are many other interesting uses that have not received that much attention.

I thought it would be interesting to take a closer look at those other applications, and submitted a proposal to do that at GDC’09. However, it seems that the organizers do not think tessellation is as interesting as I do, or they didn’t like my proposal, or maybe it’s just that they know I’m a lousy speaker. I will never know, because the gracious feedback of the GDC review committee can be represented by a single boolean.

In any case, here’s a brief overview of the 10 fun things that I was planning to talk about. I don’t get very deep into the technical details, but in future posts I may describe some of these applications more thoroughly. Please, leave your comments if there’s something you would like to learn more about.


Curved PN Triangles is a triangle interpolation scheme that operates directly on triangle meshes whose vertices are composed of positions and normals (PN stands for Point-Normal).

[missing image – PN Triangles]

It’s an interesting way of improving visual quality that offers a simple migration path, since assets do not need to be heavily modified.

The PN Triangle evaluation consists of two steps: First, for every triangle of the input mesh a triangular cubic patch is derived solely from the vertex positions and normals; no adjacency information is required. Then, the resulting patch is subdivided or tessellated for rendering.

The resulting surface is smoother than the polygonal surface, but does not have tangent continuity in general, and that results in shading discontinuities. To hide these discontinuities normals are interpolated independently using either linear or quadratic interpolation. These normals are not the true surface normals, but they provide a smooth appearance to the surface.

This two-step evaluation maps very well to the Direct3D11 tessellation pipeline. The evaluation of the control points can be performed in the Hull Shader, the fixed function tessellator can produce a tessellation pattern in the triangle domain, and the actual surface can be evaluated for each of the tessellated vertices in the Domain Shader.

[missing image – Scalar Tagged PN Triangles]

In order to support sharp edges a rim of small triangles is added along the edges. That increases the number of patches, and it’s not entirely clear how to properly texture map them. Scalar Tagged PN-Triangles solve that problem in a more elegant way by tagging each crease vertex with three scalar that act as shape controllers and modify the construction of the surface control points. However, this representation does not support crease corners.

Silhouette Refinement

When tessellation is enabled the only supported primitive type is the patch primitive. In Direct3D11 a patch is an abstract primitive with an arbitrary number of vertices. You can use patches to represent traditional primitives (ie. a triangle is just a patch with 3 vertices), but this also enables you to represent other input primitives with arbitrary topology and additional connectivity information.

An interesting extension of of PN-Triangle tessellation is to augment the input triangles with the neighbor vertices in order to perform silhouette refinement.

With this additional information it’s possible to compute tessellation factors in he Hull Shader based on whether an edge is on the silhouette or the interior of the mesh. Then the fixed function tessellator uses these edge tessellation factors to produce a semi-regular tessellation pattern and the Domain Shader transforms it to interpolate the surface.

Phong Tessellation

[missing image – phong tessellation]

Phong Tessellation is a geometric version of Phong interpolation, but applied to vertex positions instead of normals.

First, points are interpolated linearly over each triangle using its barycentric coordinates, then the points are projected onto the planes defined by the corner position and normal, and finally the result of the three projections is interpolated again.

This procedure produces a smooth surface comparable to PN Triangles, but its evaluation is much cheaper, since no additional control points need to be computed.

Bezier Surfaces

Curved surfaces are not only useful for characters, but also for level geometry and objects.

[missing image – Quake 3 Arena]

id Software introduced the use of quadratic Bezier patches for architectural geometry in Quake 3 Arena and has been using them ever since.

Climax Brighton’s Moto GP used cubic Bezier patches to model the motorcycles.

Bezier patches can be evaluated very efficiently, because they don’t need any information about the surrounding mesh. As these games show, tessellation hardware is not required to render these surfaces. However, hardware tessellation will allow doing it much more efficiently, and will facilitate the use of these and more complex surfaces.

Approximation to Subdivision Surfaces

Rendering of approximated Catmull-Clark subdivision surfaces is probably the most anticipated application of hardware accelerated tessellation. Several approximation methods exist.

[missing images]

Approximating Catmull-Clark Subdivision Surfaces with Bicubic Patches is the most popular one. This approximation constructs a geometry patch and a pair of tangent patches for each quadrilateral face of the control mesh. The geometry patch approximates the shape and silhouette, but does not provide tangent continuity. A smooth normal field is constructed using two additional tangent patches. The approximation supports boundaries and has also been extended to support creases in Real-Time Creased Approximate Subdivision Surfaces.

GPU Smoothing of Quad Meshes proposes an alternative approximation using piecewise quartic triangular patches that have tangent continuity and do not require additional tangent patches to provide a smooth appearance. In Fast Parallel Construction of Smooth Surfaces from Meshes with Tri/Quad/Pent Facets the same approach is extended to approximate triangular and pentagonal faces.

[missing image]
(c) Kenneth Scott, id Software

Gregory patches are a more compact representation that also provides a very similar approximation, but only support quad and triangle control faces.

The availability of sculpting tools like ZBrush and Mudbox makes it possible to create highly detailed meshes. Displaced subdivision surfaces provide a compact and efficient representation for these meshes.

Rendering Geometry Images

Another approach to render highly detailed surfaces is to use geometry images. While geometry images can be rendered very efficiently, their video memory requirements are generally higher than displacement maps due to the lack of high precision texture compression formats. Traditional animation algorithms are not possible with this representation, and view dependent tessellation level evaluation is complicated, because geometry information is not directly available at the Hull Shader stage. However, geometry images may be the fastest approach to render small static objects at fixed tessellation levels.

Terrain Rendering

Terrain rendering is one of the most obvious applications for tessellation. The flexibility of the tessellation pipeline enables the use of sophisticated algorithms to evaluate the level of refinement of the terrain patches, and frees you from having to worry about many of the implementation details.

[missing image – Saga of Ryzom]

It’s also possible to extend traditional terrain engines with arbitrary topologies. Some MMORPGs are already doing that to create more rich environments.

For example Saga of Ryzom, a game that is based on the Nevrax engine, uses cubic patches to model the terrain, which enables them to create impressive cliffs and overhangs.

[missing image – Saga of Ryzom]

Tessellation should make it possible to combine regular heightfields, with caves, cliffs, arches, and other interesting rock formations.

I think that ZBrush or Mudbox would be excellent tools to create natural looking rugged terrain.

Hair Rendering

Efficient hair rendering is one of the most interesting applications of the Direct3D11 tessellation pipeline. In addition to triangular and quad patches the fixed function tessellator can also generate lines, which are very useful for applications like hair and fur rendering.

[missing image – nalu]

The algorithm described in Hair Animation and Rendering in the Nalu Demo maps very well to the tessellation pipeline.

As shown in Real-Time Rendering of Realistic Hair, the use of the hardware tessellation pipeline makes it very easy to simulate and render realistic hair with high geometric complexity in real-time.

That’s possible, because the simulation is performed only on a few hundred guide hairs, that are expanded by the tessellator into thousands of hair strands.

Rendering Panoramas

Another application for tessellation is to perform arbitrary non linear projections, that is useful, for example, to create real-time panoramas.

Since graphics hardware relies on homogeneous linear interpolation for rasterization, arbitrary projections and deformations at the vertex level result in errors unless the surface is sufficiently refined.

[missing image – panquake]

The traditional image based approach is to render the scene to a cube map and then perform an arbitrary projection of the cubemap to screenspace relying on texture hardware to do the sampling and interpolation. This was the approach taken in Fisheye Quake and Pan Quake.

While that works well, it requires rendering the scene to the 6 cube faces, and sometimes results in oversampling or undersampling of some areas of the scene.

[missing image – panorama]

Dynamic Mesh Refinement on GPU using Geometry Shaders proposes the use of the geometry shader to dynamically refine the surfaces to prevent linear interpolation artifacts. However, the Geometry Shader operates sequentially and is not well suited for this task. On the other side, the dynamic mesh refinement algorithm maps well to the Direct3D11 tessellation pipeline.

Rendering of 2D curved shapes

While GPUs can render simple polygons, they are not able to automatically handle complex concave and curved polygons with overlaps and self intersections, without prior triangulation and tessellation.

[missing image – svg tiger]

The Direct3D11 tessellation pipeline is not designed to perform triangulation. However, there’s a well known method to render arbitrary polygons using the stencil buffer that can be used in this case. This method was first described in the OpenGL Red Book, but was recently popularized by its implementation in the Qt graphic library.

It’s possible to combine this technique with hardware tessellation to render curved tessellated shapes without the need of expensive CPU tessellation and triangulation algorithms.


Posted 11/1/2009 at 10:31 am | Permalink
Excellent post Ignacio! Great inspiration for future geometry pipelines & usages in games

Posted 11/1/2009 at 11:36 am | Permalink
Do you think you could do Reyes-style motion blur?

Similar to the Rendering Panoramas method that you mention – tesselate triangles until they are near 1 pixel size and then extrude them based on their velocity (assume the engine is passing a velocity vector per vertex or something) ?

Posted 11/1/2009 at 1:27 pm | Permalink
> Do you think you could do Reyes-style motion blur?

I think that tessellation does not help much in this setting. There’s no tessellation mode that creates multiple copies of the same triangle. However, you can do that in the geometry shader. The problem is how to combine these copies. You can use some sort of order independent transparency, which might actually be possible in Direct3D11, since pixel shaders support arbitrary gather and scatter operations.

In a reyes style render shading is performed in object space. Fast moving objects do not need to be shaded exactly, so using tessellation you can dynamically change the shading frequency based on this and other factors.

ren canjiang
Posted 7/2/2009 at 3:55 am | Permalink
thank you for so interesting topics. i’m using opengl for demos, i hope it can catch up with dx11?-?

sonali joshi
Posted 21/6/2009 at 4:17 am | Permalink
it was just ok

Posted 24/7/2009 at 10:40 am | Permalink
gud gemacht ) weiter so !

Posted 12/3/2010 at 9:39 pm | Permalink
You’re an excellent speaker, and have an interesting article. It’s a shame you didn’t get to speak! =(

Ownership-based Zippering

Since my Gamefest talk I’ve received numerous questions about the ownership-based zippering algorithm that I proposed. So, I’ll try to explain it in more detail. See my previous article on watertight texture sampling for more background info.

In the averaging method we would have to store the texture coordinate of every patch that contributes to a shared feature. Edges are shared by only two patches, but corners can be shared by many patches. By defining the ownership of the shared features (corners and edges), we only have to store the texture coordinates of the patch that owns the corresponding feature.

So, we have:

  • 4 texture coordinates for the interior (4).
  • 2 texture coordinates for each edge (8).
  • 1 texture coordinate for each corner (4).

Therefore, the total number of texture coordinates per patch is: 4+8+4 = 16

Deciding what patch owns a certain edge or corner is done as a pre-process, so that the patch texture coordinates can be computed in advance. The way I store these texture coordinates is as follows:

[missing picture]

Each vertex has:

1 interior texture coordinate. (index 0)
1 edge texture coordinate for each of the edges. (index 1 and 2)
1 corner texture coordinate. (index 3)

On the interior, we interpolate the interior texture coordinates bilinearly:

float2 tc = bar.x * texCoord[0][0] +
            bar.y * texCoord[1][0] +
            bar.z * texCoord[2][0] +
            bar.w * texCoord[3][0];

where bar stands for the barycentric coordinates:

bar.x = (    uv.x) * (    uv.y);
bar.y = (1 - uv.x) * (    uv.y);
bar.z = (1 - uv.x) * (1 - uv.y);
bar.w = (    uv.x) * (1 - uv.y);

On the edges we interpolate the edge texture coordinates linearly:

if (uv.y == 1) tc = texCoord[0][1] * bar.x + texCoord[1][2] * bar.y;
if (uv.y == 0) tc = texCoord[2][1] * bar.z + texCoord[3][2] * bar.w;
if (uv.x == 1) tc = texCoord[3][1] * bar.w + texCoord[0][2] * bar.x;
if (uv.x == 0) tc = texCoord[1][1] * bar.y + texCoord[2][2] * bar.z;

And at the corners we simply select the appropriate corner texture coordinate:

if (bar.x == 1) tc = texCoord[0][3];
if (bar.y == 1) tc = texCoord[1][3];
if (bar.z == 1) tc = texCoord[2][3];
if (bar.w == 1) tc = texCoord[3][3];

The same thing can be done more efficiently using a single bilinear interpolation preceded by some predicated assignments:

// Interior
float2 t0 = texCoord[0][0];
float2 t1 = texCoord[1][0];
float2 t2 = texCoord[2][0];
float2 t3 = texCoord[3][0];

// Edges
if (uv.y == 1) { t0 = texCoord[0][1]; t1 = texCoord[1][2]; }
if (uv.y == 0) { t2 = texCoord[2][1]; t3 = texCoord[3][2]; }
if (uv.x == 1) { t3 = texCoord[3][1]; t0 = texCoord[0][2]; }
if (uv.x == 0) { t1 = texCoord[1][1]; t2 = texCoord[2][2]; }

// Corners
if (bar.x == 1) t0 = texCoord[0][3];
if (bar.y == 1) t1 = texCoord[1][3];
if (bar.z == 1) t2 = texCoord[2][3];
if (bar.w == 1) t3 = texCoord[3][3];

float2 tc = bar.x * t0 + bar.y * t1 + bar.z * t2 + bar.w * t3;

And finally, the predicated assignments can be simplified and replaced by an index calculation as I proposed in my previous article:

// Compute texture coordinate indices (0: interior, 1,2: edges, 3: corner)
int idx0 = 2 * (uv.x == 1) + (uv.y == 1);
int idx1 = 2 * (uv.y == 1) + (uv.x == 0);
int idx2 = 2 * (uv.x == 0) + (uv.y == 0);
int idx3 = 2 * (uv.y == 0) + (uv.x == 1);

float2 tc = bar.x * texCoord[0][idx0] +
            bar.y * texCoord[1][idx1] +
            bar.z * texCoord[2][idx2] +
            bar.w * texCoord[3][idx3];

The same idea also applies to triangles:

// Interior
float2 t0 = texCoord[0][0];
float2 t1 = texCoord[1][0];
float2 t2 = texCoord[2][0];

// Edges
if (bar.x == 0) { t1 = texCoord[1][1]; t2 = texCoord[2][2]; }
if (bar.y == 0) { t2 = texCoord[2][1]; t0 = texCoord[0][2]; }
if (bar.z == 0) { t0 = texCoord[0][1]; t1 = texCoord[1][2]; }

// Corners
if (bar.x == 1) t0 = texCoord[0][3];
if (bar.y == 1) t1 = texCoord[1][3];
if (bar.z == 1) t2 = texCoord[2][3];

float2 tc = bar.x * t0 + bar.y * t1 + bar.z * t2;

And the resulting code can be optimized the same way:

int idx0 = 2 * (bar.z == 0) + (bar.y == 0);
int idx1 = 2 * (bar.x == 0) + (bar.z == 0);
int idx2 = 2 * (bar.y == 0) + (bar.x == 0);

float2 tc = bar.x * texCoord[0][idx0] +
            bar.y * texCoord[1][idx1] +
            bar.z * texCoord[2][idx2];