I finally got around to writing some comments on this year's Advances in Real-Time Rendering held at SIGGRAPH 2011. Thanks to the RTR team for making the notes available. The talk about physically-based shading in Call of Duty has already been mentioned in my previous post. So, in no particular order:

# Rendering in *Cars 2*

Christopher Hall, Robert Hall, David Edwards (Avalanche Software)

At one point, the talk about rendering in Cars 2 describes how they use pre-exposed colors as shader inputs to avoid precision issues when doing the exposure after the image has been rendered. I have employed pre-exposed colors with dynamic exposure in the past, and I found them tricky to use. Since there is a delay in the exposure feedback (you must know the exposure of the previous frame to feed the colors for the next frame) you can even get exposure oscillation!
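To illustrate how a delayed feedback loop can oscillate, here is a toy model in C++. It is only a sketch under my own assumptions, not the scheme from the talk: the scene luminance is constant, the measurement that arrives in frame n was rendered with the exposure of frame n-1, and the controller applies a multiplicative correction to the current exposure. The function name `simulateExposure` and the parameter `rate` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Toy model of exposure feedback with one frame of latency.
// Returns the exposure used for each frame.
std::vector<double> simulateExposure(double sceneLuminance, double target,
                                     double rate, int frames)
{
    assert(frames >= 2);
    std::vector<double> exposure(frames, 1.0);  // frames 0 and 1 start at 1.0
    for (int n = 1; n + 1 < frames; ++n) {
        // The average brightness read back in frame n was rendered
        // with the exposure of frame n-1 (assumed latency of one frame).
        double measured = sceneLuminance * exposure[n - 1];
        // The correction is applied to the current exposure, although the
        // measurement belongs to the previous one. rate = 1 corrects fully,
        // rate < 1 adapts more slowly.
        exposure[n + 1] = exposure[n] * std::pow(target / measured, rate);
    }
    return exposure;
}
```

With a scene luminance of 4, a target of 1 and `rate = 1`, the exposure cycles forever through 1, 1, 0.25, 0.0625, 0.0625, 0.25, 1, ... instead of settling at the ideal value of 0.25. In this model a small `rate` such as 0.25 lets the loop converge.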

# Two uses of Voxels in LittleBigPlanet2’s graphics engine

Alex Evans and Anton Kirczenow (MediaMolecule)

Lots of intelligent ideas. Unfortunately, most of them only really work for the special case of the 2.5-dimensional world that LBP plays in. The best takeaway for generalization, I think, is the dynamic AO.

# Making Game Worlds from Polygon Soup

Hao Chen, Ari Silvennoinen, Natalya Tatarchuk (Bungie)

This talk is about the spatial database organization for Halo: Reach, and the structure that they settled on is the polygon soup. All I can say is +1! An unstructured polygon soup/object soup with a spatial index on top of it, for example, a loose octree [1], connected to a portal system is imho *the* way to go to organize 3D data for a dynamic world. I did exactly this in the past and it served me well [3]. Dynamic occlusion culling in Greene/Zhang-style [2,6] can be added on top of it very easily, and in effect, this is what the Umbra middleware provides with software rendering. I like the idea of the automatic portalization that is presented in the talk. Manual portalization is indeed cumbersome and error-prone for the artists, as they correctly state with their outdoor examples. Their solution is similar to the way navigation meshes are built, via a flood-fill of a uniform grid and cell aggregation.
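For readers unfamiliar with loose octrees, the following C++ sketch shows how an object can be assigned to a cell directly from its bounding sphere, without descending the tree. It assumes a looseness factor of 2 (each node's bounds are twice its nominal cell size) and a cubic world centered at the origin; `CellAddress`, `looseOctreeCell` and all parameter names are hypothetical and not taken from the talk or from my engine.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Depth and integer cell coordinates of a loose-octree node.
struct CellAddress { int depth; int x, y, z; };

// World is the cube [-worldSize/2, +worldSize/2]^3.
// With looseness 2, an object whose center lies in a cell of size S
// fits into that cell's loose bounds as long as radius <= S / 2.
CellAddress looseOctreeCell(double cx, double cy, double cz, double radius,
                            double worldSize, int maxDepth)
{
    assert(radius > 0.0 && worldSize > 0.0 && maxDepth >= 0 && maxDepth < 30);
    // Deepest level whose cell size is still at least 2 * radius.
    int depth = static_cast<int>(std::floor(std::log2(worldSize / (2.0 * radius))));
    depth = std::min(std::max(depth, 0), maxDepth);

    int cellsPerAxis = 1 << depth;
    double cellSize = worldSize / static_cast<double>(cellsPerAxis);

    // The cell is chosen from the object's center alone.
    auto index = [&](double c) {
        int i = static_cast<int>(std::floor((c + 0.5 * worldSize) / cellSize));
        return std::min(std::max(i, 0), cellsPerAxis - 1);
    };
    return { depth, index(cx), index(cy), index(cz) };
}
```

Because the cell follows from the center and radius alone, moving objects can be re-inserted in constant time, which is what makes this structure attractive for a dynamic world.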

# Secrets of CryENGINE 3 Graphics Technology

Tiago Sousa, Nickolay Kasyan, and Nicolas Schulz (Crytek)

A nice summary of all the little details that, when combined, make a great renderer. Interestingly, this is another case for dynamic occlusion culling via readback of the z‑buffer [4]. Do not pass go, do not use occlusion queries! Occlusion queries are a misnomer, because they’re the least usable for what they were invented for. When I did this, the latency incurred by the z‑buffer readback was unnoticeable on a GeForce2. I did a simple glReadPixels(GL_DEPTH_COMPONENT) over a small viewport of the framebuffer where the occlusion geometry was rendered, and it was not slowing things down. Today you would have at least some latency, but once the z information is available on the CPU, it can be used for nice things like determining the number of shadowmap cascades to cover the visible depth range.
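As a sketch of that last idea, the C++ below takes depth values as they would come back from a readback, finds the farthest visible distance, and counts how many cascades are needed to reach it. The assumptions are mine: depth values in [0,1] from a standard OpenGL perspective projection, pixels at the cleared depth of 1.0 are treated as empty sky, and the cascades have fixed far distances in ascending order. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Convert a depth-buffer value in [0,1] to a view-space distance,
// assuming a standard OpenGL perspective projection.
double linearizeDepth(double d, double zNear, double zFar)
{
    double ndc = 2.0 * d - 1.0;
    return 2.0 * zNear * zFar / (zFar + zNear - ndc * (zFar - zNear));
}

// Farthest visible distance in the read-back buffer.
// Pixels still at the clear value 1.0 are ignored (assumed to be sky).
double farthestVisible(const std::vector<float>& depth, double zNear, double zFar)
{
    double zMax = zNear;
    for (float d : depth)
        if (d < 1.0f)
            zMax = std::max(zMax, linearizeDepth(d, zNear, zFar));
    return zMax;
}

// Number of cascades needed so that the last one reaches zMax.
// cascadeFar holds the far distance of each cascade in ascending order.
int cascadesNeeded(const std::vector<double>& cascadeFar, double zMax)
{
    assert(!cascadeFar.empty());
    for (std::size_t i = 0; i < cascadeFar.size(); ++i)
        if (cascadeFar[i] >= zMax)
            return static_cast<int>(i) + 1;
    return static_cast<int>(cascadeFar.size());
}
```

In an indoor view where nothing is farther away than about ten units, cascades ending at 5, 20, 100 and 1000 units would be cut down to the first two.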

Another thing of note is how the authors describe a technique for blurring alpha-test geometry for hair. It looks very nice but also seems to be a bit expensive. For Velvet Assassin, we used something that I named ‘dual-blend’. Just use alpha test for the solid parts of the texture, and in a second pass, alpha-blend (with an inverted alpha test) just for the pixels that have an intermediate alpha value, while the artists made sure the ordering of the triangles is from inner to outer.

# More Performance!

John White (EA Black Box), Colin Barre-Brisebois (EA Montreal)

The most densely packed and inspiring presentation of them all, with most of the topics completely unique and new. There is *Separable Bokeh Depth-of-Field*, a nice technique to make a separable blur of uniform hexagonal shape with as few passes as necessary. I think it would make sense to combine Bokeh and bloom with FFT. This is because the correct shape of the bloom kernel is the Fourier transform of the camera aperture (at least in the far-field approximation). So in this case, with a hexagonal aperture, bloom would have to be a 6‑pointed star.

Then they describe a cool trick, *Hi‑Z / Z‑Cull Reverse Reload*. Remember how they told you not to reverse the depth-comparison during a frame, because that would invalidate Hi‑Z? This is how to do it anyway. Also notable are *Chroma Sub-Sampled Image Processing*, and how they implemented *Tile-Based Deferred Shading* [5].

# Dynamic lighting in *God of War 3*

Vassily Filippov (Sony Santa Monica)

This talk is mostly about aggregating multiple lights in the vertex shader, so the pixel shader only has to calculate a single specular highlight. It then goes to great lengths to make sure the result looks reasonable.

This is interesting, because I did something similar in the past, and I would like to elaborate more on it. I can’t remember having the numerous problems mentioned in the talk. However, I did the diffuse lighting entirely per vertex, which may explain the difference. The data that I interpolated across triangles was essentially this:

    half3 HemiDiffuse   : COLOR0;     // Diffuse hemisphere lighting
    half3 HemiSpecular  : COLOR1;     // Specular hemisphere lighting
    half3 SunDiffuse    : TEXCOORD0;  // Diffuse lighting for the sun
    half3 SunH          : TEXCOORD1;  // Half-angle vector for the sun
    half3 PointsDiffuse : TEXCOORD2;  // Agg. diffuse lighting from points
    half3 PointsH       : TEXCOORD3;  // Agg. half-angle vector from points
    half3 N             : TEXCOORD4;  // Surface normal

The pixel shader did nothing more than calculate the shapes of the specular highlights and multiply them by the appropriate colors. The specular power and the normalization factor were fixed constants.

    // pixel shader:
    half2 highlights;
    // one highlight shape for the sun, one for the aggregated point lights
    highlights.x = pow( saturate( dot( N, SunH ) ), SPEC_POWER );
    highlights.y = pow( saturate( dot( N, PointsH ) ), SPEC_POWER );
    highlights *= SPEC_NORMFACTOR;
    half3 result = 0;
    // diffuse: everything was already lit per vertex
    result += DiffTexture * ( In.HemiDiffuse + In.SunDiffuse + In.PointsDiffuse );
    // specular: the per-vertex colors tint the per-pixel highlight shapes
    result += SpecTexture * ( In.HemiSpecular + In.SunDiffuse * highlights.x + In.PointsDiffuse * highlights.y );

Now on to the interesting thing, the actual aggregation. The key was to accumulate the colors weighted by cosine term and attenuation, but to accumulate the direction weighted by attenuation only.

    // vertex shader:
    float3 PointsL = 0;
    for( int i = 0; i < NUM_POINTS; ++i )
    {
        half3 dx = LightPosition[i] - WorldPosition;
        half attenuation = // some function of distance
        half3 attenL = normalize( dx ) * attenuation;
        // color: weighted by cosine term and attenuation
        Out.PointsDiffuse += saturate( dot( N, attenL ) ) * LightColor[i];
        // direction: weighted by attenuation only
        PointsL += attenL;
    }
    Out.PointsH = normalize( PointsL ) + V;

The sum of the weighted L‑vectors is then converted to a half-angle vector and interpolated across the polygon. This works because the shader does not do per-pixel diffuse; instead, the cosine term is applied before summation in the vertex shader. Otherwise, if the pixel shader needed to evaluate the diffuse cosine term itself, it would no longer be possible to use the cosine-weighted sum of light colors, giving rise to all the problems mentioned in the talk.
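To make the aggregation easier to experiment with, here is a CPU-side C++ sketch of the same loop. It is not the original shader: the attenuation function `attenuate` is a placeholder of my choosing, the final half-angle vector is normalized here so that the later dot product is a true cosine, and all type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
Vec3 operator-(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
Vec3 operator*(Vec3 a, double s) { return { a.x * s, a.y * s, a.z * s }; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
double saturate(double v) { return std::min(std::max(v, 0.0), 1.0); }

// Returns the input unchanged if it has zero length (lights cancel out).
Vec3 normalize(Vec3 a)
{
    double len = std::sqrt(dot(a, a));
    return len > 0.0 ? a * (1.0 / len) : a;
}

struct PointLight { Vec3 position; Vec3 color; };
struct Aggregate  { Vec3 diffuse; Vec3 halfVector; };

// Placeholder attenuation, an assumption made for this sketch only.
double attenuate(double distance)
{
    assert(distance >= 0.0);
    return 1.0 / (1.0 + distance * distance);
}

Aggregate aggregatePointLights(Vec3 worldPosition, Vec3 N, Vec3 V,
                               const std::vector<PointLight>& lights)
{
    Vec3 diffuse = { 0.0, 0.0, 0.0 };
    Vec3 sumL    = { 0.0, 0.0, 0.0 };
    for (const PointLight& light : lights) {
        Vec3 dx = light.position - worldPosition;
        Vec3 attenL = normalize(dx) * attenuate(std::sqrt(dot(dx, dx)));
        // Colors: weighted by cosine term and attenuation.
        diffuse = diffuse + light.color * saturate(dot(N, attenL));
        // Direction: weighted by attenuation only.
        sumL = sumL + attenL;
    }
    // Convert the aggregated light direction to a half-angle vector.
    return { diffuse, normalize(normalize(sumL) + V) };
}
```

For a single light the aggregate reduces to the exact per-light result, which is a useful sanity check; the approximation only comes into play when several lights with different directions are merged.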

[1] Thatcher Ulrich, “Notes on spatial partitioning”

http://tulrich.com/geekstuff/partitioning.html

[2] Hansong Zhang, “Effective Occlusion Culling for the Interactive Display of Arbitrary Models”, UNC-Chapel Hill

http://www.cs.unc.edu/~zhangh/hom.html

[3] Christian Schüler, “Building a Dynamic Lighting Engine for Velvet Assassin”, GDC Europe

http://www.gdcvault.com/free/gdc-europe-09

[4] Stephen Hill, Daniel Colin, “Practical, Dynamic Visibility for Games”, GPU Pro 2

http://gpupro2.blogspot.com/

[5] Andrew Lauritzen, “Deferred Rendering for Current and Future Rendering Pipelines”

http://software.intel.com/en-us/articles/deferred-rendering-for-current-and-future-rendering-pipelines/

[6] Ned Greene, “Hierarchical Z‑Buffer Visibility”

http://www.cs.princeton.edu/courses/archive/spr01/cs598b/papers/papers.html

Hi,

I’d be very interested to know more about combining the bloom and bokeh with FFT. Do you have any references on this?

Hello John,

what I meant was that the shapes of the bloom and bokeh kernels are related to one another by the Fourier transform, which is the far-field approximation to diffraction. So, say, if you were to bloom an image with FFT, you obviously need the FFT image of the bloom kernel. By the laws of physics, then, this FFT image should automatically be the (non-FFT) bokeh kernel, and vice versa. I remember some authors wrote a paper on this topic around 2005 (+/-), for automatic generation of bloom kernels from aperture geometry, but I can’t find it anymore. Here is a talk on the same subject: http://nae-lab.org/~kaki/paper/PG2004/Kakimoto2004GlarePresen.pdf