Digital Foundry: Will the shareable ghost replays ever make it to the Xbox 360 game?
Sebastian Aaltonen: Yes, it's possible that we could have robust ghost racing features in future Trials games. The same data we use to save the top 5000 replays of each track can be seamlessly used for ghost racing. There is no technical limitation present.
In Trials HD we have several new features that provide online competition. Not just against one player, but against all your friends simultaneously. The friend comparison meter shows the friend currently closest to you and how far ahead of or behind you that friend currently is. If you get too far from the compared friend, the friend comparison meter picks a new currently closest friend for you to race against.
All the friend comparison meter data is stored in the track leaderboard. The friend time difference is basically interpolated from the checkpoint interval times, but we have stored some extra key values in the leaderboard to make the interpolation much more precise.
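To illustrate the interpolation idea described above, here is a minimal Python sketch. It assumes each friend's leaderboard entry is a sorted list of (track position, elapsed seconds) pairs and that plain linear interpolation is used between them. The function names, the data layout and the sample values are hypothetical; the interview does not describe the actual storage format or the extra key values.

```python
from bisect import bisect_right

def time_at_position(checkpoints, position):
    """Linearly interpolate the elapsed time at `position`.

    `checkpoints` is a list of (track_position, elapsed_seconds) pairs sorted
    by position (hypothetical layout, not the game's real data format).
    """
    positions = [p for p, _ in checkpoints]
    # Find the segment that contains `position`, clamped to a valid segment.
    i = bisect_right(positions, position) - 1
    i = max(0, min(i, len(checkpoints) - 2))
    (p0, t0), (p1, t1) = checkpoints[i], checkpoints[i + 1]
    fraction = (position - p0) / (p1 - p0)
    return t0 + fraction * (t1 - t0)

def closest_friend(friends, position, player_time):
    """Return (name, delta) for the friend whose interpolated time is nearest.

    A negative delta means the friend reached this position earlier
    than the player did.
    """
    deltas = {
        name: time_at_position(cps, position) - player_time
        for name, cps in friends.items()
    }
    name = min(deltas, key=lambda n: abs(deltas[n]))
    return name, deltas[name]

# Made-up sample data: two friends with three checkpoints each.
friends = {
    "alice": [(0.0, 0.0), (100.0, 10.0), (200.0, 24.0)],
    "bob": [(0.0, 0.0), (100.0, 14.0), (200.0, 30.0)],
}
name, delta = closest_friend(friends, position=150.0, player_time=18.0)
print(name, delta)
```

Storing more key values per track shortens each segment, which is one way such an interpolation could become more precise.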
Digital Foundry: Can you give an overall description in layman's terms of what deferred rendering is and what its advantages and disadvantages are?
Sebastian Aaltonen: Deferred rendering means that you completely separate the lighting (and shadowing) from the scene geometry rendering. This basically means that the scene complexity doesn't increase the lighting cost, and the lighting complexity does not increase the scene rendering cost. This saves a lot of performance in complex lighting scenarios and makes performance profiling and fine-tuning easier.
And now to the more technical explanation:
Instead of rendering the final colour of each pixel on screen immediately, you first render a full screen buffer (called the g-buffer) that stores the following information for each pixel: surface distance from camera, surface normal vector, surface material properties (specularity, glossiness, ambient, emissive, diffuse). After the geometry has been rendered to the g-buffer, you can add any amount of lights and post-process effects by simply rendering 2D rectangles (sprites) with appropriate shaders.
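The two-pass structure can be sketched on the CPU in a few lines of Python. This is a toy model under stated assumptions, not the Trials HD renderer: the g-buffer holds only depth, normal and diffuse colour, lights are directional with a plain Lambert term, and all names (GBufferTexel, geometry_pass, lighting_pass) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class GBufferTexel:
    depth: float    # surface distance from the camera
    normal: tuple   # unit surface normal (x, y, z)
    diffuse: tuple  # diffuse colour (r, g, b)

def geometry_pass(width, height, fragments):
    """Fill the g-buffer, keeping only the nearest fragment per pixel."""
    gbuffer = [[None] * width for _ in range(height)]
    for x, y, texel in fragments:
        current = gbuffer[y][x]
        if current is None or texel.depth < current.depth:
            gbuffer[y][x] = texel
    return gbuffer

def lighting_pass(gbuffer, lights):
    """Accumulate every light per pixel, reading only the g-buffer.

    Each light is (direction_towards_light, colour). The scene geometry is
    never touched again, so the cost here depends on pixels and lights only.
    """
    height, width = len(gbuffer), len(gbuffer[0])
    image = [[(0.0, 0.0, 0.0)] * width for _ in range(height)]
    for direction, colour in lights:
        for y in range(height):
            for x in range(width):
                texel = gbuffer[y][x]
                if texel is None:
                    continue  # nothing was rendered at this pixel
                n_dot_l = max(0.0, sum(n * d for n, d in zip(texel.normal, direction)))
                image[y][x] = tuple(
                    acc + texel.diffuse[c] * colour[c] * n_dot_l
                    for c, acc in enumerate(image[y][x])
                )
    return image

# Two overlapping fragments at pixel (0, 0); the nearer white one wins.
fragments = [
    (0, 0, GBufferTexel(5.0, (0.0, 0.0, 1.0), (1.0, 0.0, 0.0))),
    (0, 0, GBufferTexel(2.0, (0.0, 0.0, 1.0), (1.0, 1.0, 1.0))),
]
lights = [((0.0, 0.0, 1.0), (1.0, 1.0, 1.0)), ((0.0, 0.0, 1.0), (0.5, 0.0, 0.0))]
gbuffer = geometry_pass(2, 1, fragments)
image = lighting_pass(gbuffer, lights)
```

Adding a third light only adds another loop over the stored pixels, and the hidden red fragment is never lit at all, which is the overdraw saving mentioned in the answer.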
The main advantages of deferred rendering are: lower overdraw for expensive lighting pixel shaders (improved GPU performance), dramatic reduction of shader combinations (reduced memory usage, improved CPU performance and GPU performance) and reduced graphics engine and shader system complexity (reduced development time).
The main disadvantages of deferred rendering are the increased graphics memory usage (to store g-buffers), the incompatibility with hardware multi-sample anti-aliasing (MSAA) and the extra work required to support transparent surfaces that receive lighting.
Digital Foundry: There are a few definitions and methods where deferred rendering is concerned - what is Trials HD's setup?
Sebastian Aaltonen: I experimented with multiple versions of forward rendering and deferred rendering during the early stages of the project. Our latest PC title Trials 2 was using our last-generation deferred renderer, and it was using a unified lighting model with stencil shadows for each object. When I started to port that engine to Xbox 360, I switched to light index deferred rendering (LiDR), because it was a perfect match for stencil shadows. LiDR was fast and very memory efficient. However, as the graphics content production really started, we noticed that stencil shadows had many problems we didn't want to face again. The stencil shadow volume rendering performance is highly erratic (camera angle and depth complexity dependent), it requires extra care in modelling (closed meshes) and the produced shadows are completely sharp and thus unrealistic.
In the end we chose to use traditional g-buffer style rendering. It was the best fit for the environments we have in this game and for the shadowing and post processing techniques we use.
Digital Foundry: How difficult was it in handling tiling, multiple render targets and 60FPS - was the 10MB of eDRAM directly connected to the 360 GPU any help?
Sebastian Aaltonen: We did not use tiling at all, as I wanted to save some performance. This design choice, however, meant that our 3x 32-bit g-buffer setup was slightly too large to be rendered at once. We had to find a way to cut a few pixels somewhere to make the buffers slightly smaller. Fortunately we discovered that the majority of TV sets had a little bit of overscan, so we decided to cut 20 rows of pixels inside the upper/lower TV overscan area. This way the majority of the players would never see any unused area on the screen, as their TV would show the unused area outside of the TV screen borders. After the g-buffer rendering is finished, everything (post effects, UI, etc) is rendered using the full screen area. This way we can have a pixel-perfect 720p output without any stretching and without the need to multipass the geometry.
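The size problem described above can be checked with some rough arithmetic. The sketch below makes several assumptions that the interview does not confirm: that "10MB" means 10 MiB, that "3x 32-bit" means three 4-byte-per-pixel surfaces in total (whether the depth buffer is one of them is not stated), and that hardware alignment of eDRAM surfaces can be ignored. All names are hypothetical.

```python
EDRAM_BYTES = 10 * 1024 * 1024  # assumption: "10MB" means 10 MiB
WIDTH = 1280                    # 720p width in pixels
BYTES_PER_PIXEL = 3 * 4         # assumption: three 32-bit surfaces in total

def gbuffer_bytes(rows):
    """Memory needed for all three surfaces at the given height."""
    return WIDTH * rows * BYTES_PER_PIXEL

def max_rows_without_tiling():
    """Largest number of rows that fits in eDRAM in a single pass."""
    return EDRAM_BYTES // (WIDTH * BYTES_PER_PIXEL)

for rows in (720, 700, 680):
    size = gbuffer_bytes(rows)
    verdict = "fits" if size <= EDRAM_BYTES else "does not fit"
    print(f"{rows} rows: {size} bytes, {verdict}")

print("max rows:", max_rows_without_tiling())
```

Under these assumptions a full 720-row buffer needs 11,059,200 bytes against a budget of 10,485,760, and at most 682 rows fit. A 680-row buffer fits while a 700-row one does not, which would be consistent with reading the answer as 20 rows removed at the top and 20 at the bottom, though that reading is an inference.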
The fast eDRAM frame buffer memory and its free anti-aliasing hardware were very helpful during the technology development and optimisation phases. Rendering to multiple render targets simultaneously is a common scenario for a deferred renderer, and MRT rendering requires lots of render target bandwidth. Our high-end particle renderer also requires lots of render target bandwidth.
eDRAM gives the system huge extra render target bandwidth for operations like these, and makes Xbox 360 a very suitable platform for deferred rendering techniques.
The anti-aliasing hardware inside the eDRAM is one of the most important performance advantages of the platform. With the anti-aliasing hardware we could speed up our soft shadowing algorithm dramatically, and we could replace lots of usually pixel shader-heavy post-processing steps (blurring and downsampling) with cheaper alternatives. These hardware specific optimisations required down-to-the-metal code, but in the end I must say that the eDRAM hardware was a key feature in making our game run at constant 60FPS.
Digital Foundry: What were your chosen render targets? 720p? FP16 or FP10? MSAA?
Sebastian Aaltonen: During the g-buffer rendering, the surface colour is stored in R8G8B8 format (eight bits for each pixel element). The final light accumulation buffer format is floating point FP10. All lighting is done in high dynamic range to the FP10 buffer.
A low resolution and low colour-precision (but high brightness range) version of the light accumulation buffer is resolved for our tone-mapping algorithm and recursively scaled down. The engine constantly calculates the average amount of light reaching the viewer's eye, and simulates the eye's iris opening and closing using this data. After the whole scene has been rendered to the FP10 floating-point buffer with HDR lighting and then tone-mapped, the buffer is resolved to 720p R8G8B8A8 format for final post-process steps.
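The exposure pipeline described above (recursive downscale, iris-like adaptation, tone mapping to 8 bits) can be sketched in Python as follows. The interview does not name the averaging method, the adaptation curve or the tone-mapping operator, so this sketch substitutes common choices: a plain 2x2 box average, exponential smoothing, and a Reinhard-style operator. The function names and the `speed` and `key` parameters are hypothetical.

```python
import math

def downsample(grid):
    """Halve a square luminance grid by averaging each 2x2 block."""
    half = len(grid) // 2
    return [
        [
            (grid[2 * y][2 * x] + grid[2 * y][2 * x + 1]
             + grid[2 * y + 1][2 * x] + grid[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(half)
        ]
        for y in range(half)
    ]

def average_luminance(grid):
    """Scale down repeatedly until one value remains.

    The grid side must be a power of two in this simplified version.
    """
    while len(grid) > 1:
        grid = downsample(grid)
    return grid[0][0]

def adapt(current, target, dt, speed=1.5):
    """Move the adapted luminance towards the measured one over time."""
    return current + (target - current) * (1.0 - math.exp(-dt * speed))

def tone_map(rgb, adapted, key=0.18):
    """Reinhard-style operator (an assumption), quantised to 8 bits per channel."""
    exposure = key / max(adapted, 1e-4)
    out = []
    for channel in rgb:
        scaled = channel * exposure
        mapped = scaled / (1.0 + scaled)  # compresses [0, inf) into [0, 1)
        out.append(round(mapped * 255))
    return tuple(out)

# One simulated frame at 60 frames per second.
scene = [[0.5, 0.5, 4.0, 4.0], [0.5, 0.5, 4.0, 4.0],
         [1.0, 1.0, 2.0, 2.0], [1.0, 1.0, 2.0, 2.0]]
measured = average_luminance(scene)
adapted = adapt(current=1.0, target=measured, dt=1.0 / 60.0)
pixel = tone_map((4.0, 2.0, 0.5), adapted)
```

Because the adapted value only drifts towards the measured average, a sudden bright area looks overexposed for a moment and then settles, much like an iris closing.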