Battlefield 5 has shipped on PC, accompanied by our first look at a revolution in gaming graphics - real-time ray tracing via Nvidia's new RTX line of GPUs. It's a watershed moment in many ways and a phenomenal technological achievement - not just from the RTX hardware that makes it possible, but also from the engineers at DICE who committed to ray tracing in all of its shiny, real-time reflection glory. But alongside the revolution in visuals is the reality of the implementation - this is an alpha patch running on first-gen hardware. Real-time ray tracing remains massively expensive from a computational perspective and performance isn't completely ideal - but this is emergent tech, optimisations are coming, and having spoken to DICE directly, we know what kind of strategies the developer is pursuing to push frame-rates higher.
In fact, at the end of our analysis piece, you'll find our in-depth interview with DICE rendering engineer Yasin Uludağ, who has been working with colleague Johannes Deligiannis for the last year on implementing ray tracing within Battlefield 5. First up though, it's worth taking a look at the Battlefield 5 PC tech analysis video embedded below - principally to get a look at the game running in real-time in its day one incarnation and to get a sense of how ray tracing scales across the four available presets: low, medium, high and ultra. DICE's recommendation right now is to run the DXR setting at low for performance reasons, and this still looks great. But what actually happens to the quality of ray tracing as you move down the various settings?
The medium setting is where the biggest compromises to ray tracing quality begin to become evident. The roughness cut-off for materials receiving ray traced reflections is raised, resulting in duller materials - painted metals or wood surfaces - receiving cubemap textures instead of ray traced reflections. Generally, the quality still holds up, though it's just a little sad to see the view weapon lose the colours and tones of its immediate surroundings. Another hit comes from the resolution of the reflections themselves. Battlefield 5 shoots out a variable number of rays, binning and culling the ray count by dividing the screen into 16x16 pixel boxes. If an area needs fewer rays, the ray count assigned to its box is reduced - but on the other hand, if the entire screen is filled with reflective water, a limit proportionate to resolution is imposed.
Ultra is at 40 per cent resolution, high at 31.6 per cent, medium at 23.3 per cent and low at 15.5 per cent. So, the clarity of reflections reduces as you go down the settings chain but just to stress again, even the low setting is still giving you a proper ray traced experience, with the most important surfaces like water, mirrors and polished metals reacting as they should to the surrounding environments.
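To put those percentages into concrete terms, here's a quick sketch of the maximum ray budget they imply - assuming, as described above, that the percentage applies to the total pixel count with rays capped at one per pixel. The 1080p resolution is just an example for illustration.

```python
# Illustrative sketch: maximum per-frame ray budget for each DXR preset,
# assuming the quoted percentages apply to the output pixel count and
# rays are capped at one per pixel (figures from the article above).

PRESET_RAY_FRACTION = {
    "low": 0.155,
    "medium": 0.233,
    "high": 0.316,
    "ultra": 0.400,
}

def max_ray_count(width: int, height: int, preset: str) -> int:
    """Upper bound on reflection rays traced per frame for a preset."""
    return round(width * height * PRESET_RAY_FRACTION[preset])

# At 1920x1080, ultra allows up to 829,440 rays; low allows 321,408.
print(max_ray_count(1920, 1080, "ultra"))  # 829440
print(max_ray_count(1920, 1080, "low"))    # 321408
```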
There are plenty of Battlefield 5 DXR performance benchmarks out there right now, and some of the numbers look low - but revised code is forthcoming that tackles a number of issues, and this should eliminate the most egregious frame-rate drops. For example, all levels right now are affected by a bounding box bug that makes ray tracing more expensive than it should be wherever destructible terrain exists. Certain 'fake' god ray effects or a certain type of foliage can also impact performance negatively, sending out far more rays than they should. It's difficult to get a lock on how much performance is hit by using DXR, as the computational load changes according to content - there is no flat cost here.
On an RTX 2080 Ti, levels primarily based on sand or snow can run ray tracing at the low setting at 60fps at 1620p resolution, where more reflection-heavy maps like Rotterdam require a 1296p pixel-count to remain locked at the target 60 frames per second. We used the game's internal resolution scaler on a 4K screen to make the necessary adjustments here.
Obviously the improvement to image quality will, again, vary by content. On maps that are just dust or stone, the low and medium settings will only see ray tracing make a difference on the most reflective metals or glass sheets, or the occasional roadside puddle. It's only at the higher settings where ray tracing makes a difference here, working subtly on even dull materials. Maps like Rotterdam can deliver a night and day improvement - but again, it's all scene dependent, with the improvement gauged against how well the usual 'faked' techniques hold up. One of my personal favourite little touches ray tracing delivers is a reflection of the player character's face within the glass of the view weapon scope.
As things stand right now, the DICE developers responsible for the DXR implementation see it as a work-in-progress. Further optimisations are due, both in an imminent patch and also down the road as the title receives further support in the coming months. Even Nvidia driver updates are expected to deliver further boosts to frame-rates, such as the ability to run ray tracing compute shaders in parallel. Expect to see more granularity added to the DXR settings, perhaps with a focus on culling distance and LODs. Other quality and performance improvements in development include a hybrid rendering system that uses traditional screen-space reflections where the effect is accurate, only using ray tracing where the technique fails (remember, SSR can only produce reflections of elements rendered on-screen, while full ray tracing reflects anything and everything accurately, within the bounds set by the developer). This should boost performance and hopefully improve some of the pop-in issues RT reflections occasionally exhibit right now.
It's also interesting to stack up the various versions of Battlefield 5 - specifically, the PC ultra experience, DXR and what we'd assume is the best console delivery on Xbox One X. There's no denying that the title offers a big boost on PC compared to the console editions of the game. Based on a detailed look at the various facets of the game, the Xbox release essentially delivers an experience equivalent to PC at medium settings, with the undergrowth setting more akin to PC's high. There are no screen-space reflections at all on the X, so in that sense, PC offers a quality advantage in reflectivity even before DXR is added to the equation. It still looks good on consoles though, and medium settings is a good place to start if you're running a more modest PC.
But it's the arrival of full real-time ray tracing here that is a big deal, comparable in many ways to prior revolutions in PC graphics rendering, such as the arrival of Crysis back in 2007, or the debut of id Software's Quake back in 1996. And it's in those comparisons where the performance implications of ray tracing find some parallels - the bottom line is that genuine, generational leaps in visual fidelity always had some kind of cost to frame-rate. Quake's immense system requirements for the time practically demanded a Pentium CPU upgrade for a playable experience, while the fully tricked out Crysis struggled to sustain 30fps at 1024x768 or 1280x1024 on even the most powerful GPU of the time. The extent to which DICE can improve performance remains to be seen, of course, but 1296p minimum on RTX 2080 Ti for 60fps action is a clear improvement over what we saw at Gamescom - and the developer is optimistic of further boosts, several of which are already complete and ready to be rolled out in the next update. Performance right now is a moving target then, but the impact is clear - this is the beginning of something very special.
Battlefield 5 DXR ray tracing: the DICE tech interview
This one's for the hardcore! With the arrival of DXR and our first look at a video game with real-time, hardware-accelerated ray tracing, we're moving into unknown territory here, discussing technology and techniques never seen before in a shipping game. There's been plenty of discussion about this early work with ray tracing since the DXR patch for Battlefield 5 launched, and some criticism of the performance hit. In putting our coverage together, we wanted to understand the challenges faced by the developer, how its ray tracing implementation actually works and to get some idea of the behind-the-scenes work happening right now to improve game performance. And all of this starts by understanding what the four DXR quality presets actually do, and where the quality trades are made.
What are the real differences between low, medium, high, and ultra DXR settings?
Yasin Uludağ: Right now the differences are:
- Low: 0.9 smoothness cut-off and 15.0 per cent of screen resolution as maximum ray count.
- Med: 0.9 smoothness cut-off and 23.3 per cent of screen resolution as maximum ray count.
- High: 0.5 smoothness cut-off and 31.6 per cent of screen resolution as maximum ray count.
- Ultra: 0.5 smoothness cut-off and 40.0 per cent of screen resolution as maximum ray count.
[Note: The cut-off controls which surface materials are assigned ray traced reflections in the game world. Materials are either rough (wood, rocks) or smooth (metal/glass), and based upon how smooth and shiny (or conversely how rough) they are, they can receive ray traced reflections. The point at which the reflection on a surface transitions from a traditional cube map reflection into a ray traced reflection is dictated by the threshold setting chosen for this. A 0.9 smoothness cut-off is conservative and covers polished metals, glass and water. A 0.5 value covers surfaces that are even just moderately shiny at glancing view angles. The "percentage of resolution as maximum ray count" describes the maximum percentage of the chosen screen resolution which can have a ray traced ray assigned to it at a 1:1 ratio (one ray per pixel). The total number of possible rays shot out - and the apparent clarity of reflections - then goes up from low to ultra settings.]
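To illustrate the cut-off mechanism, here's a minimal sketch - the threshold values come from the presets above, but the material smoothness figures are invented examples, not data from the game.

```python
# Illustrative sketch of the smoothness cut-off: a surface only receives
# ray traced reflections when its smoothness meets the preset's threshold;
# otherwise it falls back to a cubemap. Material values are invented.

CUTOFF = {"low": 0.9, "medium": 0.9, "high": 0.5, "ultra": 0.5}

def reflection_technique(smoothness: float, preset: str) -> str:
    """Pick ray traced reflection or cubemap fallback for a surface."""
    return "ray traced" if smoothness >= CUTOFF[preset] else "cubemap"

print(reflection_technique(0.95, "low"))    # polished metal: ray traced
print(reflection_technique(0.60, "low"))    # painted wood: cubemap
print(reflection_technique(0.60, "ultra"))  # same surface on ultra: ray traced
```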
I say maximum ray count here because we will try to distribute rays from this fixed pool onto those screen pixels that are prescribed to be reflective (based on their reflective properties) but we can never go beyond one ray per pixel in our implementation. So, if only a small percentage of the screen is reflective, we give all of those pixels one ray.
We distribute rays where we think they are needed the most and drop the ones that didn't make it. We will never go beyond the maximum ray count: if your entire screen is covered in reflective water, the system will instead reduce the resolution on a 16x16 tile basis to accommodate. To do this, it is necessary to integrate a full-screen buffer using fast on-chip memory, with atomic instructions for the last remaining parts, as that gives low contention at the hardware level and it's super fast.
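That budgeting idea can be sketched in simplified CPU-side form - an assumed reconstruction for illustration, not DICE's shader code: each 16x16 tile requests a ray count, and when the total exceeds the global maximum, every tile's allocation is scaled down proportionally.

```python
# Assumed, simplified reconstruction of clamping per-tile ray requests to
# a global budget: when the sum of requests exceeds the maximum ray count,
# every 16x16 tile's allocation is scaled down proportionally.

def distribute_rays(requested: list[int], budget: int) -> list[int]:
    """requested: rays wanted per tile; returns the granted counts."""
    total = sum(requested)
    if total <= budget:
        return list(requested)  # one ray per reflective pixel, as requested
    scale = budget / total
    return [int(count * scale) for count in requested]

# Three fully reflective tiles (256 rays each) against a 384-ray budget:
print(distribute_rays([256, 256, 256], 384))  # [128, 128, 128]
```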
However, there are discussions internally to change what each individual setting does; we could do more, like play with LODs and cull distances, as well as perhaps some settings for the new hybrid ray tracer that is coming in the future. We are thinking hard about these settings, and looking to deliver higher quality there as well.
You previously talked to us about optimisations made after Gamescom - which of these have made their way into the current build of the game?
Yasin Uludağ: The current launch build has a ray binning optimisation that re-orders rays based on so-called super tiles (which are large 2D tiles on the screen). Each super tile re-orders the rays within it based on their direction (angular binning). This is very good for both the texture cache and instruction cache, because similar rays often hit similar triangles and execute the same shaders. On top of that, it is very good for the triangle traverser hardware (the RT core) because the rays take coherent paths while finding the closest intersection with the BVHs.
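A toy sketch of the angular binning idea - the bin count and sort-based grouping here are illustrative assumptions, not the actual shader implementation: rays are quantised by direction and reordered so that similar rays sit next to each other in memory.

```python
import math

# Toy sketch of angular ray binning: quantise each ray's direction into an
# azimuthal sector, then reorder so similar rays become adjacent. The bin
# count and sort-based grouping are illustrative assumptions.

def direction_bin(dx: float, dy: float, dz: float, bins: int = 8) -> int:
    """Quantise a ray direction into one of `bins` azimuthal sectors."""
    angle = math.atan2(dy, dx) % (2 * math.pi)
    return int(angle / (2 * math.pi) * bins) % bins

def bin_rays(rays: list[tuple[float, float, float]]):
    """Stable-sort rays by direction bin so coherent rays are adjacent."""
    return sorted(rays, key=lambda r: direction_bin(*r))

rays = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.9, 0.1, 0.0), (-0.9, -0.1, 0.0)]
binned = bin_rays(rays)
# The two roughly +x rays now sit together, followed by the two -x rays.
print(binned)
```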
Another neat optimisation mentioned at Gamescom is how we deal with lighting performance. There are ways to make queries into DXR's built-in acceleration structures through ray-gen shaders, but we preferred implementing it through compute for time reasons and to aid performance. We have a linked list of lights and cubemaps on the GPU in a grid-like acceleration structure - so there is a separate grid for non-shadow lights, shadow-casting lights, box cubemaps etc. These are the cubemaps applied inside the reflections. This grid is also camera-aligned - this is faster as it grabs the nearest lights rapidly. Without this, the lighting was slow because it had to 'walk over' all the lights to guarantee no popping.
We use Nvidia intrinsics in almost every single compute shader that surrounds and manages ray tracing. Without the Nvidia intrinsics our shaders would be running slower. Another optimisation is partially exposed to the user with the quality settings we mentioned. We call this optimisation “variable rate ray tracing”. As mentioned, the ray tracer decides, based upon a 16x16 tile, how many rays we should have in that region. This can go all the way from 256 rays down to four rays. The deciding factor is the BRDF reflectance: how much is diffuse, how much is specular, whether the surface is in shadow or in sunlight, what the smoothness of the reflection is, etc. We are basically trying to be smart with compute shaders about where to place the rays and how many of them to place. We are currently working on further improving this part as well. This should not be confused with the variable rate shading that Nvidia announced.
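The heuristic described might be sketched like this - the weighting function is entirely invented for illustration; only the four-to-256 range per 16x16 tile comes from the interview.

```python
# Invented illustration of "variable rate ray tracing": choose a ray count
# between 4 and 256 for each 16x16 tile from importance heuristics. Only
# the 4..256 range comes from the interview; the weighting is made up.

def tile_ray_count(specular_weight: float, smoothness: float, lit: bool) -> int:
    """Pick a per-tile ray count from simple importance heuristics."""
    importance = specular_weight * smoothness * (1.0 if lit else 0.5)
    return max(4, min(256, int(256 * importance)))

print(tile_ray_count(1.0, 1.0, True))   # mirror in sunlight: 256
print(tile_ray_count(0.1, 0.2, False))  # rough, shadowed surface: 4
```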
What are planned optimisations for the future?
Yasin Uludağ: One of the optimisations built into the BVH work is our use of “overlapped” compute - multiple compute shaders running in parallel. This is not the same thing as async compute or simultaneous compute; it just means you can run multiple compute shaders in parallel. However, there is an implicit barrier injected by the driver that prevents these shaders running in parallel when we record our command lists in parallel for BVH building. This will be fixed in the future, and we can expect quite a bit of performance here since it removes sync points and wait-for-idles on the GPU.
We also plan on running BVH building using simultaneous compute during the G-Buffer generation phase, allowing ray tracing to start much earlier in the frame, overlapping with the G-Buffer pass. Nsight traces show that this can be a big benefit. This will be done in the future.
Another optimisation we have in the pipe - one that almost made launch - is a hybrid ray trace/ray march system. This hybrid ray marcher creates a mip map of the entire depth buffer using a MIN filter. This means that every level takes the closest depth in 2x2 regions, going all the way down to the lowest mip map. Because this uses a so-called min filter, you know you can skip an entire region of the screen while traversing.
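That min-filtered depth pyramid can be sketched as follows - a simplified CPU-side reconstruction for illustration, assuming a square, power-of-two depth buffer.

```python
# Simplified sketch of a MIN-filtered depth mip chain: each level stores,
# per texel, the minimum depth of the 2x2 region below it, so a ray
# marcher can conservatively skip regions provably in front of any surface.

def build_min_mips(depth):
    """depth: square 2D list with power-of-two side; returns all levels."""
    levels = [depth]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        n = len(prev) // 2
        levels.append([
            [min(prev[2 * y][2 * x], prev[2 * y][2 * x + 1],
                 prev[2 * y + 1][2 * x], prev[2 * y + 1][2 * x + 1])
             for x in range(n)]
            for y in range(n)
        ])
    return levels

mips = build_min_mips([[0.9, 0.8, 0.7, 0.6],
                       [0.5, 0.4, 0.3, 0.2],
                       [0.9, 0.9, 0.9, 0.9],
                       [0.9, 0.9, 0.9, 0.9]])
print(mips[1])  # [[0.4, 0.2], [0.9, 0.9]]
print(mips[2])  # [[0.2]]
```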
With this, ray binning accelerates the hybrid ray traverser tremendously, because rays are fetched from the same pixels down the same mip map, giving super-efficient cache utilisation. If a ray gets stuck behind objects - as happens with classic screen-space reflections - this system promotes it to a world-space ray trace that continues from the failure point. We also get quality wins here, as decals and grass strands will now appear in reflections.
We have optimised the denoiser as well so it runs faster and we are also working on optimisations for our compute passes and filters that run throughout the ray tracing implementation.
We have applied for presenting our work/tech at GDC, so look out for that!
What are the current bottlenecks in the ray tracing implementation?
Yasin Uludağ: We have a few bugs in the launch build which prevent us from utilising the hardware efficiently, such as the bounding boxes expanding insanely far due to a feature implemented for the rasteriser that didn't play well with ray tracing. We only noticed this when it was too late. Basically, whenever an object has a feature for turning certain parts on and off, the turned-off parts would be skinned by our compute shader skinning system for ray tracing exactly like the vertex shader would do for the rasteriser. (Remember, we have shader graphs and we convert every single vertex shader automatically to compute and every pixel shader to a hit shader; if the pixel shader has alpha testing, we also make an any-hit shader that can call IgnoreHit() instead of the clip() instruction that alpha testing would use.) The same problem also happens with destructible objects, because that system collapses vertices too.
Following the API specifications, if, instead of collapsing them to (0, 0, 0), you collapse them to (NaN, NaN, NaN), the triangle will be omitted because it's “not a number”. This is what we did and it gave a lot of perf. This bug has been fixed and will be shipping soon, and we can expect every game level and map to see significant performance improvements.
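The NaN trick rests on the fact that any comparison with NaN is false, so a triangle with NaN vertices can never pass an intersection test. A tiny illustrative sketch - the explicit filtering step here is an assumed stand-in for what the traversal hardware and API actually do:

```python
import math

# Illustrative sketch of the NaN trick: collapsed triangles are marked with
# NaN vertices rather than degenerating at the origin, and any triangle
# containing NaN is skipped before intersection testing. The explicit
# filter below stands in for what the DXR traverser does internally.

NAN = float("nan")

def collapse_triangle():
    """Collapse a destroyed/hidden triangle to NaN rather than (0, 0, 0)."""
    return [(NAN, NAN, NAN)] * 3

def is_traceable(tri) -> bool:
    """A triangle is skipped if any vertex component is NaN."""
    return not any(math.isnan(c) for vertex in tri for c in vertex)

live = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
tris = [live, collapse_triangle()]
print([is_traceable(t) for t in tris])  # [True, False]
```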
Another problem we are currently having in the launch build is with alpha tested geometry like vegetation. If you turn off every single alpha tested object, ray tracing is suddenly blazingly fast when it only has to deal with opaque surfaces. Opaque-only ray tracing also benefits that much more from our ray binning, as diverging rays can still cost a lot. We are looking into optimisations for any-hit shaders to speed this up. We also had a bug that spawned rays off the leaves of vegetation, trees and the like. This compounded the aforementioned bounding box stretching issue, with rays trying to escape OUT while checking for self-intersections of the tree and leaves. This caused a great performance dip. It has been fixed and significantly improves performance.
We are also looking into reducing the LOD levels for alpha tested geometry like trees and vegetation, and we are also looking at reducing memory utilisation by the alpha shaders, like vertex attribute fetching (using our compute input assembler). All in all, it is too early to say where we are bottlenecking on the GPU as a whole. First, we need to fix all of our bugs and known issues (like the aforementioned alpha testing problem and bounding box issue, among others). Once we get things together with all of our optimisations, then we can look at bottlenecks on the GPU itself and start talking about them.
How are you getting to the bottom of performance problems?
Yasin Uludağ: We were initially negatively affected in our QA testing and distributed performance testing due to the RS5 Windows update being delayed. But we have received a custom shader compiler from Nvidia that allows us to inject a “counter” into the shader, tracking cycles spent inside a TraceRay call per pixel. This allows us to narrow down where the performance drops are coming from; we can change to primary ray mode instead of reflection rays to see which objects are “bright”. We map high cycle counters to bright and low cycle counters to dark, and then go in and fix those geometries. The trees and vegetation instantly popped out as being super-bright.
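In simplified form, that debug visualisation might look like this - the normalisation scheme and the sample cycle counts are invented for illustration:

```python
# Invented illustration of the cycle-count heat map: per-pixel TraceRay
# cycle counts are normalised to a brightness value, so expensive geometry
# shows up bright in the debug view.

def cycles_to_brightness(cycles: int, max_cycles: int) -> float:
    """Map a cycle count to [0, 1] brightness; costly pixels appear bright."""
    return min(1.0, cycles / max_cycles)

# Hypothetical per-surface cycle counts:
pixels = {"sand": 2_000, "tree_leaves": 95_000, "water": 30_000}
heat = {name: cycles_to_brightness(c, 100_000) for name, c in pixels.items()}
# The alpha tested foliage dominates the heat map, as described above.
print(heat)
```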
Having these metrics by default in D3D12 would be a great benefit, as they currently are not available. We would also love to see other exposed metrics, such as how 'good' a BVH REFIT was - ie. whether the BVH has deteriorated from multiple refits and needs to be rebuilt. Characters running around can deteriorate it rather fast!
In playing the game, looking at the order of complexity involved, the visuals and so on, we cannot help but recall other upheavals like Crysis, Quake, or the introduction of the pixel shader. Those took time to become more performant - is DXR/RTX on a similar path?
Yasin Uludağ: Yes! People can expect us to keep improving our ray tracing as time goes on, as both we at DICE and Nvidia have a bunch of optimisations coming in from the engine side and driver side, and we are far from done. We have specialists from Nvidia and DICE working on our issues as we speak. From now on, it's only going to get better, and we have more data now too since the game released. By the time people read this, many of the improvements mentioned will already have been completed. As you mention Quake and Crysis - working on ray tracing and being the first out with it in this way is a privilege. We feel super-lucky to be part of this transition in the industry and we will do everything we can to deliver the best experience possible. Rest assured, our passion for ray tracing is burning hot!