It's been a while since we've done one of these! While putting together our tech analysis for id software's fantastic Doom reboot, one thing became clear - this was a game handing in a remarkable level of visual fidelity while maintaining an extremely high frame-rate. The scale of the achievement here can't really be understated: we were looking at a 60fps game handing in visuals better than many, if not most, 30fps titles. Just how did they do that?
Games take so long to develop these days that many creators give talks on their techniques at the likes of GDC and Siggraph, giving us some insight into the technical underpinnings of many modern games before they've even come out - it's great for Digital Foundry in that it gives us valuable research when putting together our articles and videos. But very little was known about idTech6, its relationship with its predecessor - and indeed the make-up of the engine that powered the cancelled Doom 4.
So when the opportunity to put together a 'no holds barred' tech interview with id software came along, we grabbed it enthusiastically. In this piece, we're going deep - covering off the evolution of idTech, the core rendering principles on which the latest iteration of the game was based, the team's views on resolution, scaling and anti-aliasing - plus of course, the growing importance of asynchronous compute, and the new wave of PC graphics APIs.
And the timing of this piece is fortunate too - this week, id released the long awaited Vulkan patch for Doom, bringing with it some game-changing improvements for PC gaming and a shot in the arm for AMD Radeon's hardware in particular. Is it time for developers to move on from DirectX 11 and embrace the likes of Vulkan and DX12? You'll find out below.
Answering the questions is a veritable who's who of id software's key tech staff. Many thanks to the team for giving us so much of their time for this article.
- Robert A Duffy - Chief Technical Officer
- Billy Khan - Lead Programmer
- Tiago Sousa - Lead Rendering Programmer
- Jean Geffroy - Senior Engine Programmer
- Axel Gneiting - Senior Engine Programmer
Digital Foundry: When we look at the history of Doom, and of id software, we see a phenomenal heritage of technological excellence. What were the objectives for idTech6 and are you happy with the final results?
Robert A Duffy: The original objectives were very simple; we wanted best-in-class visuals at 60fps and the best player movement and feel for a shooter. There is obviously a whole list of smaller objectives that form the foundations of achieving those goals, but as primary consumer facing technology goals, those were it. We are very happy with the final results but we are continuing the push with console updates, Vulkan support for PC, and a host of other updates geared towards the community.
Digital Foundry: Can we get some idea of the timelines on idTech6 - did it essentially evolve in parallel with Doom development, across both the final game and the cancelled Doom 4? Or did you revamp the underlying tech completely when you targeted 60fps?
Robert A Duffy: As we were prototyping Doom gameplay and the environments started to take form, it was clear we needed to take the technology in a different direction to achieve the visual fidelity we felt a modern Doom game warranted. 60fps was always the target for the game but as we started adding full dynamic lighting, shadows, and other features the performance target became a main focus of the engineering team. The short answer is a lot changed but not everything.
Digital Foundry: Are you able to discuss the major changes between idTech5 and 6? Rage was notably a much more static lighting approach, and presumably a forward renderer. On the other hand, Doom is much richer in dynamic lights and shadowing. Is it some form of tile deferred or clustered deferred renderer?
Tiago Sousa: From the start, one of our goals for the idTech 6 renderer was to have a performant and as much unified design as possible, to allow lighting, shadowing and details to work seamlessly across different surfaces types; while keeping in mind scalability and things like consoles, MSAA/good image quality and MGPU [multi-GPU] scalability.
The current renderer is a hybrid forward and deferred renderer. With such a design we try and get the best from both worlds: the simplicity of a forward renderer and the flexibility of deferred to be able to approximate certain techniques efficiently. Another goal from the start was to improve iteration times for the art team, and things like disk space consumption. We wanted to move away from the stamping approach from idTech5 - essentially how detail was applied to textures. In the past, it relied on pre-baking texture results into mega-texture and so on - on this iteration we've translated this process into a real-time GPU approach, with no draw calls added.
As for parameterising all the input data for feeding the GPU, “Clustered Deferred and Forward Shading” from Ola Olson et al and its derivative “Practical Clustered Shading” from Emil Person caught my eye early on during the research phase due to its relative simplicity and elegance, so we expanded from that research. All volume data required for shading the world is essentially fed via a camera frustum shaped voxel structure, where all such volumes are registered. It allows for a fairly generous amount of lights, image-based light volumes, decals, and so on.
Digital Foundry: How much idTech5 DNA remains in the latest engine? The virtual texturing seems to have remained, for example.
Billy Khan: We see our engine as an evolving organism that constantly improves and adapts. It is very important to us to constantly stay on the bleeding edge of engine technology. Doom still uses Virtual Materials to feed the PBR renderer with texture data.
Digital Foundry: Did the move to the new rendering set-up require a major change in asset creation and tools?
Tiago Sousa: Yes, as mentioned one of our big goals was to transition idTech 6 into a physically plausible rendering model. This started with transitioning the entire team from an LDR/linear agnostic rendering into high dynamic range rendering and linear correct rendering, then after this step we introduced the team to physically-based shading.
This was a fairly big adjustment, particularly for the art team, as they had to get used to things like tone-mapping, image exposure, linear correctness, physically plausible texture parameterisation, asset creation in a consistent manner, and so on. Even for the engineering team this was a big transition; getting everyone up and running understanding all of the relevant nuances - eg transitioning all inputs to linear correct, HDR lightmap, no magic multipliers and such - all required for consistent and high quality rendering.
Digital Foundry: How does the limited ESRAM space on Xbox One affect the dynamic resolution scaling implementation? What is your approach to ESRAM management in general?
Tiago Sousa: It has no direct correlation with resolution scaling. ESRAM was used for speeding up bandwidth limited techniques, particularly depth-prepass and shadow map rendering. Then things like the light buffer/thinGbuffer render targets also stored in ESRAM for performance. These targets are reused later on for speeding up transparencies as well.
Digita Foundry: We couldn't help but notice how great elements like the metal shading were throughout. What was the approach towards physically-based shading? Were there specific techniques, say, for the demon skin?
Tiago Sousa: Our lighting approach is a mix of real-time approximations and pre-computed components. For the indirect lighting, idTech 6 uses pre-baked indirect lighting for static geometry, mixed with an irradiance volumes approximation for dynamics. For indirect specular bounce we used an image based lighting approach.
The real-time components use a state-of-the-art analytical lighting model for the direct lighting together with shading anti-aliasing, mixed with real-time directional occlusion and reflections approximation. Skin sub-surface scattering is actually approximated via texture lookups and baked translucency data. It's fairly efficient - particularly compared to the usual costly screen-space approximations.
Our biggest achievement here is how well it performs and its consistency across different surface types, though we're always looking for way to improve even further.
Digital Foundry: Can you talk us through how the 8x TSSAA implementation works? Is it consistent between consoles and PC?
Tiago Sousa: I've always been a fan of amortising/decoupling frame costs. TSSAA is essentially doing that - it reconstructs an approximately 8x super-sampled image from data acquired over several frames, via a mix of image reprojection and couple heuristics for the accumulation buffer.
It has a relatively minimal runtime cost, plus the added benefit of temporal anti-aliasing to try to mitigate aliasing across frames (eg shading or geometry aliasing while moving camera slowly). It's mostly the same implementation between consoles and PC, differences being some GCN-specific optimisations for consoles and couple of minor simplifications.
Digital Foundry: Dynamic resolution scaling works great on consoles - are there technical reasons that preclude the same technology working on PC?
Billy Khan: Dynamic resolution scaling actually works on all of the platforms. We don't currently enable dynamic resolution scaling on the PC because the user can effectively choose the resolution they want from the settings menu. We do offer static resolution scaling that allows users to run at high resolutions but then lower the rendering buffers by percentage to achieve higher frame rates.
Digital Foundry: The scaler is highly effective on both PS4 and Xbox One. Can you give us your thoughts on the importance of resolution in general and its importance in terms of image quality?
Tiago Sousa: We don't use the native scaler from PS4/Xbox One, we do our own upsampling via a fairly optimal bicubic filter. It's also important to mention that the TSSAA implicitly takes into account the dynamic resolution scaling changes, mitigating aliasing occurring from resolution changes.
Resolution importance is a function of eye distance to display and display area - essentially the angular resolution - and to a degree also from the individual visual acuity. What that means is that the further away from your display, the higher the pixel density. After a certain distance/pixel density threshold you are essentially wasting performance that could be used to improve other things. In VR for example you have this tiny display on front of your face, pushing for higher pixel density still makes sense for dealing with things like geometry aliasing.
With console gameplay, where a player typically plays at a distance of two metres or more, and your display size is a common one (say 70" or so) it starts to become a performance waste relatively quickly, particularly if we are talking about 4K. If a developer does it the brute force way, you are essentially rasterising the same content, but literally 4x slower for not that much of a gain. Even for desktop rendering where users sit fairly close to the display, I can think of a myriad of approaches for decoupling resolution costs, than just brute force rendering.
Digital Foundry: Can you discuss the directional occlusion settings on PC?
Tiago Sousa: Lower settings use lower sample count, higher settings use higher sample count. We actually use a fairly low amount of samples overall, but rely on the TSSAA to reconstruct a higher quality result over frames. It's quite performant, about 0.1ms on PC at 1440p.
Digital Foundry: Is it possible to separate the object motion blur and camera motion blur?
Tiago Sousa: From a perspective of correctness/believability, motion blur is essentially simulating the accumulated light between an image exposure for a certain amount of time into film/digital sensor. For approximating such we need to reconstruct the pixel movement history. For real-time purposes that's usually achieved via outputting the relative velocity of a projected surface into the viewing plane, between current and previous frame, while the next frame is usually extrapolated. So, from a physically plausible view, separating object (ie dynamics) and camera (ie statics or just camera rotation) doesn't make much sense. It's programmatically possible, but would introduce noticeable artefacts and not look that nice in end.
Digital Foundry: What are the technical differences between the rendering modes - normal, gritty and cinematic?
Tiago Sousa: Each render mode was designed so that the player would actually like to play with it and have a relatively different visual experience for each play through. Technically, it is simply adjusting the parameterisation for things like lights saturation, tone-mapping, camera auto-exposure and such. The cinematic mode additionally adds some image-based lens-flares and vignetting - more noticeable on bright sources - albeit relatively subtle.
Digital Foundry: Can you go into depth on the wins asynchronous compute gave you on the consoles and any differential there between PS4 and Xbox One?
Jean Geffroy: When looking at GPU performance, something that becomes quite obvious right away is that some rendering passes barely use compute units. Shadow map rendering, as an example, is typically bottlenecked by fixed pipeline processing (eg rasterization) and memory bandwidth rather than raw compute performance. This means that when rendering your shadow maps, if nothing is running in parallel, you're effectively wasting a lot of GPU processing power.
Even geometry passes with more intensive shading computations will potentially not be able to consistently max out the compute units for numerous reasons related to the internal graphics pipeline. Whenever this occurs, async compute shaders can leverage those unused compute units for other tasks. This is the approach we took with Doom. Our post-processing and tone-mapping for instance run in parallel with a significant part of the graphics work. This is a good example of a situation where just scheduling your work differently across the graphics and compute queues can result in multi-ms gains.
This is just one example, but generally speaking, async compute is a great tool to get the most out of the GPU. Whenever it is possible to overlap some memory-intensive work with some compute-intensive tasks, there's opportunity for performance gains. We use async compute just the same way on both consoles. There are some hardware differences when it comes to the number of available queues, but with the way we're scheduling our compute tasks, this actually wasn't all that important.
Digital Foundry: Will we see async compute in the PC version via Vulkan?
Billy Khan: Yes, async compute will be extensively used on the PC Vulkan version running on AMD hardware. Vulkan allows us to finally code much more to the ;metal'. The thick driver layer is eliminated with Vulkan, which will give significant performance improvements that were not achievable on OpenGL or DX.
Digital Foundry:Do you foresee a time where async compute will be a major factor in all engines across formats?
Billy Khan: The time is now, really. Doom is already a clear example where async compute, when used properly, can make drastic enhancements to the performance and look of a game. Going forward, compute and async compute will be even more extensively used for idTech6. It is almost certain that more developers will take advantage of compute and async compute as they discover how to effectively use it in their games.
Digital Foundry: What are your thoughts on adopting Vulkan/DX12 as primary APIs for triple-A game development? Is it still too early?
Axel Gneiting: I would advise anybody to start as soon as possible. There is definitely a learning curve, but the benefits are obvious. Vulkan actually has pretty decent tools support with RenderDoc already and the debugging layers are really useful by now. The big benefit of Vulkan is that shader compiler, debug layers and RenderDoc are all open source. Additionally, it has full support for Windows 7, so there is no downside in OS support either compared to DX12.
Tiago Sousa: From a different perspective, I think it will be interesting to see the result of a game entirely taking advantage by design of any of the new APIs - since no game has yet. I'm expecting to see a relatively big jump in the amount of geometry detail on-screen with things like dynamic shadows. One other aspect that is overlooked is that the lower CPU overhead will allow art teams to work more efficiently - I'm predicting a welcome productivity boost on that side.
Digital Foundry: Can you give us an idea of how you utilise the consoles' CPU and the optimisation opportunities there? The PC version really requires a quad, which - relatively speaking - should wipe the floor with the PS4/Xbox One Jaguars.
Axel Gneiting: We are using all seven available cores on both consoles and in some frames almost the entire CPU time is used up. The CPU side rendering and command buffer generation code is very parallel. I suspect the Vulkan version of the game will run fine on a reasonably fast dual-core system. OpenGL takes up an entire core while Vulkan allows us to share it with other work.
Digital Foundry: Without breaking NDAs, the future of gaming technology appears to see show an even bigger bias towards GPU power vs CPU. Do you think there's more you can do with idTech6 in terms of using GPU for tasks we'd typically associate with the CPU?
Axel Gneiting: In general it's very hard to predict the future so we try to keep our code as simple and straightforward as possible to be able to react to any architecture. Right now it does indeed seem like we are heading into that direction.
Tiago Sousa: For the longer term I could foresee a future where many GPUs work together in a more interesting way than just the old school way of MGPU AFR [multi-GPU alternate frame rendering] and such. Particularly now that developers are trying to amortise/cache costs for being able to scale across different platforms - syncing across GPUs is becoming a big bottleneck for AFR type of approaches.
Digital Foundry: During the beta phase you moved from adaptive to a straight v-sync solution on the console versions. What was your thinking there?
Jean Geffroy: We did improve quite a few things between the closed and open beta, including our v-sync solution as you noticed. We changed this to instead use a triple-buffered solution where we always present the last image that's been rendered by the GPU with minimum latency. This is very similar to Fast Sync that Nvidia recently introduced on PC.
Digital Foundry: Can you give us any idea of how you optimise for performance more generally?
Axel Gneiting: I don't think there is a big secret to that. As anybody else, we use a profiler, find hotspots, optimise them and repeat.
Tiago Sousa: I like to keep things simple. Usually I tackle things from a minimalistic - both data and code - and algorithmically perspective, while taking into account target hardware and a grain of futurology. Eg does it make sense to process all this amount of data, or can we just process a sub-set? Is this the minimal data-set? If the solution is a bit on the rocket science/insane side, what can we do to make it as simple as possible? How would such run well on the slower platforms and how well would it scale? And so on. And of course the usual profile guided micro-optimisations.
Digital Foundry: We saw idTech5 deployed across a number of Zenimax titles - is idTech6 designed to be similarly portable for other developers?
Robert A Duffy: Our engine development is generally guided by the needs of our titles in active development. Unlike companies trying to sell or license engine technology, we have the luxury of being reasonably purpose-built.
We are expanding the capabilities of the technology over time to accommodate a broader set of capabilities and it's worth noting we also do a lot of technology sharing between different studios. If a sister studio does something really well we don't try to re-invent the wheel so to speak, we just ask “how are you doing that?” - it's much quicker.
Digital Foundry: Where next for idTech6? Are there any major areas of interest you're looking into?
Robert A Duffy: Better developer support with tools is a primary near-term goal as making the pipelines better for Art and Design is a key focus. We showed a “Doom Universe” VR tech demo at E3 2016, and building on our prior work in VR hardware, we are now pushing pretty hard on the software side of things. We feel the technology base is in a really great position to deliver extreme fidelity at 90fps+.