No single console gaming genre is as fiercely competitive as the first-person shooter, and whether it's Call of Duty, Halo, Crysis, Killzone or Battlefield, these are franchises defined just as much by their technological distinctiveness as they are by colossal budgets that run into the tens of millions. Into the fray steps the recently released Metro: Last Light from Kiev-based developer 4A Games. It lacks mega-bucks investment, but it aims to make up that deficit with good storytelling, atmosphere and simply exceptional technology.
The latter is the focus of this latest Digital Foundry article. Of all the tech interviews we've published over the years, one of our favourites remains the Metro 2033 Q&A with 4A's chief technical officer Oles Shishkovstov. Forthright, completely open, highly opinionated and passionate about what he does, Shishkovstov is a brilliant interview subject - and when Deep Silver offered us the opportunity to talk tech with him once more, we couldn't say no.
In this piece, we go in-depth on some of the technical innovations found in Metro: Last Light, get some remarkable behind-the-scenes information on how 4A dealt with the challenging PlayStation 3 architecture, and perhaps best of all gain a unique insight into the long-term potential of the next-gen consoles - from a rendering architect who clearly knows his stuff.
In general, the tech is always limited by the platforms/hardware it has to run on. For me, graphics-wise Battlefield was interesting because of the way they do lighting - its tile-based Compute-oriented lots-of-non-shadowing lights approach was quite radical, especially for artists. Crysis was interesting in the way they fake real-time global illumination [GI]. The other two are kind of old-skool and not interesting. (That was a joke!)
But the graphics aren't the only aspect that makes up the tech. For example Killzone's approach to AI's senses or Battlefield's ongoing research with HDR audio, or Halo's automated LODs... I am not even speaking of everybody's ongoing R&D in locomotion and character animation, especially facial animation.
The good thing for us is that we have really talented engineers here, and we can try 80 per cent of these ideas in 20 per cent of time (and money), and we are staying focused. So it doesn't matter that much that their R&D expenses are more than twice the budget of our whole game. (Another joke... kind of!)
We were quite well prepared for consoles from the start, because the engine itself was originally designed with them in mind, but I am not saying that we didn't encounter a few surprises. For example, I was really surprised to find out that simply playing video with the built-in decoder takes 10 per cent of all available memory on 360! Needless to say, we didn't use it.
In short, that's the consistent lack of memory and processing power in addition to always fighting DVD access latency. Also, some really specific stuff to handle in order to pass gazillions of TCRs [technical checklist requirements].
That experience definitely pays off. Even before we'd started work on Last Light (although Metro 2033 was already kind of running on PS3 to prove out the tech) we'd heavily focused on improving our memory usage. For example we've completely revamped our animation system (animations use a lot of memory), we've added complex VBR compression for tracks and even started to stream animation data on-demand from DVD/Blu-Ray. That alone cuts the working set from around 90MB per level in 2033 to a window of around 20MB in Last Light, which is impressive considering that the total number of animations more than doubled.
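The interview doesn't describe how the streaming window is managed, so the following is only a minimal sketch of one common approach: a byte-budgeted cache that keeps recently used animation tracks resident and evicts the least recently used ones. `AnimationWindow` and `load_track` are hypothetical names, and the loader stands in for a read from disc.

```python
from collections import OrderedDict


class AnimationWindow:
    """Keeps recently used animation tracks resident within a byte budget.

    Hypothetical illustration only; not 4A's actual streaming system.
    """

    def __init__(self, budget_bytes, load_track):
        self.budget = budget_bytes
        self.load_track = load_track   # assumed loader: track name -> bytes
        self.resident = OrderedDict()  # oldest entry first, newest last
        self.used = 0

    def get(self, name):
        # Hit: mark the track as most recently used and return it.
        if name in self.resident:
            self.resident.move_to_end(name)
            return self.resident[name]

        # Miss: stream the track in, then evict old tracks until we fit.
        data = self.load_track(name)
        self.resident[name] = data
        self.used += len(data)
        while self.used > self.budget and len(self.resident) > 1:
            _, evicted = self.resident.popitem(last=False)
            self.used -= len(evicted)
        return data
```

With a window like this, the resident set is bounded by the budget rather than by the total number of animations in a level, which is the property the answer above is describing.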
No, it was not that difficult. The PS3 is simple in one aspect. If some code slows down the frame, move it to SPUs and forget about it. We've built a simple and beautiful system based on virtual threads (fibres) running on two real hardware threads. The beauty comes from the fact that we can synchronously (from the look of the code) offload any task to an SPU and synchronously wait for the results to continue.
The actual execution, when you look at the full machine, is fully asynchronous. The direct and indirect overhead of that offloading is less than 15 microseconds (as seen from PPU), so every piece of code that takes more than that to execute can be offloaded. All we were doing was profiling some real scene, finding the reason for the slowdown, then moving it to the SPUs. In the shipping product there are almost a hundred different SPU tasks, executing about 1000 times per frame, resulting in up to 60 per cent total load on the SPUs.
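To make the "synchronous-looking code, asynchronous execution" idea concrete, here is a minimal sketch in Python rather than PS3 code. It is not 4A's implementation: generators stand in for fibres, a thread pool stands in for the SPUs, and `run_fibres`, `worth_offloading`, `square` and `animation_fibre` are all hypothetical names. The only figure taken from the interview is the 15-microsecond overhead.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

OFFLOAD_OVERHEAD_US = 15  # overhead quoted in the interview, as seen from the PPU


def worth_offloading(estimated_cost_us):
    """A task is worth offloading only if it costs more than the overhead."""
    return estimated_cost_us > OFFLOAD_OVERHEAD_US


def run_fibres(fibres, workers=2):
    """Drive generator-based fibres; each `yield (fn, arg)` is an offload request."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        ready = deque((name, gen, None) for name, gen in fibres.items())
        waiting = []  # (name, fibre, future) for fibres blocked on a result
        while ready or waiting:
            # Run every ready fibre until it offloads again or finishes.
            while ready:
                name, gen, value = ready.popleft()
                try:
                    fn, arg = gen.send(value)
                except StopIteration as done:
                    results[name] = done.value
                else:
                    waiting.append((name, gen, pool.submit(fn, arg)))
            # Sleep until at least one offloaded job completes, then wake its fibre.
            if waiting:
                done, _ = wait([f for _, _, f in waiting], return_when=FIRST_COMPLETED)
                still_waiting = []
                for name, gen, future in waiting:
                    if future in done:
                        ready.append((name, gen, future.result()))
                    else:
                        still_waiting.append((name, gen, future))
                waiting = still_waiting
    return results


def square(x):
    return x * x


def animation_fibre(n):
    a = yield (square, n)  # reads like a blocking call...
    b = yield (square, a)  # ...but other fibres run while this one waits
    return a + b
```

Calling `run_fibres({"a": animation_fibre(3), "b": animation_fibre(2)})` interleaves both fibres while their jobs run on the workers. Each fibre's body still reads top to bottom like ordinary blocking code, which is the property described above.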
As for the RSX, the only thing we offload from it to the SPUs was post-process anti-aliasing. We were out of main memory to offload something else.
Apart from that, only the GPU works on GPU tasks. Yes, we optimised for it like crazy. All the occlusion, geometry, and especially shaders were extremely optimised. You know, RSX is good at pixel processing, especially if the shaders aren't ALU-bound, but it's not that good at vertex processing.
Still, it was not quite enough and we reduced the internal resolution a bit, but mitigated that with good anti-aliasing and relatively good upsampling. The end result should be indistinguishable from the Xbox 360 version, if not slightly better in some aspects.
It's difficult to compare such different architectures. SPUs are crazy fast running even ordinary C++ code, but they stall heavily on DMAs if you don't try hard to hide that latency.
From a PC perspective, it seems that not much has changed. Sure, the CPUs are faster today, but the general trend for people is to move to more mobile and/or cheaper systems. So the end result is the same, most people buy hardware on par with the old Nehalem.
By what criteria? Pure performance? Performance per transistor? Performance per watt?
For example, Nvidia currently produces two almost completely different architectures - one is greatly tailored for gaming and another one for Compute. As a console developer I'd prefer the former, even if it is slightly more unbalanced, but with great potential in right hands. As a PC developer I'd prefer the latter - the one which consistently and auto-magically works with maximum utilisation, even if they spend gazillions of transistors just for that.
Maximum settings with 4x SSAA, full tessellation and advanced PhysX? No, it isn't playable on a single 680 for me. [In fairness, we weren't using super-sampling - DF.] What we tried to achieve this time was to improve the quality of every pixel while still maintaining performance similar to the 2033 level. We slightly missed (read: exceeded) the target, and the game became slightly faster while providing much better visuals.
As a side effect of developing for consoles, we have multiple code paths and shading paths we can choose from. For example, we moved a lot of the shading load to texture units on PS3 via various SPU-generated tables/textures, balanced it with ALU processing on X360, and almost completely shifted towards ALUs on PCs. But actually we can scale almost everything, including the AI and physics, geometric complexity and shading complexity, lighting and post-processing - a lot of things.
Intel HD 4000 is about on par with current-gen consoles, so what's wrong with it being playable? Yes, memory bandwidth is the real issue, so don't expect it to run 1920x1200 at 30FPS, but something like 720p is playable.
It is much better/faster from a Compute performance point of view but much more bandwidth-starved as a result (except for GT3e [Iris Pro with embedded RAM] maybe). Actually I don't know how Intel/AMD will solve the bandwidth problem for their APUs/SOCs/whatever in the near future. Will we see multi-channeled-DDR3 or a move to GDDR5 or adding huge caches as Intel did?
I'd say this is 50/50 result of both engine improvements and great technical artists who requested those technical improvements.
Definitely. Because the engine evolves during the game development process it is somewhat difficult for people to just keep track of what are the new possibilities on the table. For Last Light the starting point was infinitely higher, so at least 2033 features should be utilized to a greater extent.
We've completely changed the lighting model, added several light types and a lot of controls. The material system was enhanced as well. For example, specular power could be now controlled both by material properties and by light, which is totally unrealistic from a physics standpoint but gives more control to artists. The human skin gets its own treatment. Screen-space reflections are far from physics as well, but greatly complement the lighting model.
The basics are the same, because the decisions made for 2033 were the right ones. The devil is in the details, because the line between "perceptually bad" AI and the one that is "perceptually good" is really, really thin. Some time ago I was at a GDC session where one of the Battlefield guys was speaking (sorry, I don't remember his name, nor do I remember the exact project they were talking about), and for that project they basically changed just one thing: the average time an enemy is kept alive was halved.
And all the reviewers and gamers were praising the exceptional AI, the same reviewers and gamers who blamed exactly the same AI in the previous project. The line is really thin. I just want to give credit where it's due - the exceptional co-operative work between game and level designers, AI and gameplay programmers, scripting gurus and animators. Thank you all.
Hmm... sometimes yes, and there were different cases on 360 and PS3. To alleviate that we've improved game logic/AI and animation to ensure that all of the entities can be updated out of order in different threads. PS3 was easier - we've just moved all the animation graph processing, vision and ray-casting, sound-path tracing, IKs and several other compute-bound tasks to SPUs - and that's it.
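The interview doesn't say how order-independence was achieved. One common way, sketched below purely as an assumption, is to double-buffer the world state: every entity reads only the previous frame's snapshot and writes only its own next state, so the update order (and the thread an entity runs on) cannot change the result. `update_entity` and `step` are hypothetical names, and the "steer towards the average neighbour" rule is a toy stand-in for real game logic.

```python
from concurrent.futures import ThreadPoolExecutor


def update_entity(entity_id, previous):
    """Compute one entity's next state from the read-only previous frame."""
    me = previous[entity_id]
    others = [state["x"] for other_id, state in previous.items() if other_id != entity_id]
    target = sum(others) / len(others) if others else me["x"]
    # Toy rule: move halfway towards the average position of the others.
    return entity_id, {"x": me["x"] + 0.5 * (target - me["x"])}


def step(previous, workers=4):
    """Update all entities in parallel; no update ever sees another's new state."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda entity_id: update_entity(entity_id, previous), previous))
```

Because no entity observes a half-updated neighbour, the same input frame always produces the same output frame regardless of scheduling.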
Apart from more outdoor scenes with high entity counts - no. Of course, we did a lot of small things like blending the image-based lighting, complex post-processing and colour-grading, gameplay affected shading, deep physics and animation integration, etc - but that's too much to talk about.
Yes, unfortunately it does mean more work. The tech could apply tessellation to any model and displace it with every possible texture without seams or shimmering or some other artifacts - but that's not the problem. The problem is to apply it only when and where it is necessary and tune at least the displacement amount to look good - that's a lot of work.
That's because we do fully deferred rendering and not something in-between. That gives us a lot of advantages, but one of them - we can use coarse models for shadowmaps without artifacts, meaning we are paying tessellation cost only once. Still, the "very high" tessellation setting is probably way too much for most hardware on the market.
We are talking PS4, right? I am very excited about both CPU and GPU. Jaguar is a pretty well-balanced out-of-order core and there are eight of them inside. I always wanted a lot of relatively-low-power cores instead of a single super-high-performance one, because it's easier to simply parallelise something instead of changing core algorithms or chasing every cycle inside a critical code segment (not that we don't do that, but very often we can avoid it).
Many beefier cores would be even better, but then we'll be left without a GPU! With regard to the graphics core, it's great, simply great. It's a modern-age high-performance compute device with unified memory and multiple compute contexts. The possibilities of CPU-GPU-CPU communication are endless; we can easily expect games doing, for example, AI pathfinding/route planning on the GPU to become a common thing.
RAM is really, really important for games, but all of it actually being useful depends on available CPU-side bandwidth and latency to the external storage device. I think that they put slightly more RAM than necessary for truly next-generation games this time, but considering the past history of Sony stealing a significant percentage of RAM from developers for OS needs - that may be exactly the right amount!
In general - yes, especially for indie developers. You have to understand that x86 is much more friendly for beginners at least because of its simplified memory model. Just try to describe to somebody what the memory barrier is and where and when to put it in - usually you'll be left with the guy getting stuck in an infinite loop! Joking aside - the less time we spend on platform-specific optimisations, the more is left to innovate.
No, you just cannot compare consoles to PC directly. Consoles could do at least 2x what a comparable PC can due to the fixed platform and low-level access to hardware.
Back to the question - yes, yes and yes. There are some things which are just more efficient to do on massively parallel machines like GPUs. I think that at least initially, with launch titles, GPU Compute will be underutilised, but during the console's lifetime we'll see more and more unbelievable and innovative things purely thanks to GPUs.
SSAA is all about decoupling rendering resolution (and grid) from output resolution (and grid). So, yes, in some form or another it will be useful. As for any form of post-processing AA - definitely yes, it was used in the past and will be used in the future. As for MSAA - I'd rather like the GPU vendors to use that (rather significant) amount of transistors in other parts of GPUs. Anti-aliasing is the job of graphics programmers and not some magical hardware feature.
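As a rough illustration of that decoupling, the sketch below shades a scene on a grid `factor` times denser than the output in each axis and averages the samples down with a simple box filter. It is a CPU toy under stated assumptions, not how any shipping renderer does it: `render_supersampled` and the `shade(u, v)` callback are hypothetical, and real implementations typically use better sample patterns and reconstruction filters.

```python
def render_supersampled(shade, out_w, out_h, factor):
    """Shade on a grid `factor` times denser than the output, then box-filter down."""
    image = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            total = 0.0
            for sy in range(factor):
                for sx in range(factor):
                    # Sample centre, expressed in output-pixel coordinates.
                    u = ox + (sx + 0.5) / factor
                    v = oy + (sy + 0.5) / factor
                    total += shade(u, v)
            row.append(total / (factor * factor))
        image.append(row)
    return image


def hard_edge(u, v):
    """Toy scene: white to the left of u = 1.5, black elsewhere."""
    return 1.0 if u < 1.5 else 0.0
```

With `factor=1` the rendering grid and the output grid coincide and the edge in `hard_edge` is a hard step; with `factor=2` the pixel straddling the edge comes out as an intermediate grey, while the output resolution stays the same.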
Actually that's not true global illumination, but more of a really advanced algorithm producing convincing results. Yes, all that voxelisation and cone tracing is very expensive, too expensive even for Titan-like hardware.
I did a lot of research on GI during our last project, but we did not ship with it. The fundamental problem is: when an artist tweaks lighting on PC (with GI) it usually looks like crap on current-gen consoles (without GI). Next-gen consoles will solve it, enabling us to use some kind of real-time GI, so both the PC and consoles will get it. Personally I still lean towards coarse scene voxelisation and tweaking from there, quite possibly living with some amount of light leakage.
It seems that personally we will jump to next-gen rather sooner than later. We are currently focused on another important aspect of our games - characters. I mean we are not only working on believable appearance/visualisation, but also are in deep research on believable motion and animation. In a nutshell that's full-time physical simulation of the entire body with generated animations based on a "style" model learned from motion capture. That's a really compute-intensive process, but the one greatly suited to a GPU Compute model.
That's just one example. The whole industry was held back by current-gen consoles, because they are a very important source of revenue. Now the lowest common denominator will be 10x higher, and that's incredible. We can expect some form of GI to become common, it will be rare to see a shadow without umbra/penumbra, every model will be properly tessellated and displaced, OIT [order-independent transparency] will be commonplace (for games that need it badly), we will forget forever about smoke not casting shadows onto itself, etc, etc - great times really.
I am not saying that we'll solve all the problems at once and the result will be available in every game on every console, but a 10x more powerful baseline will spawn all types of research and the resulting advancements will translate into many games, [and] not only console ones - PC graphics will get a huge improvement as a result as well.