Inside Metro: Last Light

Digital Foundry vs. 4A's Oles Shishkovstov on technological innovation, PlayStation 3 and the incredible potential of the next-gen consoles.

No single console gaming genre is as fiercely competitive as the first-person shooter, and whether it's Call of Duty, Halo, Crysis, Killzone or Battlefield, these are franchises defined just as much by their technological distinctiveness as they are by colossal budgets that run into the tens of millions. Into the fray steps the recently released Metro: Last Light from Kiev-based developer 4A Games. What it lacks in mega-bucks investment, it aims to make up for in good storytelling, atmosphere and simply exceptional technology.

The latter is the focus of this latest Digital Foundry article. Of all the tech interviews we've published over the years, one of our favourites remains the Metro 2033 Q&A with 4A's chief technical officer Oles Shishkovstov. Forthright, completely open, highly opinionated and passionate about what he does, Shishkovstov is a brilliant interview subject - and when Deep Silver offered us the opportunity to talk tech with him once more, we couldn't say no.

In this piece, we go in-depth on some of the technical innovations found in Metro: Last Light and get some remarkable behind-the-scenes information on how 4A dealt with the challenging PlayStation 3 architecture. Perhaps best of all, we get a unique insight into the long-term potential of the next-gen consoles - from a rendering architect who clearly knows his stuff.

Digital Foundry: It's been three years since Metro 2033. The first-person shooter genre has always been something of a technological arms race, and since the launch of Metro we've had Battlefield 3, two Crysis games, Killzone 3 and Halo 4. What's your take on the progress made in engine tech over the last few years and to what extent is the 4A engine competitive with these mega-budget games?

Oles Shishkovstov: In general, the tech is always limited by the platforms/hardware it has to run on. For me, graphics-wise Battlefield was interesting because of the way they do lighting - its tile-based Compute-oriented lots-of-non-shadowing lights approach was quite radical, especially for artists. Crysis was interesting in the way they fake real-time global illumination [GI]. The other two are kind of old-skool and not interesting. (That was a joke!)

But the graphics aren't the only aspect that makes up the tech. For example Killzone's approach to AI's senses or Battlefield's ongoing research with HDR audio, or Halo's automated LODs... I am not even speaking of everybody's ongoing R&D in locomotion and character animation, especially facial animation.

The good thing for us is that we have really talented engineers here, and we can try 80 per cent of these ideas in 20 per cent of the time (and money), and we are staying focused. So it doesn't matter that much that their R&D expenses are more than twice the budget of our whole game. (Another joke... kind of!)

Metro: Last Light has a scalable engine challenging enough to humble even the most powerful gaming PCs in the world. A six-core Intel i7 paired with three GTX Titans in SLI is just about enough to sustain 2560x1440 at 60 frames per second - but even here the game's super-sampling anti-aliasing options are off the table and don't even think about 4K...

Digital Foundry: Metro 2033 was your first console title. What were the major lessons learned from working with console hardware and what technical improvements did you want to add for Last Light?

Oles Shishkovstov: We were quite well prepared for consoles from the start, because the engine itself was originally designed with them in mind, but I am not saying that we didn't encounter a few surprises. For example, I was really surprised to find out that simply playing video with the built-in decoder takes 10 per cent of all available memory on 360! Needless to say, we didn't use it.

In short, that's the consistent lack of memory and processing power in addition to always fighting DVD access latency. Also, some really specific stuff to handle in order to pass gazillions of TCRs [technical checklist requirements].

That experience definitely pays off. Even before we'd started work on Last Light (although Metro 2033 was already kind of running on PS3 to prove out the tech) we'd heavily focused on improving our memory usage. For example, we've completely revamped our animation system (animations use a lot of memory), we've added complex VBR compression for tracks and even started to stream animation data on demand from DVD/Blu-ray. That alone cuts the working set from around 90MB per level in 2033 to a window of around 20MB in Last Light, which is impressive considering that the total number of animations more than doubled.
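To make the idea of a fixed-size streaming "window" concrete, here is a minimal sketch of our own, not 4A's code. It assumes a least-recently-used eviction policy and a byte budget; the class name `AnimationWindow`, the `load_fn` callback and the eviction policy are all hypothetical, as the interview does not describe how the engine decides what stays resident.

```python
from collections import OrderedDict


class AnimationWindow:
    """Keeps recently used animation clips resident within a byte budget.

    Hypothetical illustration: LRU eviction is our assumption, not a
    documented detail of the 4A engine.
    """

    def __init__(self, budget_bytes, load_fn):
        self.budget = budget_bytes
        self.load = load_fn              # stands in for a read from disc
        self.resident = OrderedDict()    # clip name -> clip bytes, oldest first
        self.used = 0

    def get(self, name):
        if name in self.resident:
            # Already in memory: mark it as the most recently used clip.
            self.resident.move_to_end(name)
            return self.resident[name]
        data = self.load(name)
        # Evict least recently used clips until the new one fits.
        # (A clip larger than the whole budget would still be admitted.)
        while self.resident and self.used + len(data) > self.budget:
            _, evicted = self.resident.popitem(last=False)
            self.used -= len(evicted)
        self.resident[name] = data
        self.used += len(data)
        return data


if __name__ == "__main__":
    window = AnimationWindow(20, lambda name: bytes(8))  # toy sizes, not MB
    for clip in ["walk", "run", "reload"]:
        window.get(clip)
    print(list(window.resident), window.used)
```

With a 20-byte budget and 8-byte clips, requesting a third clip pushes out the oldest one, so memory use stays bounded however many clips exist on disc.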

Digital Foundry: You developed the 4A engine with PS3 in mind, but Last Light is the first actual shipping product on the Sony console. How was it working with the hardware? What were the challenges in working with RSX and how did you utilise the SPUs? Did you follow the established pattern of moving challenging GPU systems over to SPU?

Oles Shishkovstov: No, it was not that difficult. The PS3 is simple in one aspect. If some code slows down the frame, move it to SPUs and forget about it. We've built a simple and beautiful system based on virtual threads (fibres) running on two real hardware threads. The beauty comes from the fact that we can synchronously (from the look of the code) offload any task to SPU and synchronously wait for results to continue.

The actual execution, when you look at the full machine, is fully asynchronous. The direct and indirect overhead of that offloading is less than 15 microseconds (as seen from PPU), so every piece of code that takes more than that to execute can be offloaded. All we were doing was profiling some real scene, finding the reason for the slowdown, then moving it to the SPUs. In the shipping product there are almost a hundred different SPU tasks, executing about 1000 times per frame, resulting in up to 60 per cent total load on the SPUs.
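The pattern described here, code that reads as synchronous while the machine as a whole runs asynchronously, can be sketched loosely with Python coroutines. This is our own illustration under stated assumptions, not 4A's fibre system: coroutines stand in for fibres, a thread pool stands in for the SPUs, and `skin_vertices`, `update_character` and the per-task cost estimates are hypothetical. Only the 15-microsecond overhead figure comes from the interview.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

OFFLOAD_OVERHEAD_US = 15.0  # overhead quoted in the interview, as seen from the PPU


def worth_offloading(estimated_cost_us):
    """Offload only when the task costs more than the offload itself."""
    return estimated_cost_us > OFFLOAD_OVERHEAD_US


def skin_vertices(values):
    # Hypothetical stand-in for a compute-bound job.
    return [v * 2 for v in values]


async def update_character(pool, values, estimated_cost_us):
    """Reads top to bottom like synchronous code."""
    if not worth_offloading(estimated_cost_us):
        return skin_vertices(values)  # too cheap to be worth moving
    loop = asyncio.get_running_loop()
    # 'await' suspends only this coroutine (our "fibre"); the other
    # coroutines keep running while the worker does the job.
    return await loop.run_in_executor(pool, skin_vertices, values)


async def run_frame():
    # The pool size of 4 is arbitrary; the workers stand in for SPUs.
    with ThreadPoolExecutor(max_workers=4) as pool:
        jobs = [update_character(pool, [i, i + 1], 40.0) for i in range(3)]
        return await asyncio.gather(*jobs)


if __name__ == "__main__":
    print(asyncio.run(run_frame()))
```

Each `update_character` call looks like a plain function that waits for its result, yet all three jobs are in flight at once, which is the property Shishkovstov describes.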

As for the RSX, the only thing we offload from it to the SPUs was post-process anti-aliasing. We were out of main memory to offload something else.

Apart from that, only the GPU works on GPU tasks. Yes, we optimised for it like crazy. All the occlusion, geometry, and especially shaders were extremely optimised. You know, RSX is good at pixel processing, especially if the shaders aren't ALU-bound, but it's not that good at vertex processing.

Performance analysis of Metro: Last Light running on PlayStation 3 and Xbox 360. Despite the relatively weaker PS3 graphics chip, 4A Games' utilisation of the Cell CPU is such that the game runs smoother in the most demanding combat scenes.

Still, it was not quite enough and we reduced the internal resolution a bit, but mitigated that with good anti-aliasing and relatively good upsampling. The end result should be indistinguishable from the Xbox 360 version, if not slightly better in some aspects.

Digital Foundry: In our last interview you compared Xbox 360's CPU to Nehalem (first-gen Core i7 architecture from Intel) in terms of performance. So how does PlayStation 3 stack up? And from a PC perspective, has CPU performance really moved on so much since Nehalem?

Oles Shishkovstov: It's difficult to compare such different architectures. SPUs are crazy fast running even ordinary C++ code, but they stall heavily on DMAs if you don't try hard to hide that latency.

From a PC perspective, it seems that not much has changed. Sure, the CPUs are faster today, but the general trend for people is to move to more mobile and/or cheaper systems. So the end result is the same, most people buy hardware on par with the old Nehalem.

Digital Foundry: Three years on and Metro 2033 is still used as a benchmark for PC gaming and CPU/GPU testing. The Frontline benchmark is legendary for the abuse it dishes out - even to top-end cards like the GTX 680. What's your take on the progress AMD and Nvidia have made in improving their GPUs since Metro 2033 shipped?

Oles Shishkovstov: By what criteria? Pure performance? Performance per transistor? Performance per watt?

For example, Nvidia currently produces two almost completely different architectures - one is greatly tailored for gaming and another one for Compute. As a console developer I'd prefer the former, even if it is slightly more unbalanced, but with great potential in the right hands. As a PC developer I'd prefer the latter - the one which consistently and auto-magically works with maximum utilisation, even if they spend gazillions of transistors just for that.

Digital Foundry: There's the sense that Last Light is tighter and more optimised this time around, and a GTX 680 offers a highly playable experience on max settings - and the game's even playable on Intel HD 4000. What was your approach to scalability?

Oles Shishkovstov: Maximum settings with 4x SSAA, full tessellation and advanced PhysX? No, it isn't playable on a single 680 for me. [In fairness, we weren't using super-sampling - DF.] What we tried to achieve this time was to improve the quality of every pixel while still maintaining performance similar to 2033-level. We've slightly missed (read: exceeded) the target and the game has become slightly faster while providing much better visuals.

As a side effect of developing for consoles, we have multiple code paths and shading paths we can choose from. For example, we moved a lot of the shading load to texture units on PS3 via various SPU-generated tables/textures, balanced it with ALU processing on X360, and almost completely shifted towards ALUs on PCs. But actually we can scale almost everything, including the AI and physics, geometric complexity and shading complexity, lighting and post-processing - a lot of things.

Intel HD 4000 is about on par with current-gen consoles, so what's wrong with it being playable? Yes, memory bandwidth is the real issue, so don't expect it to run 1920x1200 at 30FPS, but something like 720p is playable.

Metro: Last Light compared on PS3 and PC running at 720p on max settings. Limiting resolution like this definitely holds the PC game back, but even here the enhanced textures, motion blur, higher-precision lighting and many other enhanced effects shine through.

Digital Foundry: You've previously talked about good performance on Haswell (the new, upcoming fourth-gen Core). Intel integrated graphics hasn't enjoyed the best reputation. What do you think of the new architecture?

Oles Shishkovstov: It is much better/faster from a Compute performance point of view but much more bandwidth-starved as a result (except for GT3e [Iris Pro with embedded RAM] maybe). Actually I don't know how Intel/AMD will solve the bandwidth problem for their APUs/SOCs/whatever in the near future. Will we see multi-channeled-DDR3 or a move to GDDR5 or adding huge caches as Intel did?

Digital Foundry: Metro 2033 was a beautiful game, but Last Light is clearly a leap beyond. To what extent is this down to new engine features and optimisations?

Oles Shishkovstov: I'd say this is a 50/50 result of both engine improvements and great technical artists who requested those technical improvements.

Digital Foundry: We're guessing that your design team also learned a lot from 2033 and that this experience fed into improved results for the new game?

Oles Shishkovstov: Definitely. Because the engine evolves during the game development process it is somewhat difficult for people to just keep track of the new possibilities on the table. For Last Light the starting point was infinitely higher, so at least 2033 features should be utilised to a greater extent.

Digital Foundry: You've really pushed the boat out in terms of lighting in Last Light. Are you simply more comfortable with the capabilities of the engine or have you enhanced it?

Oles Shishkovstov: We've completely changed the lighting model, added several light types and a lot of controls. The material system was enhanced as well. For example, specular power can now be controlled both by material properties and by light, which is totally unrealistic from a physics standpoint but gives more control to artists. The human skin gets its own treatment. Screen-space reflections are far from physics as well, but greatly complement the lighting model.
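As a rough illustration of what letting a light influence specular power might look like, here is a minimal sketch of our own based on the textbook Blinn-Phong specular term. The interview does not say which shading model 4A uses or how the two controls are combined; the multiplicative `light_power_scale` parameter below is purely an assumption made for the example.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        return (0.0, 0.0, 0.0)  # degenerate input: avoid dividing by zero
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def specular(normal, to_light, to_viewer, material_power, light_power_scale=1.0):
    """Blinn-Phong specular term with a per-light tweak to the exponent.

    material_power comes from the surface; light_power_scale is a
    hypothetical artist control on the light (not physically based).
    """
    n = normalize(normal)
    l = normalize(to_light)
    v = normalize(to_viewer)
    half = normalize(tuple(a + b for a, b in zip(l, v)))
    n_dot_h = max(dot(n, half), 0.0)
    return n_dot_h ** (material_power * light_power_scale)


if __name__ == "__main__":
    args = ((0, 0, 1), (1, 0, 1), (0, 0, 1))
    print(specular(*args, material_power=32.0))
    print(specular(*args, material_power=32.0, light_power_scale=0.5))
```

Lowering the scale on one light broadens the highlight it produces on every material it touches, which is the kind of per-light artistic control the answer alludes to, even though a real surface has no such dependence on the light.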

Digital Foundry: Intelligent AI is a must for a good shooter, and your solution in 2033 was pretty impressive, particularly in stealth scenarios. How have you expanded on that for Last Light?

Oles Shishkovstov: The basics are the same, because the decisions made for 2033 were the right ones. The devil is in the details, because the line between "perceptually bad" AI and the one that is "perceptually good" is really, really thin. Some time ago I was at a GDC session where one of the Battlefield guys was speaking (sorry, I don't remember his name, nor do I remember the exact project they were talking about), and for that project they basically changed just one thing: the average time an enemy is kept alive was halved.

And all the reviewers and gamers were praising the exceptional AI - the same reviewers and gamers who blamed exactly the same AI in the previous project. The line is really thin. I just want to give credit where it's due: the exceptional co-operative work between game and level designers, AI and gameplay programmers, scripting gurus and animators. Thank you all.

Digital Foundry: There are scenes in Last Light that really push the NPC count - did this pose any challenges to engine tech and performance?

Oles Shishkovstov: Hmm... sometimes yes, and there were different cases on 360 and PS3. To alleviate that we've improved game logic/AI and animation to ensure that all of the entities can be updated out of order in different threads. PS3 was easier - we've just moved all the animation graph processing, vision and ray-casting, sound-path tracing, IKs and several other compute-bound tasks to SPUs - and that's it.
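One common way to make entity updates independent of order, sketched here as our own assumption rather than 4A's actual design, is to have every entity read only from a frozen snapshot of the previous frame and write only its own next state. The movement rule and the function names below are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def update_entity(entity_id, snapshot):
    """Compute one entity's next position from the read-only snapshot."""
    me = snapshot[entity_id]
    # Hypothetical rule: move halfway towards the average of the others.
    others = [pos for eid, pos in snapshot.items() if eid != entity_id]
    target = sum(others) / len(others) if others else me
    return entity_id, me + 0.5 * (target - me)


def step(snapshot, workers=4):
    """Update all entities on a thread pool; no entity sees a half-updated world."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(update_entity, eid, snapshot) for eid in snapshot]
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    print(step({1: 0.0, 2: 10.0, 3: 20.0}))
```

Because no update reads another entity's new state, the result is the same whichever thread finishes first. In CPython the threads will not run this arithmetic in parallel, so the sketch shows the data flow rather than a speed-up.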

Digital Foundry: Last Light has a deeply rich storyline - were there any story requirements that required you to push the tech?

Oles Shishkovstov: Apart from more outdoor scenes with high entity counts - no. Of course, we did a lot of small things like blending the image-based lighting, complex post-processing and colour-grading, gameplay affected shading, deep physics and animation integration, etc - but that's too much to talk about.

Digital Foundry: You've embraced tessellation enthusiastically in Last Light. Does supporting it mean more work for your artists? Or can the tech extrapolate the new polygons from standard models?

Oles Shishkovstov: Yes, unfortunately it does mean more work. The tech could apply tessellation to any model and displace it with every possible texture without seams or shimmering or some other artifacts - but that's not the problem. The problem is to apply it only when and where it is necessary and tune at least the displacement amount to look good - that's a lot of work.

Digital Foundry: Tessellation often kills frame-rate in many games, yet the performance hit in Last Light is fairly light. What's the explanation?

Oles Shishkovstov: That's because we do fully deferred rendering and not something in-between. That gives us a lot of advantages, and one of them is that we can use coarse models for shadowmaps without artifacts, meaning we are paying the tessellation cost only once. Still, the "very high" tessellation setting is probably way too much for most hardware on the market.

Digital Foundry: Let's talk about the next-gen consoles. What's your take on the general design in terms of CPU and graphics processing power?

Oles Shishkovstov: We are talking PS4, right? I am very excited about both CPU and GPU. Jaguar is a pretty well-balanced out-of-order core and there are eight of them inside. I always wanted a lot of relatively-low-power cores instead of a single super-high-performance one, because it's easier to simply parallelise something instead of changing core algorithms or chasing every cycle inside a critical code segment (not that we don't do that, but very often we can avoid it).

Many beefier cores would be even better, but then we'll be left without a GPU! With regard to the graphics core, it's great, simply great. It's a modern-age high-performance compute device with unified memory and multiple compute-contexts. The possibilities of CPU-GPU-CPU communication are endless: we can easily expect games doing, for example, AI pathfinding/route planning on the GPU to become a common thing.

Digital Foundry: To what extent is the 8GB of GDDR5 in the PlayStation 4 a game-changer? What implications does that have for PC, where even the standard GTX 680 ships with just 2GB of GDDR5?

Oles Shishkovstov: RAM is really, really important for games, but all of it actually being useful depends on available CPU-side bandwidth and latency to the external storage device. I think that they put slightly more RAM than necessary for truly next-generation games this time, but considering the past history of Sony stealing a significant percentage of RAM from developers for OS needs - that may be exactly the right amount!

Digital Foundry: The last few years have seen a ton of poorly optimised PC ports of console games. Is the move to x86 architecture across all formats a good or bad thing for PC gaming?

Oles Shishkovstov: In general - yes, especially for indie developers. You have to understand that x86 is much more friendly for beginners at least because of its simplified memory model. Just try to describe to somebody what the memory barrier is and where and when to put it in - usually you'll be left with the guy getting stuck in an infinite loop! Joking aside - the less time we spend on platform-specific optimisations, the more is left to innovate.

Digital Foundry: Do you think that the relatively low-power CPUs in the next-gen consoles (compared to PC, at least) will see a more concerted push towards getting more out of GPU Compute?

Oles Shishkovstov: No, you just cannot compare consoles to PC directly. Consoles could do at least 2x what a comparable PC can due to the fixed platform and low-level access to hardware.

Back to the question - yes, yes and yes. There are some things which are just more efficient to do on massively parallel machines like GPUs. I think that at least initially, with launch titles, GPU Compute will be underutilised, but during the consoles' lifetime we'll see more and more unbelievable and innovative things purely thanks to GPUs.

Digital Foundry: Early PS4 work we've seen appears to have utilised either 2x MSAA or post-process AA. Do you think your SSAA/AAA combo could be viable for next-gen console?

Oles Shishkovstov: SSAA is all about decoupling rendering resolution (and grid) from output resolution (and grid). So, yes, in some form or another it will be useful. As for any form of post-processing AA - definitely yes, it was used in the past and will be used in the future. As for MSAA - I'd rather like the GPU vendors to use that (rather significant) amount of transistors in other parts of GPUs. Anti-aliasing is the job of graphics programmers and not some magical hardware feature.
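The decoupling of rendering resolution from output resolution can be shown with a small sketch of our own: shade the scene on a finer grid, then average blocks of samples down to the output grid. The box filter, the per-axis `factor` and the `hard_edge` test scene are assumptions made for illustration; the interview does not describe the sample pattern or filter that Last Light's SSAA modes use.

```python
def render(width, height, shade):
    """Evaluate shade(u, v) at pixel centres; u and v run over [0, 1)."""
    return [[shade((x + 0.5) / width, (y + 0.5) / height)
             for x in range(width)]
            for y in range(height)]


def downsample(image, factor):
    """Box filter: average each factor x factor block into one output pixel."""
    out_h = len(image) // factor
    out_w = len(image[0]) // factor
    return [[sum(image[y * factor + dy][x * factor + dx]
                 for dy in range(factor)
                 for dx in range(factor)) / (factor * factor)
             for x in range(out_w)]
            for y in range(out_h)]


def supersample(out_w, out_h, factor, shade):
    """Render at factor times the output size per axis, then resolve."""
    return downsample(render(out_w * factor, out_h * factor, shade), factor)


def hard_edge(u, v):
    # Hypothetical scene: white above a diagonal edge, black below it.
    return 1.0 if u + v < 1.0 else 0.0


if __name__ == "__main__":
    print(supersample(2, 2, 1, hard_edge))  # one sample per output pixel
    print(supersample(2, 2, 2, hard_edge))  # four samples per output pixel
```

With one sample per pixel, the two pixels the edge passes through come out fully black. With four samples each, they resolve to 0.25, an intermediate grey that softens the stair-step.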

Metro: Last Light, running on PC at 720p resolution. Here we're comparing the standard post-process anti-aliasing solution with two additional passes - SSAA 2x and 4x. Here you can see how sub-pixel break-up - like the fence at the top of the screen - is improved by the super-sampling.

The branches on the left here have a certain degree of jagginess (and shimmer in motion) with standard post-process AA, while sub-pixel issues are seen on the foliage on the bottom-right. Again, SSAA improves everything.

Our final example demonstrates how the various anti-aliasing modes work to resolve standard 'jaggies', as seen here on the grill. Post-process AA does a reasonably decent job here (though pixel-popping is an issue in motion), but as you can see, SSAA resolves all.

Digital Foundry: We've seen Unreal Engine 4 step back from true real-time global illumination. Is it simply too expensive, even for next-gen consoles? Can you talk us through 4A's GI solution?

Oles Shishkovstov: Actually that's not true global illumination, but more of a really advanced algorithm producing convincing results. Yes, all that voxelisation and cone tracing is very expensive, too expensive even for Titan-like hardware.

I did a lot of research on GI during our last project, but we did not ship with it. The fundamental problem is: when an artist tweaks lighting on PC (with GI) it usually looks like crap on current-gen consoles (without GI). Next-gen consoles will solve it, enabling us to use some kind of real-time GI, so both the PC and consoles will get it. Personally I still lean towards coarse scene voxelisation and tweaking from there, quite possibly living with some amount of light leakage.

Digital Foundry: Once it's financially viable to let go of Xbox 360 and PS3, what rendering advancements do you hope to see in next-gen gaming?

Oles Shishkovstov: It seems that we personally will jump to next-gen sooner rather than later. We are currently focused on another important aspect of our games - characters. I mean we are not only working on believable appearance/visualisation, but also are in deep research on believable motion and animation. In a nutshell that's full-time physical simulation of the entire body with generated animations based on a "style" model learned from motion capture. That's a really compute-intensive process, but one greatly suited to a GPU Compute model.

That's just one example. The whole industry was held back by current-gen consoles, because they are a very important source of revenue. Now the lowest common denominator will be 10x higher, and that's incredible. We can expect some form of GI to become common, it will be rare to see a shadow without umbra/penumbra, every model will be properly tessellated and displaced, OIT [order-independent transparency] will be commonplace (for games that need it badly), we will forget forever about smoke not casting shadows onto itself, etc, etc - great times really.

I am not saying that we'll solve all the problems at once and the result will be available in every game on every console, but a 10x more powerful baseline will spawn all types of research and the resulting advancements will translate into many games, [and] not only console ones - the PC graphics will get a huge improvement as a result as well.
