Metro Redux on Switch: the making of an 'impossible' port

4A Games reveals how its PS4 and Xbox One titles transitioned to Nintendo hardware.

It began with Doom 2016 - a Switch port so ambitious, it simply didn't seem possible. Since then, however, a procession of technologically ambitious current-gen console titles has migrated onto the Nintendo console hybrid, culminating in the arrival of the wonderful Metro Redux from 4A Games - highly impressive conversions and perhaps the closest, most authentic first-person shooter ports we've seen. So what's the secret? How do developers manage to achieve such impressive results from five-year-old Nvidia mobile hardware?

"At first, I did have really big concerns performance-wise," admits 4A's chief technical officer, Oles Shishkovstov. "You know, going from base PS4/Xbox One with approximately six and a half or seven CPU cores running at 1.6GHz to 1.75GHz down to only three cores at 1.0GHz sounds scary. The GPU was fine, as graphics can be scaled up and down much easier than, for example, game simulation code."

The results of the conversion work are certainly impressive bearing in mind the yawning gap in CPU specs. 4A started out by translating over the existing Metro Redux games from PS4 and Xbox One (and to stress the point, Switch doesn't get last-gen ports here), a process the 4A team carried out very quickly, but this early version of the game could only manage frame-rates of around seven to 15 frames per second. The games were entirely CPU-bound.

Halving the target frame-rate from the PS4 and Xbox One's 60fps down to 30fps was required before the task of optimising systems began. "First, we backported some optimisations from Exodus to the Redux codebase," Shishkovstov explains. "Then we focused on animation processing on the high level and on extracting ILP (instruction-level parallelism) out of the A57 on the low level - down to assembly. The low level optimizations alone got us to an unstable 30Hz when we were not GPU bound. Then the bone LODding arrived - the CPU [issue] was 'solved' even with some headroom necessary for stable framerate."
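
To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the core counts, clock speeds and frame-rates quoted above. The function names are ours, and treating cores multiplied by clock speed as a measure of throughput is a simplifying assumption that ignores per-clock differences between the PS4/Xbox One CPU cores and the Switch's A57.

```python
def aggregate_ghz(cores, clock_ghz):
    """Crude throughput proxy: core count times clock speed (an assumption)."""
    return cores * clock_ghz


def frame_time_ms(fps):
    """Milliseconds per frame at a given frame-rate."""
    return 1000.0 / fps


# Figures quoted in the interview.
switch = aggregate_ghz(3, 1.0)
console_low = aggregate_ghz(6.5, 1.6)
console_high = aggregate_ghz(7, 1.75)

cpu_gap_low = console_low / switch
cpu_gap_high = console_high / switch

# The first-pass port ran at around 7 to 15fps against a 30fps target.
budget_30 = frame_time_ms(30)
speedup_from_15fps = frame_time_ms(15) / budget_30
speedup_from_7fps = frame_time_ms(7) / budget_30

print(f"CPU gap: {cpu_gap_low:.1f}x to {cpu_gap_high:.1f}x")
print(f"30fps budget: {budget_30:.1f}ms per frame")
print(f"Speed-up needed: {speedup_from_15fps:.1f}x to {speedup_from_7fps:.1f}x")
```

On these assumptions the raw CPU gap works out at roughly 3.5x to 4.1x, and the early build needed to run somewhere between two and a little over four times faster to hold 30fps.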

Explained like that, 4A's solution to the Switch's CPU limitation seems fairly straightforward but the process of coding at the assembly level - literally the native language of the Switch ARM Cortex-A57 CPU cluster - can't have been a walk in the park. Animation sucks up a lot of processor cycles, so the idea of adding level of detail (LOD) transitions to the system makes a lot of sense.
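
As an illustration of the general idea only, bone LOD can be sketched as animating fewer bones for characters that are further from the camera. 4A has not described its implementation, so everything here is hypothetical: the function name, the distance thresholds and the rule of halving the bone count per LOD step are all our own assumptions, as is the premise that a skeleton's bones are ordered by importance.

```python
def bones_to_update(total_bones, distance, lod_distances=(10.0, 25.0, 50.0)):
    """Return how many bones to animate for a character at `distance`.

    Assumes bones are sorted by importance, so the first N are the ones
    worth keeping. Thresholds are invented for illustration.
    """
    # LOD 0 means full detail; each threshold passed drops one level.
    lod = sum(1 for d in lod_distances if distance >= d)
    # Halve the animated bone count per LOD level, but never go below one.
    return max(1, total_bones >> lod)


for dist in (5.0, 30.0, 100.0):
    print(dist, bones_to_update(64, dist))
```

With a hypothetical 64-bone skeleton, a distant character in this sketch animates just eight bones, which is where the CPU saving would come from.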

After this, 4A moved on to GPU optimisations, and it all began with the choice of graphics API. The firm has a long history of supporting the most performant, low-level APIs, with Metro Exodus running on DX11, DX12, Vulkan and GNM across its various multi-platform releases. Switch itself supports OpenGL and Vulkan, but for best performance, 4A chose NVN, the API Nvidia itself developed for Switch.

"NVN is the lowest possible graphics API on NX," explains Shishkovstov. "CPU overhead is negligible, in most cases that's just a few DWORDs written to the GPU command buffer. It is well-designed, clean and exposes everything the hardware is capable of. Much better than Vulkan, for example."

And it's here where we're especially interested in how Switch delivers so much from so little. When the Nintendo hardware was first announced, our only experience of the Tegra X1 processor came from the Shield Android TV, where last-gen console conversions typically under-performed. It seems that NVN really makes a key difference here, with 4A suggesting that it gives direct access to the Nvidia Maxwell architecture. So what Maxwell features are used in Metro Redux?

"I am not sure I can talk about that, but we use all of them it seems," explains Shishkovstov. "Much of our GPU optimisations were focused on reducing memory bandwidth/off-chip traffic. For example, NVN exposes a lot of controls for memory compression, tile cache behavior and binning, memory layout and aliasing. For example, the straight immediate mode rendering is only used during g-buffer creation and shadow map rendering. Every other pass, including forward rendering and deferred lighting uses binning rasteriser with different settings for tile cache."

In common with a lot of games of this generation, Metro Redux also sees the developer make the jump to using temporal super-sampling - or temporal super resolution, as 4A calls it. The idea is very straightforward. Traditional super-sampling is the process of rendering at a higher-than-native resolution, before downsampling to the developer's chosen pixel-count. TSR is the same basic idea, except additional detail is gleaned from past frames instead. Outside of games, the technology is used extensively to improve smartphone camera quality, and there are other uses too.

PS4 | Switch Docked | Xbox 360
The first moments of Metro 2033 demonstrate the changes between the original and Redux versions of the game. Switch compares favourably to PS4 in this scene but is slightly softer and, strangely, darker.
Switch Docked | Switch Portable
This shot showcases portable mode vs docked. While there is a difference in clarity, it's rather subtle compared to most other games on the system. It looks nearly as good in portable mode as it does while docked.
PS4 | Switch Docked | Xbox 360
Switch exhibits a slight loss in contact shadow detail not to mention a slight reduction in texture quality compared to PS4. The original 360 version just looks very different all around - the shadow from the satchel is a nice touch lacking in Redux, however.
PS4 | Switch Docked | Xbox 360
The original version of Last Light is similar in visual quality compared to Redux but the 360 version exhibits less impressive contact shadowing and lower image quality.
PS4 | Switch Docked | Xbox 360
This scene highlights the improvements possible with the temporal solution employed on Switch. It's not as sharp as PS4 but it's cleaner than 360 all around.
PS4 | Switch Docked | Xbox 360
This area features changes in Redux that are visible on both PS4 and Switch. There is a slight difference in water reflections on PS4 but it holds up nicely across all platforms.

"That's a well-known FBI solution for reading car plate numbers from the space satellites," says Oles Shishkovstov. "The problem is it is extremely texture sampling and math heavy for the Switch's GPU. We have to derive something which is much cheaper and without major quality compromises. It wasn't easy. I spent more than a month on that - it seems like Maxwell GPU ISA is my native language now.

"The end result takes approximately 2ms at 1080p with only nine texture samples and tricky math. It also does anti-aliasing as a byproduct. When pushed way too hard (it happens in 1080p) the algorithm still produces pixel perfect edges and sharp texture details and only AA quality somewhat degrades - but that is barely visible even for the trained eye."
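
The basic principle of gleaning extra detail from past frames can be shown with a toy one-dimensional example. This is emphatically not 4A's algorithm, which is not public: it is a minimal sketch that assumes a static scene, renders only every second pixel each frame with an alternating offset, and writes those samples into a full-resolution history buffer. All names and the `blend` parameter are our own.

```python
def render_low_res(scene, offset, step=2):
    """Sample every `step`-th pixel of the full-res scene, starting at `offset`."""
    return scene[offset::step]


def accumulate(history, samples, offset, step=2, blend=1.0):
    """Blend this frame's jittered samples into the full-res history buffer."""
    out = list(history)
    for i, sample in enumerate(samples):
        pos = offset + i * step
        out[pos] = (1.0 - blend) * out[pos] + blend * sample
    return out


# A static "scene" at full resolution (eight pixels).
scene = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.5, 0.25]
history = [0.0] * len(scene)

for frame in range(2):
    offset = frame % 2  # alternate which half of the pixels gets rendered
    history = accumulate(history, render_low_res(scene, offset), offset)

print(history == scene)
```

Each frame only pays for half the pixels, yet after two frames the history buffer holds the full-resolution image. Once the camera or objects move, the history has to be reprojected using per-pixel velocities, which is where the real difficulty and cost lie.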

Using temporal super resolution, Shishkovstov reckons that the concept of native resolution rendering as we know it isn't particularly relevant, which raises some interesting questions. Look back at our analysis and you'll see that we were able to pull a few pixel counts from individual frames. However, it's games like this, Modern Warfare 2019 and many others that are making us consider new techniques of getting some kind of measure on image quality. Redux on Switch doesn't look as clean as the PS4 version, but if we pull a like-for-like image of Metro from the locked 720p of the last-gen versions, image quality is on another level.

Whether you're docked or running in handheld mode, the accumulated output is 1080p or 720p respectively, but the clarity of the image does adjust, according to content. In terms of overall clarity, the technique chosen does look especially impressive when played portably, which raises the question of how 4A scaled the game across docked and handheld modes.

"Going docked you get 2x faster-clocked GPU but only moderately more bandwidth, so it is not magically 2x faster at all, but still considerably faster," explains Shishkovstov. "That allowed us, for example, to render per-pixel velocities for more objects resulting in slightly more correct TSR and AA. In handheld mode we only draw velocity for HUD/weapon - that's all we can afford.

"Also, Redux content was lacking geometry LODs for a lot of meshes. As the art team was busy with Exodus' (huge) DLCs - we programmatically generated missing ones. Both docked and handheld use original PS4/X1 geometry, but handheld uses more aggressive LOD switching, although it is barely noticeable on a small screen. From the user/gamer point of view, handheld is always 720p, docked is always 1080p, otherwise they are the same."
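
The docked and handheld differences described here could be captured in a small per-mode profile. The sketch below is hypothetical throughout: the `ModeProfile` fields, the LOD distance thresholds and the 0.6 scale used for handheld's more aggressive switching are invented for illustration. Only the 1080p/720p outputs and the velocity-pass coverage come from the interview.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeProfile:
    output_height: int         # accumulated output resolution
    velocity_pass: str         # which objects get per-pixel velocities
    lod_distance_scale: float  # below 1.0 means LODs switch sooner


DOCKED = ModeProfile(1080, "more objects", 1.0)
HANDHELD = ModeProfile(720, "HUD/weapon only", 0.6)  # 0.6 is an assumption


def select_lod(distance, profile, thresholds=(15.0, 30.0, 60.0)):
    """Pick a geometry LOD index (0 = full detail) for a mesh at `distance`."""
    scaled = [t * profile.lod_distance_scale for t in thresholds]
    return sum(1 for t in scaled if distance >= t)


print(select_lod(20.0, DOCKED), select_lod(20.0, HANDHELD))
```

In this sketch the same mesh at the same distance drops to a coarser LOD one step earlier in handheld mode, while the source geometry itself is shared between both modes.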

What's also impressive about the Metro Redux port is its sheer consistency in maintaining its target 30fps frame-rate. It's an important point to make because whether we're talking about the id Tech 6 conversions, The Witcher 3, Warframe or most of the other 'impossible ports' to the Switch, it's rare that you find a consistent performance level.

"I am glad we hit a consistent 30fps," shares Shishkovstov. "The only way to hit close to 60 would be to run two render-frames per one simulation frame, at radically reduced quality and inconsistent input lag. That's not the price I want to pay. Running at 30fps allowed no quality compromises - even the material and lighting shaders are exactly the same as PS4 and Xbox One."
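
One way to picture the input lag problem is with a toy timing model. It assumes, purely for illustration, that the simulation ticks on every second render frame and that input arriving during a render frame is only consumed at the next simulation tick. The function name and the model itself are our assumptions, not a description of 4A's engine.

```python
def frames_until_input_is_simulated(render_frame, renders_per_sim=2):
    """Render frames that pass before input arriving now reaches the simulation."""
    next_tick_frame = (render_frame // renders_per_sim + 1) * renders_per_sim
    return next_tick_frame - render_frame


# Input arriving during render frames 0, 1, 2 and 3.
lags = [frames_until_input_is_simulated(r) for r in range(4)]
print(lags)
```

Under this model the wait alternates between two render frames and one, depending on when the button press lands, so the response never feels uniform even though the display updates twice as often.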

As for how the game runs so doggedly at 30fps, 4A puts it down to over-optimisation. "Even without any TSR, the game keeps producing consistent 30fps at 720p in handheld mode in over 99 per cent of frames across the whole game. TSR is more [useful] for 1080p/docked mode."

With continued rumours of improved Switch hardware in development, I thought it would be interesting to see where Nintendo and Nvidia might choose to innovate. After all, a lot of the success of PlayStation 4's design comes from Sony shifting focus and taking on board developer feedback.

"Since we are generally CPU bound, additional cores would definitely be on the list. Bandwidth and GPU power never hurts either," offers Shishkovstov. Putting CPU power at the forefront may sound surprising, but graphics scale much more easily than the core game code - and in our Switch overclocking tests, ramping up CPU frequency proved more impactful on many games than up-clocking the graphics core.

And while we're on the subject of new hardware, what about the next-gen consoles from Sony and Microsoft? Developers are under NDA, so can't talk about the technical specifics of the hardware. However, key aspects of the new machines are public knowledge - such as the fact that both PS5 and Xbox Series X feature hardware-accelerated support in the GPU for real-time ray tracing.

"We are fully into ray tracing, dropping old-school codepath/techniques completely," reveals Shishkovstov - and in terms of how RT has evolved since Metro Exodus? "Internally we experimented a lot, and with spectacular results so far. You will need to wait to see what we implement into our future projects."

About the author

Richard Leadbetter

Technology Editor, Digital Foundry

Rich has been a games journalist since the days of 16-bit and specialises in technical analysis. He's commonly known around Eurogamer as the Blacksmith of the Future.
