Metro Redux: what it's really like to develop for PS4 and Xbox One
Frank discussion with 4A Games about the new wave of consoles.
As tech interviews go, this one's a corker. Readers of our previous Metro 2033 and Metro Last Light tech Q&As will know that 4A Games' chief technical officer Oles Shishkovstov isn't backward about coming forward on the matters that are important to him, and in the transition across to the new wave of console hardware, clearly there are plenty of important topics to discuss.
And it's this frankness and direct, to-the-point honesty that always makes Oles' interviews so refreshing. In this case, 4A is the first developer willing to talk in-depth and on the record about the process of developing for the new consoles, discussing the problems and opportunities represented by the hardware and software that powers PlayStation 4 and Xbox One. Oles illuminates points that were previously the subject of rumour and hearsay, painting a picture of the challenges that face Xbox One game-makers in particular, and offering us a glimpse of how Microsoft is working behind the scenes to improve the XDK development environment.
There's a wealth of information to sink your teeth into - the performance differential between Xbox One and PlayStation 4 of course, a frank and honest assessment of the Microsoft console's ESRAM, the implications of both CPU and GPU sharing the same memory space (and bandwidth), and observations on PC hardware and DirectX 12. There are some revelations too. Did you know that Microsoft now allows developers to bypass DX11 and talk to the hardware directly, in a similar manner to Sony's GNM API? And just how much of a big deal is the return of Kinect's GPU time-slice to developers?
By the way, we were hoping to bring you our Metro Redux Face-Off today. However, some last-minute patching to the PC version means that'll have to wait. In the meantime, we have included some of the complete console assets we've been working on. For more in-depth coverage of the console versions, our last-gen vs Redux and performance analysis pieces are worth checking out if you missed them. As things stand, we have no issue whatsoever in recommending the game - it's rather special.
I think what we achieved with the new consoles was a really good job given the time we had with development kits in the studio - just four months' hands-on experience with Xbox One and six months with PlayStation 4 (I guess the problems we had getting kits to the Kiev office are well-known now).
But the fact is we haven't begun to fully utilise all the computing power we have. For example, we have not utilised parallel compute contexts, due to the lack of time and the 'alpha' state of support on those consoles at the time. That means there is a lot of untapped performance that should translate into better visuals and gameplay as we get more familiar with the hardware.
Well obviously they aren't packing the bleeding-edge hardware you can buy for PC (albeit for insane amounts of money) today. But they are relatively well-balanced pieces of hardware that are well above what most people have right now, performance-wise. And let's not forget that programming close to the metal will usually mean that we can get a 2x performance gain over the equivalent PC spec. Practically achieving that performance takes some time, though!
But to answer the question - they could last as long. Just remember - back when the PS3 first hit the stores, Nvidia's G80 was released as well, and it was almost 2x faster than the RSX at the time...
Well, similar GPU architecture is a good thing, really. The reason is that modern GPUs are really complex devices with not-so-obvious performance cliffs. You can't say anymore: 'Here we are ALU limited or ROP limited or texture addressing limited or texture filtering limited or occupancy limited.' There is no correct and simple answer at all. We could be somewhat limited by ALU and somewhat limited by texture addressing and somewhat limited by bandwidth - all at the same time... Mastering that takes some time.
As for the CPU - it doesn't really matter at all, as long as performance is enough. As for RAM hierarchy and its performance - it is different between platforms anyway.
Well we just ported the games over and ran a lot of tests!
One little example I can give: Metro Last Light on both previous consoles has some heavily vectorised and hand-optimised texture-generation tasks. One of them takes 0.8ms on a single PS3 SPU and around 1.2ms on a single Xbox 360 hyper-thread. When we first profiled it on PS4 - already vectorised via AVX+VEX - it took more than 2ms! That looks bad for a 16ms frame. But the thing is, that task's sole purpose was to offload a few cycles from the (older) GPUs, which is counter-productive on the new consoles. That code path was just switched off.
Well, you kind of answered your own question - PS4 is just a bit more powerful. You forgot to mention the ROP count, which is important too - and let's not forget that both CPU and GPU share bandwidth to DRAM [on both consoles]. I've seen a lot of cases while profiling Xbox One where the GPU could perform fast enough, but only when the CPU is basically idle. Unfortunately I've even seen it the other way round, where the CPU performs as expected but only under an idle GPU, even though it (the CPU) is supposed to get prioritised memory access. That is why Microsoft's decision to boost the clocks just before the launch was a sensible thing to do with the design set in stone.
Counting pixel output probably isn't the best way to measure the difference between them, though. There are plenty of other (and more important) factors that affect image quality besides resolution. We may push 40 per cent more pixels per frame on PS4, but it's not 40 per cent better as a result... your own eyes can tell you that.
Actually, the real pain comes not from ESRAM but from the small amount of it. As for ESRAM performance - it is sufficient for the GPU we have in Xbox One. Yes, it is true that the maximum theoretical bandwidth - which is somewhat comparable to PS4's - can rarely be achieved (usually only with simultaneous reads and writes, as with FP16 blending), but in practice I've seen only a few cases where it becomes a limiting factor.
Let's put it this way - we have seen scenarios where a single CPU core was fully loaded just by issuing draw-calls on Xbox One (and that's on the 'mono' driver with several fast-path calls utilised). In the same scenario on PS4, it was actually difficult to find those draw-calls in the profile graphs: they use almost no time and are barely visible as a result.
In general, I don't really get why they chose DX11 as a starting point for the console. It's a console! Why care about legacy stuff at all? On PS4, most GPU commands are just a few DWORDs written into the command buffer - let's say just a few CPU clock cycles. On Xbox One it easily could be one million times slower because of all the bookkeeping the API does.
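To make the scale difference concrete, here is a minimal sketch of what 'a few DWORDs written into the command buffer' means. The opcodes and packet layout below are invented for illustration - they are not the real GNM or mono-driver encoding - but the point stands: a close-to-the-metal draw is a handful of 32-bit writes with no validation or reference-counting in the way.

```python
import struct

# Hypothetical opcodes - NOT the real GNM/PM4 packet format, just an illustration.
OP_SET_PIPELINE = 0x10
OP_DRAW_INDEXED = 0x22

def emit(cmd_buf: bytearray, opcode: int, *dwords: int) -> None:
    """Append one packet: a 32-bit opcode followed by its 32-bit arguments."""
    cmd_buf.extend(struct.pack("<I", opcode))
    for d in dwords:
        cmd_buf.extend(struct.pack("<I", d))

# 'Close to the metal': a state change plus a draw is just six DWORD writes -
# no hazard tracking, no resource bookkeeping, which is the cost Oles
# contrasts with a DX11-style validating API.
cmd_buf = bytearray()
emit(cmd_buf, OP_SET_PIPELINE, 0xBEEF)     # hypothetical pipeline state handle
emit(cmd_buf, OP_DRAW_INDEXED, 36, 1, 0)   # index_count, instance_count, first_index

print(len(cmd_buf))  # 6 DWORDs = 24 bytes
```

A DX11-style driver would instead validate state, track resource lifetimes and hazards, and translate calls on every draw - the bookkeeping that makes the same logical command so much more expensive on the CPU.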
But Microsoft is not sleeping, really. Each XDK that has been released both before and after the Xbox One launch has brought faster and faster draw-calls to the table. They added tons of features just to work around limitations of the DX11 API model. They even made a DX12/GNM style do-it-yourself API available - although we didn't ship with it on Redux due to time constraints.
There is no secret. We just adapted to the target hardware.
GCN doesn't love interpolators? OK, ditch the per-vertex tangent space and switch to a per-pixel one. A CPU task becomes too fast on an out-of-order CPU? Merge such tasks. A task too slow? Parallelise it. Maybe the GPU doesn't like a high sqrt count in a loop? But it is good at integer math - so we'll use old integer tricks. And so on, and so on.
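One classic example of the kind of 'old integer trick' Oles alludes to is a bit-by-bit integer square root that needs no floating-point sqrt at all. This is a generic sketch of the well-known algorithm, not 4A's actual code:

```python
def isqrt32(n: int) -> int:
    """Integer square root via the classic bit-by-bit method - no FP sqrt.

    Resolves one result bit at a time from the top down, using only
    shifts, adds and compares: exactly the sort of integer-only
    rewrite that suits hardware which is strong at integer math but
    slow at sqrt in an inner loop.
    """
    if n < 0:
        raise ValueError("negative input")
    result = 0
    bit = 1 << 30  # highest power of four that fits in 32 bits
    while bit > n:
        bit >>= 2
    while bit:
        if n >= result + bit:
            n -= result + bit
            result = (result >> 1) + bit
        else:
            result >>= 1
        bit >>= 2
    return result
```

For example, `isqrt32(17)` resolves to 4 - the floor of the true square root - without ever leaving integer math.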
That's just the art of optimisations and that's it. By the way, the PC version directly benefits from those optimisations as well, especially CPU-wise, as all of the platforms have out-of-order CPUs.
Because we can! Actually for the next unannounced project, the designers want more and more of everything (as usual) and quite possibly we will target 30fps.
Look, we shipped a rock-solid 60fps game with the quality right in the middle between the high and very high presets of the PC version. Let's discard around 30 per cent of frame-time for post-processing (as this is basically a constant cost) - so we are at around 11ms for the stuff on screen. Now just imagine if we did target 30fps - that would enable around 2.5 times richer visuals.
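The arithmetic behind that claim can be made explicit. These are approximate figures assuming the ~30 per cent post-processing share Oles quotes stays a fixed cost per frame:

```python
# Rough frame-budget arithmetic behind the 'around 2.5 times' claim.
frame_60 = 1000.0 / 60.0   # ~16.7 ms per frame at 60fps
post = 0.30 * frame_60     # post-processing: ~5 ms, roughly constant per frame
scene_60 = frame_60 - post # ~11.7 ms left for the scene itself

frame_30 = 1000.0 / 30.0   # ~33.3 ms per frame at 30fps
scene_30 = frame_30 - post # post cost stays fixed, so ~28.3 ms for the scene

print(round(scene_60, 1), round(scene_30, 1), round(scene_30 / scene_60, 2))
# → 11.7 28.3 2.43
```

The scene budget grows by a factor of roughly 2.4x, which is where the 'around 2.5 times richer visuals' figure comes from: halving the frame-rate more than doubles the time available for everything that isn't post-processing.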
Since Metro Last Light hit the shelves we've collected numerous improvement suggestions from our players in order to include them in Metro Redux. The power of the new consoles also allowed us to improve the games in the areas most critical to gameplay, especially gunplay and general feel - for example, combat and cut-scenes became smoother, and controls became much more responsive. Besides that, the new incarnation of Metro 2033 enjoys a lot of the upgrades introduced in Metro Last Light: new weapons and their upgrades, improved stealth viability and takedowns, improved AI with more realistic behaviour, improved visuals and so on.
We're quite happy with the games becoming more balanced: they just play better, run faster and look fresher. We're also pleased that we managed to gather the whole of the Metro world - all of the DLC, game modes and difficulty settings - into one definitive package.
We are doing both. We have been working on a new game as well as Redux. We had the production resource free to handle Redux while the next project was in early pre-production, although now the Redux team are needed on the next project as we ramp up! But you have already seen, Metro Redux is not just a port or conversion - it presents a whole new experience, especially for the 2033 part of it!
For the game we are working on now, our designers have shifted to a more sandbox-style experience - less linear, but still hugely story-driven. I will not go into details, but it requires some work from programmers as well. Also, we are improving graphics in very different areas - for example, we recently implemented physically-based global ambient occlusion (instead of local techniques like SSAO). I will not talk about PBR (physically-based rendering) here, because we are at the stage where artists are still adapting their mentality to it.
Actually, what is PBR and why use it? First, it means less tuning of content to make it look good. As a result, lighting artists love it and texture artists hate it. From the technical side, PBR is all about specular as a first-class citizen in every pixel. Redux ships with energy-preserving specular - a small but important part of PBR - although we intentionally preserved all the tuning knobs for the artists. But yes, we are fully PBR for the next project. No more tuning knobs - at least for now.
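As a sketch of what 'energy-preserving specular' means in practice, here is the common normalised Blinn-Phong formulation - an assumption for illustration, not necessarily 4A's exact formula:

```python
import math

def normalized_blinn_phong(n_dot_h: float, exponent: float) -> float:
    """Blinn-Phong specular with the (n+8)/(8*pi) normalisation factor.

    The normalisation scales the lobe so that raising the exponent
    sharpens the highlight instead of adding energy - the 'energy
    preserving' property. This is the widely used approximation,
    assumed here; 4A's shipped formulation may differ.
    """
    norm = (exponent + 8.0) / (8.0 * math.pi)
    return norm * max(n_dot_h, 0.0) ** exponent

# A sharper lobe is brighter at its peak but dimmer off-peak,
# keeping the total reflected energy roughly constant:
print(normalized_blinn_phong(1.0, 16) < normalized_blinn_phong(1.0, 256))  # True
print(normalized_blinn_phong(0.9, 16) > normalized_blinn_phong(0.9, 256))  # True
```

Without the normalisation term, an artist sharpening a highlight would also darken the material overall - exactly the kind of per-asset tuning PBR is meant to eliminate.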
Aside from them being much closer to the (modern) metal, those APIs are a paradigm shift in API design. DX11 was 'I will keep track of everything for you'; DX12 says 'now it's your responsibility' - so it can be a much thinner layer. As for Mantle, it is a temporary API, in my honest opinion.
No, it's important. All the dependency tracking takes a huge slice of CPU power. And if we are talking about multi-threaded command buffer chunk generation - the DX11 model was essentially a flop, while DX12 should be the right one.
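The multi-threaded model Oles prefers can be sketched as a toy in plain Python (this is an analogy, not real D3D12 code): each worker records into its own private command list with no shared bookkeeping, and only submission is serialised.

```python
from concurrent.futures import ThreadPoolExecutor

def record_chunk(chunk_id: int, draw_count: int) -> list:
    """Record a private command list - no locks and no shared dependency
    tracking, which is the point of the DX12-style model: workers never
    contend with each other while recording."""
    return [f"draw(chunk={chunk_id}, call={i})" for i in range(draw_count)]

# Workers record their chunks in parallel...
with ThreadPoolExecutor(max_workers=4) as pool:
    lists = list(pool.map(lambda c: record_chunk(c, 3), range(4)))

# ...and the main thread submits them in a defined order, once.
submitted = [cmd for cl in lists for cmd in cl]
print(len(submitted))  # 12 commands across 4 independently recorded lists
```

In the DX11 model, by contrast, deferred contexts still funnel through a driver that tracks every resource dependency globally, so recording on many threads buys little - which is why Oles calls that model a flop.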
Well, the issue is slightly more complicated - it is not like 'here, take that ten per cent of performance we've stolen before'. Actually it is variable: sometimes you can use 1.5 per cent more, sometimes seven per cent, and so on. We could possibly have aimed for a higher resolution, but we went for a 100 per cent stable, vsync-locked frame-rate this time. That is not to say we could not have done more with more time, and as per my earlier answer, the XDK and system software continue to improve every month.
Well, CPU performance has essentially stalled due to various factors - economics being one of them. I'd say that PC game-makers should target console CPUs.
This is tricky to answer without going into 'fan wars'. Get the most powerful components your budget allows for, with the emphasis on GPU.
The problem with unified memory is memory coherence. Even on consoles, where we see highly integrated SoCs (systems on a chip), we have the option to map memory address ranges basically 'for CPU', 'for GPU' and 'fully coherent'. And fully coherent is really not that useful, as it wastes performance. As for the traditional PC? Going through some kind of external bus just to snoop the caches will be really slow.
Yes, the original Metro Last Light Linux port was based on OpenGL 3.2 - it was stable but did not support high-end features. For Redux we are essentially replicating the DX11 version, with almost one-to-one correspondence in features. The downside of that approach is that the GPU must support at least the OpenGL 4 'core profile'.
Definitely. K1 is simply a bright star in the mobile world. I wish the sky would be full of stars to make it economically viable for us!