Digital Foundry: How would you characterise the combination of Xenos and Xenon compared to the traditional x86/GPU combo on PC? Surely on the face of it, Xbox 360 is lacking a lot of power compared to today's entry-level "enthusiast" PC hardware?
Oles Shishkovstov: You can calculate it like this: each 360 CPU core is approximately a quarter of the same-frequency Nehalem (i7) core. Add in approximately 1.5 times better performance because of the second, shared thread for 360 and around 1.3 times for Nehalem, multiply by three cores and you get around 70 to 85 per cent of a single modern CPU core on generic (but multi-threaded) code.
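The arithmetic in that estimate can be worked through as a rough sketch. The factors are the ones quoted above; the way they are combined (dividing the three-core 360 total by one Nehalem core with its own threading gain) is an assumed reading, since the answer does not spell out the order of operations.

```python
# Back-of-envelope comparison using only the factors quoted in the answer.
# How the factors combine is an assumption, not something the interview states.
CORE_RATIO = 0.25        # one 360 core vs one same-frequency Nehalem core
XENON_SMT_GAIN = 1.5     # gain from the second, shared hardware thread on 360
NEHALEM_SMT_GAIN = 1.3   # gain from the second hardware thread on Nehalem
XENON_CORES = 3


def xenon_vs_one_nehalem_core() -> float:
    """Return total 360 CPU throughput as a fraction of one Nehalem core."""
    # Three 360 cores, each with its threading gain, in single-thread Nehalem units.
    xenon_total = CORE_RATIO * XENON_SMT_GAIN * XENON_CORES
    # One Nehalem core running two threads.
    nehalem_core = 1.0 * NEHALEM_SMT_GAIN
    return xenon_total / nehalem_core


ratio = xenon_vs_one_nehalem_core()
print(f"{ratio:.2f}")  # prints 0.87
```

Under this reading the result lands at roughly 87 per cent, close to the upper end of the quoted range; the inputs are themselves approximate.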
Bear in mind though that the above calculation will not work in the case where the code is properly vectorised. In that case 360 can actually exceed PC on a per-thread per-clock basis. So, is it enough? Nope, there is no CPU in the world that is enough for games!
The 360 GPU is a different beast. Compared to today's high-end hardware it is 5-10 times slower depending on what you do. But hardware performance is only one side of the equation. Because we as programmers can optimise for the specific GPU, we can reach nearly 100 per cent utilisation of all the sub-units. That's just not possible on a PC.
In addition to this we can do dirty MSAA tricks, like treating some surfaces as multi-sampled (for example hi-stencil masking the light-influence does that), or rendering multi-sampled shadow maps, and then sampling correct sub-pixel values because we know exactly what pattern and what positions sub-samples have, etc. So, it's not directly comparable.
Digital Foundry: Does PC hardware offer up any additional bonuses in Metro 2033 aside from higher frame-rates and resolutions?
Oles Shishkovstov: Yes and no. When you have more performance on the table, you can either do nothing as you say, and as most direct console ports do, or you add the features. Because our platforms got equal attention, we took the second route.
Naturally most of the features are graphics related, but not all. The internal PhysX tick-rate was doubled on PC, resulting in more precise collision detection and joint behaviour. We "render" almost twice the number of sounds (all with wave-tracing) compared to consoles. Those are just a few examples, so that you can see that it's not only graphics that gets a boost. On the graphics side, here's a partial list:
- Most of the textures are 2048^2 (consoles use 1024^2).
- The shadow-map resolution is up to 9.43 Mpix.
- The shadow filtering is much, much better.
- The parallax mapping is enabled on all surfaces, some with occlusion-mapping (optional).
- We've utilised a lot of "true" volumetric stuff, which is very important in dusty environments.
- From DX10 upwards we use correct "local motion blur", sometimes called "object blur".
- The light-material response is nearly "physically-correct" on the PC on higher quality presets.
- The ambient occlusion is greatly improved (especially on higher-quality presets).
- Sub-surface scattering makes a lot of difference on human faces, hands, etc.
- The geometric detail is somewhat better, because of different LOD selection, not even counting DX11 tessellation.
- We are considering enabling global illumination (as an option) which really enhances the lighting model. However, that comes with some performance hit, because of literally tens of thousands of secondary light sources.
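The first two figures in the list can be put in perspective with a little arithmetic. This is only an illustrative sketch: the 3072-texel square shadow map is an assumption chosen because it matches the quoted megapixel count, not a layout the interview confirms.

```python
# Illustrative arithmetic for the texture and shadow-map figures in the list.

def texel_count(side: int) -> int:
    """Number of texels in a square texture with the given side length."""
    return side * side


# PC textures (2048^2) versus console textures (1024^2).
texture_ratio = texel_count(2048) / texel_count(1024)

# ASSUMPTION: a single square 3072 x 3072 shadow map, counted in decimal megapixels.
assumed_shadow_side = 3072
shadow_mpix = texel_count(assumed_shadow_side) / 1_000_000

print(texture_ratio)          # 4.0, i.e. four times the texels per texture
print(round(shadow_mpix, 2))  # 9.44, in line with the "up to 9.43 Mpix" quoted
```

So doubling the texture side quadruples the texel count, and with it the memory and bandwidth cost per texture.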
Digital Foundry: What is your evaluation of DirectX 11? What do you think it can bring to a game like Metro 2033? Aside from the new effects possible, do the APIs offer any performance advantage over previous DX iterations?
Oles Shishkovstov: Great! It's simply great. Although the API is still awkward from a pure C++ design perspective, the functionality is there. I really enjoy three things: compute shaders, tessellation shaders and draw/create context separation.
The major thing that can up the performance is compute shaders. Today, games spend the majority of the frame doing various kinds of post-processing. The easy route to extract some performance is to rewrite that post-processing via compute. Even simple blurs can be almost twice as fast. For example, we've rewritten our depth-of-field code to greatly enhance quality while still maintaining a playable frame-rate.
Digital Foundry: Hardware tessellation is a part of the DX11 spec and AMD incorporated a tessellator into Xbox 360, rarely used in console games development so far. What do you make of it, and do you make use of it in Metro 2033?
Oles Shishkovstov: Although we do not use tessellation on the Xbox 360, we use it when running on DX11 hardware. Specifically, all the "organic" things like humans are tessellated, and monsters use real displacement mapping, to greatly enhance visuals.
Digital Foundry: The 4A engine integrates NVIDIA PhysX. What are the core advantages of the hardware acceleration, what sort of hardware do you require for the best experience?
Oles Shishkovstov: The core advantage is simply the performance. The CPUs just aren't there to enable large-scale physical effects (although they are very competitive when processing traditional rigid-body things). However, when you offload costly PhysX processing to the GPU, you've got less GPU time for rendering.
It's a difficult question when choosing what hardware will provide the best experience. I'd say that dedicating another (maybe less powerful) GPU specifically for PhysX is the right thing to do!
Digital Foundry: Over and above the tech demo-style elements that make PhysX cool, can you explain to us how the physics add to the gameplay experience?
Oles Shishkovstov: We do not add PhysX effects if they aren't integral to the gameplay experience. We don't add an effect for the sake of an effect. The human eye and brain are trained to spot inconsistencies. We are only trying to remove those inconsistencies so that they don't distract from the gameplay or break the immersion we have been building brick by brick.
Digital Foundry: How did you integrate PhysX into your many-core engine assuming you don't have hardware acceleration? Are the same principles in use in the Xbox 360 code?
Oles Shishkovstov: That's easy. The PhysX SDK has a notion of a "task" similar to the one we use. The SDK spawns tasks for every operation which can be safely parallelised: for example, each rigid-body shape-shape collision detection and each cloth or fluid update. Even the solvers are heavily sub-divided into tasks.
We forward those tasks to our task-scheduler and they are processed in the same manner as everything else. The only "conceptual" difference is between their task model and ours: we "spawn-and-forget" tasks, whereas PhysX uses a "spawn-and-wait" model.
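The difference between the two task models can be sketched roughly as follows. This is not 4A's scheduler or the PhysX SDK API: `TaskScheduler`, `spawn`, `spawn_and_wait` and `collide` are hypothetical names, and a standard-library thread pool stands in for the engine's worker threads.

```python
from concurrent.futures import ThreadPoolExecutor, wait


class TaskScheduler:
    """Hypothetical stand-in for an engine-wide task scheduler."""

    def __init__(self, workers: int = 4):
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def spawn(self, fn, *args):
        """Spawn-and-forget: queue the task and return immediately."""
        return self._pool.submit(fn, *args)

    def spawn_and_wait(self, jobs):
        """Spawn-and-wait: queue a batch on the same pool, then block until done."""
        futures = [self._pool.submit(fn, *args) for fn, *args in jobs]
        wait(futures)
        return [f.result() for f in futures]

    def shutdown(self):
        """Let all queued tasks finish, then stop the workers."""
        self._pool.shutdown(wait=True)


def collide(a: float, b: float) -> bool:
    """Hypothetical physics work item: a trivial 1D overlap test."""
    return abs(a - b) < 1.0


scheduler = TaskScheduler()
log = []

# Engine-style task: fire it off and carry on without waiting.
scheduler.spawn(log.append, "audio update")

# Physics-style batch: forwarded to the same pool, but the caller blocks on it.
hits = scheduler.spawn_and_wait([(collide, 0.0, 0.5), (collide, 0.0, 3.0)])

scheduler.shutdown()
print(hits)  # [True, False]
```

Both kinds of work share one pool of workers, which is the point of forwarding; only the caller's behaviour after submitting differs.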
Digital Foundry: You describe 4A as a complete game development framework for PC, PS3 and Xbox 360. Does this mean you're looking to license it to other developers?
Oles Shishkovstov: Yes, we're investigating this scenario. Please wait to hear more about it.
Digital Foundry: You've created state-of-the-art technology that is right up there with some of the best we've seen on console. Neither Microsoft nor Sony wants to replace its console yet, so where do you see the software evolving from here? How can you improve on what you have achieved with the 4A engine?
Oles Shishkovstov: Well, the majority of our Metro 2033 game runs at 40 to 50 frames per second if we disable vertical synchronisation on 360. The majority of the levels have more than 100MB heap space left unused. That means we under-utilised the hardware a bit...
Oles Shishkovstov is chief technical officer at 4A Games.