
Kinect visionary talks tech

"We decided to have our cake and eat it too."

Alex Kipman - Xbox's director of incubation, and guiding mind behind the Natal project - has given a remarkable interview to sister site GamesIndustry.biz, going into considerable depth on some of the most crucial and controversial elements of the motion sensor. It's the interview Digital Foundry would've killed for, but luckily we were able to sneak a few questions to the GI.biz team that reveal new information on how Kinect works and the ways in which its systems are integrated into the Xbox 360.

One of the biggest and most controversial stories surrounding the system has been the removal of an onboard motion processor. Instead of processed data being transferred over USB, the raw streams from the camera, depth sensor and mic are beamed across to the 360, where the CPU and GPU work in concert to decode and process the data. Kipman makes the claim that the console has the power to spare, saying that no 360 game - not even the latest state-of-the-art title - uses all of the CPU and GPU power available.

"As much as we like to talk about bits and percentages, you take a game like, I don't know, Call of Duty: Black Ops - there's a significant amount of processing, be it CPU or GPU, that still remains on the table," he says.

"So after that, when we came to this revelation about games and future games that would be coming to Xbox, we looked at it and we said, 'Is it worth the trade-off to put onboard processing on the device when we think we can create magical, unique, deep, thorough experiences without it?'

"That trade-off is easy - it's about the affordability of the device. From the perspective of bringing to market this amazing deal, £129.99 with Kinect Adventures, plus sensor - buy one and have your entire family play - it's a very interesting customer value proposition. We can create games which are as rich and thorough and as deep as the games which we have on our platform today and which we will have tomorrow.

"Then the conversation becomes simple: you start moving into a world which says, 'Why keep something complicated when you can make it simple?' We decided to have our cake and eat it too."

There is plenty of merit in what Kipman says here, beyond the shoehorned PR. Theoretically, there is nothing to stop developers creating Kinect titles that look and feel exactly like a premium core gamer title, because the system overhead is minimal - a small amount of CPU time spread across two threads on a single Xenon CPU core, plus an even smaller slice of GPU resource. Take a look at Kinect Adventures and you see a fully fleshed-out Unreal Engine 3 title, and that's exactly the sort of tech that would not take kindly to being robbed of significant amounts of system resources.

The system's 'brain' has been removed from the Kinect innards, with the Xbox 360 CPU and GPU taking over the processing chores.

Kipman also confirms that the Xenos GPU is used to process the depth information and help build the skeletal data on which a great many Kinect titles depend.

"One of the major key ingredients of the experience is machine learning. Machine learning in our world is defining a world of probabilities. Machine learning, particularly our kind, which is probabilistic, is not really about what you know, it's about what you don't know," he explains.

"It's about being able to look at the world and not see duality, zeroes and ones, but to see infinite shades of grey. To see what's probable. You should imagine that, in our machine learning piece of the brain, which is just one component of the brain, pixels go in and what you get out of it is a probability distribution of likelihood.

"So a pixel may go in and what comes out of it may be - hey, this pixel? 80 per cent chance that this pixel belongs to a foot. Sixty per cent chance it belongs to a head, 20 per cent chance that it belongs to the chest. Now this is where we chop the human body into the 48 joints which we expose to our game designers. What you see is infinite levels of probability for every pixel and if it belongs to a different body part.

"That operation is, as you can imagine, a highly, highly parallelisable operation. It's the equivalent of saying, pixel in, work through this fancy maths equation and imagine you get a positive number, a positive answer, you branch right, you get a negative answer you branch left. Imagine doing this over a forest of probabilities. This is stuff where you'll get a thousand times performance improvement if you put it on the GPU rather than the CPU.

"GPUs are machines designed for these types of operations. The core of our machine learning algorithm, the thing that really understands meaning, and translates a world of noise to the world of probabilities of human parts, runs on the GPU."
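Kipman's description of pixels branching right on a positive test and left on a negative one, over "a forest of probabilities", maps onto a randomised decision forest. The toy sketch below illustrates the idea only - it is not Microsoft's actual classifier, and every threshold, body-part name and probability in it is made up: a single per-pixel feature value walks two tiny hand-built trees, each leaf holds a probability distribution over body parts, and the forest averages the leaf distributions.

```python
# Toy sketch of per-pixel body-part classification with a decision forest.
# All splits, part names and probabilities here are illustrative only.

BODY_PARTS = ["head", "chest", "foot"]

class Node:
    def __init__(self, threshold=None, left=None, right=None, dist=None):
        self.threshold = threshold  # split test value (internal nodes only)
        self.left = left
        self.right = right
        self.dist = dist            # probability distribution (leaves only)

def classify(tree, feature):
    """Walk one tree: positive test result branches right, negative left."""
    node = tree
    while node.dist is None:
        node = node.right if feature - node.threshold > 0 else node.left
    return node.dist

def forest_classify(forest, feature):
    """Average the leaf distributions from every tree in the forest."""
    sums = [0.0] * len(BODY_PARTS)
    for tree in forest:
        for i, p in enumerate(classify(tree, feature)):
            sums[i] += p
    return [s / len(forest) for s in sums]

# Two tiny hand-built trees with invented splits and leaf distributions.
tree_a = Node(threshold=0.5,
              left=Node(dist=[0.1, 0.2, 0.7]),   # leaf: probably a foot
              right=Node(dist=[0.8, 0.1, 0.1]))  # leaf: probably a head
tree_b = Node(threshold=0.3,
              left=Node(dist=[0.2, 0.2, 0.6]),
              right=Node(dist=[0.6, 0.3, 0.1]))

print(forest_classify([tree_a, tree_b], 0.2))
```

Because every pixel is classified independently by the same per-tree walk, the whole frame can be processed in parallel - which is exactly why, as Kipman says, the workload suits a GPU so much better than a CPU.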