Kinect visionary talks tech • Page 2

"We decided to have our cake and eat it too."

We've talked in the past about how Kinect's functions work on the principle of "pay as you play" - the more features you use from its repertoire of capabilities, the higher the burden on system resources. In a New Scientist feature published at the beginning of this year, Kipman himself estimated that Kinect consumed between 10 and 15 per cent of system resources. This has now been revised down to "single figures", presumably through the evolution of the system's libraries, but Kipman acknowledges that accommodating Kinect involves making compromises.

"It's a trade-off. As we create games, you can think about the platform as a set of paints and paintbrushes. You can think about our game creators as the painters which use this palette to paint. What Kinect brings to the table is a new set of paints and paintbrushes, it broadens the palette and allows you to do different things," Kipman says.

"Not all features are created equal - you can totally imagine a game that's using practically the entirety of the Xbox 360 and still uses identity recognition. You can have a game that uses a small vocabulary of voice recognition that will still have pretty much 100 per cent of the processing. And on and on.

"You can shop, in a way, in the platform by menu, and you can choose the paint colours and paintbrushes you have. This is no different than saying, 'what physics engine, what AI engine, what graphics engine' you're going to be using. I can make the same argument that, hey, I'm going to be using Engine X off the shelf, I'm going to be giving up control over the hardware. There's some amount of resources that I give up for the price of the flexibility and the time to market of using a middleware engine. Same thing applies here. At the end of the day you have to choose the correct set of paint colours to tell the stories you want."
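Kipman's "shop by menu" idea can be sketched as a simple budget check. Everything below is invented for illustration - the feature names, the percentage costs and the overall allowance are placeholders, not real platform figures:

```python
# Invented example of budgeting Kinect-style features against a CPU allowance.
# Feature names and percentage costs are illustrative, not real platform figures.
FEATURE_COSTS = {                    # % of console CPU, made up
    "skeletal_tracking": 6.0,
    "identity_recognition": 1.5,
    "small_voice_vocabulary": 0.5,
    "large_voice_vocabulary": 3.0,
}

def remaining_budget(chosen, allowance=9.0):
    """Return the CPU headroom left after 'shopping' for features."""
    spent = sum(FEATURE_COSTS[f] for f in chosen)
    if spent > allowance:
        raise ValueError(f"over budget: {spent:.1f}% > {allowance:.1f}%")
    return allowance - spent

# A game using only identity recognition and a small voice vocabulary
# keeps most of the allowance for itself:
print(remaining_budget(["identity_recognition", "small_voice_vocabulary"]))  # 7.0
```

The point of the sketch is the trade-off Kipman describes: each "paint colour" has a cost, and a developer picks the subset that fits the story they want to tell.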

All fair comment, but it's also worth pointing out that Kinect's latency means developers really need to build their titles around the sensor - to the point where Microsoft have issued guidelines on rendering techniques that help lower latency, and red-flagged engine set-ups that perform poorly from a lag perspective. There is also a baseline USB latency that developers simply can't avoid: it's built into the hardware.

Kipman side-steps the USB baseline question GI.biz put to him, tackling the lag issue in a different way by saying that a driving game like Joy Ride would be unplayable if there were noticeable latency in the control system, resulting in oversteer or understeer. He says that a lot of predictive tech is used to anticipate movements and thus lessen the sensation of lag.
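Kipman doesn't detail the predictive tech, but the general idea - extrapolating a tracked position forward in time to cover the pipeline's delay - can be sketched in a few lines. Everything here is an assumption for illustration: the function, the sample rate and the latency figure are invented, and none of it reflects Kinect's actual algorithms:

```python
# Illustrative sketch: compensate for sensor-to-screen lag by linearly
# extrapolating a tracked position forward by the expected latency.
# Invented example - not Microsoft's implementation.

def predict_position(samples, latency_s):
    """Extrapolate the latest (timestamp, position) sample forward by latency_s.

    samples: list of (timestamp_seconds, position) tuples, oldest first.
    Falls back to the last known position when there's too little history.
    """
    if not samples:
        return None
    if len(samples) < 2:
        return samples[-1][1]
    (t0, x0), (t1, x1) = samples[-2], samples[-1]
    velocity = (x1 - x0) / (t1 - t0)          # units per second
    return x1 + velocity * latency_s          # where we expect it to be

# A hand moving right at 1 unit/s, sampled at roughly 30 Hz:
history = [(0.000, 0.000), (0.033, 0.033), (0.066, 0.066)]
print(predict_position(history, 0.1))  # ~0.166: predicted position 100 ms ahead
```

The cost of this kind of prediction is overshoot when the player changes direction suddenly, which is presumably where the more sophisticated algorithms Kipman alludes to come in.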

He also points out that the human body itself moves in an "analogue" manner compared to the "digital" precision of buttons on a controller. In processing human movement, Kipman talks about how the 1s and 0s of controller precision become a whole series of "maybes" with Kinect. He describes how traversing physical space takes longer and how that needs to be factored into a control system.

"So the first kind of component that we think about, and have to worry about, is the actual human factor and what the human does in terms of adding lag into the system. The next one is about physics. And physics laws, well, they're laws, they're not subjective. Light only travels so fast, and there are plenty of other rules that people have come up with that we can't work around," he explains.

"In the world of zeroes and ones, all you're doing is sending zeroes and ones down a pathway. In our world, we're actually perceiving the world. We are visualising the world and we're understanding the acoustic characteristics of the world. You know what, that takes longer as well. Now, pass all of this rich data to the console, where the Kinect brain lives, and there's more processing. In the world of zeroes and ones, zero means accelerate, one means brake.

"In our world... there's a whole heck of science-fiction turned science-fact to really work in terms of our sophisticated set of algorithms that translate all of this noisy data of voice and visuals into human understanding, full body motion, identity recognition, voice recognition, and that takes time.

"So when I look at the entire chain, look at what the human adds, what the physical barriers add in terms of laws of physics and what processing adds, you find out pretty quickly that simply adding these numbers up means you wouldn't be able to drive a car."
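Kipman's point about the chain adding up can be made concrete with back-of-the-envelope arithmetic. Every figure below is a made-up placeholder rather than a measured Kinect number, but the shape of the sum is the argument he's making:

```python
# Purely illustrative latency budget for a camera-based control chain.
# All figures are invented placeholders, not measured Kinect values.
budget_ms = {
    "camera exposure + readout": 33,   # one frame at 30 Hz
    "USB transfer": 10,
    "skeletal processing": 15,
    "game logic + render": 33,
    "display": 16,
}
total = sum(budget_ms.values())
print(f"end-to-end: {total} ms")  # → end-to-end: 107 ms
```

Naively summed, even modest per-stage delays land well over 100ms between moving your hands and seeing the car turn - hence Kipman's claim that, without the breakthroughs he alludes to, "you wouldn't be able to drive a car".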

Tellingly, he doesn't go into detail on how these massive obstacles were overcome, talking instead about a "significant number" of breakthroughs that "essentially erase" these issues. He talks briefly about the prototype work with Burnout Paradise, which we played way back at gamescom 2009 and weren't massively impressed with. It was playable - which Kipman lauds as a huge achievement - but it wasn't particularly fun. Lacking the feedback you get from moving a thumbstick, there was a sense of uncertainty about the controls: the on-screen motions worked, but without that tactile response it was impossible to get a real sense of how good the controls were.
