Kinect tech spec finally revealed

Retailer reveals details, implications.

Blog by Richard Leadbetter, Technology Editor, Digital Foundry

Updated on 30 Jun 2010

Retailer Play.com has published new specifications for the final Kinect hardware.

While the information has not been confirmed through official channels, Play said that the details come direct from the manufacturer, and specs collected by Digital Foundry but not published to date tie in extremely closely with the new data. The smart money is on this being the real deal.

Perhaps the most interesting information we can glean from this is in how the final production Kinect camera differs from the reference technology designed by Microsoft partner PrimeSense.

The Israeli company, which we interviewed back in April, provided the basic design that Microsoft adapted to create the then-Project Natal. Its camera features much the same viewing characteristics as the final Kinect in terms of field of view, but its depth map is much more detailed: 640x480 against Microsoft's 320x240, or four times as many depth samples.

If the depth map has been scaled back, so has the definition of the skeletal tracking. The new spec lists 20 points making up the human skeleton, while our demo of Natal at gamescom last year revealed that 48 points were used.

Having played the same game on both iterations of the hardware, we can say that aside from small "jumps" in the fidelity of the 1:1 skeletal tracking, the overall experience is fairly close despite the spec cut-backs.

Probably the biggest concern is the low-resolution depth map, but again, the cut-back does make sense.

In our gamescom demo, presumably using something closer to the original reference design, Kudo Tsunoda expressed reservations that hand and finger tracking would work consistently with the camera simply because human beings come in all different sorts of shapes and sizes. There would be no way to ensure accurate tracking of a child's fingers, for example.

For the sake of reliability, then, the emphasis shifted to tracking the whole body, at which point the need for the VGA depth map became less apparent, although tracking more subtle movements clearly becomes more challenging. The lower-resolution depth map also reduces the amount of data being sent across USB and decreases processing overhead.
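To put a rough number on that saving, here is a minimal back-of-the-envelope sketch. It assumes the PrimeSense reference design would also deliver 16-bit depth at 30FPS (the published spec only confirms those figures for the final Kinect) and it counts raw, uncompressed samples rather than whatever actually goes over the wire. The function name is ours, purely for illustration.

```python
def raw_stream_rate(width, height, bits_per_pixel, fps):
    """Return the uncompressed data rate of a video stream in bytes per second."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * fps


# Assumption: the reference design uses the same 16-bit, 30FPS depth format
# as the final Kinect spec; only the resolution differs.
vga_depth = raw_stream_rate(640, 480, 16, 30)   # reference design depth map
qvga_depth = raw_stream_rate(320, 240, 16, 30)  # final Kinect depth map

print(f"640x480 depth: {vga_depth / 1e6:.2f} MB/s")   # 18.43 MB/s
print(f"320x240 depth: {qvga_depth / 1e6:.2f} MB/s")  # 4.61 MB/s
print(f"Reduction: {vga_depth / qvga_depth:.0f}x")    # 4x
```

On those assumptions, halving the resolution on each axis cuts the raw depth data to a quarter, and any per-pixel processing on the console side shrinks by the same factor.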

The other major difference in the final spec compared to the reference is the inclusion of a motorised tilt function in Kinect, which was never part of the original PrimeSense design. This is powered via a bespoke port on the new Xbox 360S, or via a bundled PSU for the older console.

The purpose of the tilt is fairly straightforward: it allows more flexible placement of the camera, so that it fits comfortably in more environments. During gameplay it has never been observed to move dynamically, and it is understood that the skeletal tracking functions within the 360 APIs are not active while the camera motor is in use.

Here's Play.com's data in full.

Sensor

Colour and depth-sensing lenses
Voice microphone array
Tilt motor for sensor adjustment

Field of View

Horizontal field of view: 57 degrees
Vertical field of view: 43 degrees
Physical tilt range: ± 27 degrees
Depth sensor range: 1.2m - 3.5m
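(A Digital Foundry aside, not part of Play.com's data: the figures above give a feel for how much of a room the sensor can actually see. The sketch below treats the camera as a simple pinhole model and assumes the depth stream covers the full quoted field of view; neither assumption comes from the published spec, and the function name is ours.)

```python
import math


def coverage(distance_m, fov_degrees):
    """Width (or height) of the visible area at a given distance, in metres.

    Simple pinhole-camera geometry: extent = 2 * d * tan(fov / 2).
    """
    half_angle = math.radians(fov_degrees / 2)
    return 2 * distance_m * math.tan(half_angle)


H_FOV = 57.0  # horizontal field of view, degrees (from the spec)
V_FOV = 43.0  # vertical field of view, degrees (from the spec)

# Near and far limits of the quoted depth sensor range.
for distance in (1.2, 3.5):
    width = coverage(distance, H_FOV)
    height = coverage(distance, V_FOV)
    # Assumption: 320 depth samples are spread evenly across the full width.
    mm_per_sample = width / 320 * 1000
    print(f"{distance}m: {width:.2f}m wide, {height:.2f}m tall, "
          f"~{mm_per_sample:.0f}mm per depth sample")
```

Under those assumptions the visible area is roughly 1.3m wide at the near limit and 3.8m wide at the far limit, with each depth sample spanning around a centimetre at the back of the range. That goes some way to explaining why finger tracking is a tall order at living-room distances.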

Data Streams

320x240 16-bit depth at 30FPS
640x480 32-bit colour at 30FPS
16-bit audio @ 16 kHz

Skeletal Tracking System

Tracks up to 6 people, including 2 active players
Tracks 20 joints per active player
Ability to map active players to Xbox LIVE Avatars

Audio System

Xbox LIVE party chat and in-game voice chat (requires Xbox LIVE Gold Membership)
Echo cancellation system enhances voice input
Speech recognition in multiple languages
