PrimeSense: Beyond Natal

Digital Foundry meets the men behind the 3D camera.

Microsoft has acquired time-of-flight camera specialist 3DV, which had already made several 3D camera-based gaming demos. Even so, the two PrimeSense men are very keen to point out that all of the video capture and depth perception hardware within Natal comes from them, and only from them.

"PrimeSense isn't just the provider of the 3D technology in Project Natal... it's the sole provider," says Maizels proudly. "Project Natal is much more than a 3D sensing device, but PrimeSense is the only company responsible for the 3D."

However, while the team is happy to take the plaudits for the implementation of its tech within Natal, it's interesting to note that they see the use of what they call the "PrimeSensor" as just a small part of the whole package.

"To make it crystal clear, in the beginning there was 3D acquisition. We want to take some pride for ourselves, this is the part that PrimeSense developed and in Natal, this is PrimeSense: no others," says Berenson.

"But Natal is much more than that. Natal is content. Natal is processing software. Natal is about other ways of interaction like voice and so on. Microsoft was able to put this vast and expensive ecosystem around it to turn a raw technology into a product. Natal is far, far wider than the PrimeSense element, but PrimeSense is the acquisition element."

Despite Microsoft's acquisition of 3DV and its core technologies, PrimeSense is keen to point out that its implementation of the so-called "zcam" is entirely different to 3DV's, and to those of its other competitors, which rely on a system of judging depth known as "time of flight".

"PrimeSense is using proprietary technology that we call Light Coding. It's proprietary. No other company in the world uses that," Adi Berenson says proudly.

"Most of our competitors are using a variety of methods that can be aggregated into one technique that's called 'time of flight'... It pulses a light and times the difference between the pulse and the round trip back to the sensor. Our methodology is nothing like that. What PrimeSense did is an evolution in terms of 3D sensing. We use standard components and the cost of the overall solution and the performance in terms of robustness, stability and no lag suits consumer devices."
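To put the time-of-flight approach Berenson describes into numbers: the distance to a surface is the speed of light multiplied by the round-trip time, halved because the pulse travels out and back. The sketch below is purely illustrative and is not taken from any vendor's implementation; the function names are our own, and the sample distances are borrowed from the operating range quoted for the PrimeSense reference camera simply to give a living-room scale.

```python
# Speed of light in metres per second (exact by SI definition).
SPEED_OF_LIGHT = 299_792_458.0


def distance_from_round_trip(round_trip_seconds: float) -> float:
    """Distance in metres to the surface that reflected the pulse."""
    # The pulse travels to the surface and back, so halve the path length.
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0


def round_trip_for_distance(distance_m: float) -> float:
    """Round-trip time in seconds for a pulse reflected at distance_m."""
    return 2.0 * distance_m / SPEED_OF_LIGHT


if __name__ == "__main__":
    # Illustrative living-room distances in metres.
    for distance in (0.8, 2.0, 3.5):
        nanoseconds = round_trip_for_distance(distance) * 1e9
        print(f"{distance} m -> round trip of {nanoseconds:.2f} ns")
```

At two metres the round trip is a little over 13 nanoseconds, and telling apart two surfaces a centimetre apart means resolving a difference of roughly 67 picoseconds, which gives some idea of the timing precision this family of techniques has to achieve.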

Light Coding, on the other hand, does what it says on the tin: the scene is bathed in light very close to infrared on the spectrum. What PrimeSense calls "a sophisticated parallel computational algorithm" then deciphers the IR data into a depth image. The firm says that this solution, like time of flight, works whatever the lighting conditions of the scene.

"The Natal device's 3D acquisition part is based on our technology, not on time of flight," re-affirms Aviad Maizels.

"We believe that the selection of this technology for the first generation at least is testimony that our proprietary patented method is the best price/performance and the most ready for production. Other than that we are not going to comment in any shape or form on the reasons or the background around the fact that Microsoft also elected to purchase the assets of a company that followed another technology."

The PrimeSense reference design (left) looks similar to Project Natal, but the tech schematic (right) reveals some changes compared to Microsoft's final design.

PrimeSense's offering to potential partners consists of a reference design for the camera, which connects to a computer via USB 2.0, just like the Natal kit. The difference is that this reference design includes a dedicated SoC (System on Chip) which translates the information from the IR sensor into a depth map that is "registered" or matched on a per-pixel basis with the RGB image you get from the conventional RGB camera. The result is a 640x480 image where every single pixel has a depth component.

"If you look at it from the capturing side, the capturing hardware is based on an RGB CMOS sensor and an IR CMOS sensor and an IR source all connected to the PrimeSense IC or SoC which analyses the signals and generates a 3D RGBD signal," explains Adi Berenson.

"RGBD means depth plus image and colour, synchronised over space and over time. In addition we integrated the ability to capture audio, also synchronised. The output signal of the capturing hardware is really four channels of audio, and 3D RGBD. Everything in sync. Everything channelled to the host, ready to be processed. That's the capturing hardware."
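One way to picture a "registered" RGBD frame is as two equally sized grids, colour and depth, that share the same pixel coordinates and a single timestamp. The sketch below is only an illustration of that idea: the `RGBDFrame` class, the millimetre depth unit, the use of zero for "no reading" and the microsecond timestamp are all assumptions of ours rather than anything PrimeSense has published, and the audio channels are left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

WIDTH, HEIGHT = 640, 480  # depth image size quoted for the reference design
NO_READING = 0            # assumption: 0 marks a pixel with no depth sample


@dataclass
class RGBDFrame:
    """Hypothetical container for one registered colour-plus-depth frame."""
    timestamp_us: int                   # one timestamp shared by both images
    colour: List[Tuple[int, int, int]]  # row-major (r, g, b) per pixel
    depth_mm: List[int]                 # row-major depth per pixel

    def __post_init__(self) -> None:
        expected = WIDTH * HEIGHT
        if len(self.colour) != expected or len(self.depth_mm) != expected:
            raise ValueError("colour and depth must both cover every pixel")

    def pixel(self, x: int, y: int) -> Tuple[int, int, int, int]:
        """Return (r, g, b, depth_mm) for one pixel."""
        # Registration means the same index addresses both images.
        index = y * WIDTH + x
        r, g, b = self.colour[index]
        return r, g, b, self.depth_mm[index]


def make_flat_frame(timestamp_us: int, colour: Tuple[int, int, int],
                    depth_mm: int) -> RGBDFrame:
    """Build a frame showing a uniform wall at a single distance."""
    count = WIDTH * HEIGHT
    return RGBDFrame(timestamp_us, [colour] * count, [depth_mm] * count)


def nearest_pixel(frame: RGBDFrame) -> Optional[Tuple[int, int, int]]:
    """Return (x, y, depth_mm) of the closest valid sample, if any."""
    best_index = None
    for index, depth in enumerate(frame.depth_mm):
        if depth == NO_READING:
            continue
        if best_index is None or depth < frame.depth_mm[best_index]:
            best_index = index
    if best_index is None:
        return None
    return best_index % WIDTH, best_index // WIDTH, frame.depth_mm[best_index]


if __name__ == "__main__":
    frame = make_flat_frame(0, (128, 128, 128), 2000)
    frame.depth_mm[5 * WIDTH + 7] = 900  # something closer at pixel (7, 5)
    print(frame.pixel(7, 5))
    print(nearest_pixel(frame))
```

Because colour and depth share an index, software on the host can ask "how far away is this pixel?" without doing any alignment work of its own, which is the practical benefit of registration happening in the capture hardware.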

The SoC also contains interfaces for the RGB camera, analogue-to-digital converters, plus the USB circuitry required to connect the camera to the PC. The chip also contains some flash memory, meaning that the device is firmware-upgradable.

This is backed up by bespoke middleware called NITE that is capable of constructing human skeletal data from the image, and thus tracking human motion. Although this is similar to the Natal tech demos we've seen, where the system tracks individual human skeletons, the implementations are radically different. Within Natal, PrimeSense's involvement begins with the camera and ends with the creation of the depth map for the RGB image.

In terms of the spec of the reference camera, the crucial nuts and bolts data can be found in the table below. You can expect Natal to be very close to this, though we expect that some of the specs here are best-case scenarios. Natal is confirmed at 30FPS for scanning, so the 60FPS spec here probably relates to a lower resolution scan that Microsoft doesn't use, and the same probably goes for the 1600x1200 RGB image size. That said, in our original piece on the system, Kudo Tsunoda did talk about multiple resolutions...

| Property | Specification |
| --- | --- |
| Field of View (Horizontal, Vertical, Diagonal) | 58° H, 45° V, 70° D |
| Depth image size | VGA (640x480) |
| Spatial x/y resolution (at 2m distance from sensor) | 3mm |
| Depth z resolution (at 2m distance from sensor) | 1cm |
| Maximum image throughput (frame-rate) | 60FPS |
| Operation range | 0.8m to 3.5m |
| Colour image size | UXGA (1600x1200) |
| Audio: built-in microphones | Two |
| Audio: digital inputs | Four |
| Data interface/power supply | USB 2.0 |
| Power consumption | 2.25W |
| Dimensions | 14cm x 3.5cm x 5cm |
| Operating environment | Indoors, all lighting conditions |
| Operating temperature | 0°C - 40°C |
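
A couple of the figures in the table can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below does two things: it works out how much of the scene one depth pixel covers at two metres from the quoted field of view, and it estimates the raw data rate of various image modes against USB 2.0's 480Mbit/s signalling rate. The bit depths (16 bits per depth pixel, 24 bits per colour pixel) and the absence of any compression are assumptions of ours, not PrimeSense figures, so treat the bandwidth numbers as rough upper bounds.

```python
import math

USB2_SIGNALLING_MBIT_S = 480.0  # USB 2.0 high-speed signalling rate


def footprint_m(fov_degrees: float, distance_m: float) -> float:
    """Width (or height) of the scene covered at a given distance."""
    return 2.0 * distance_m * math.tan(math.radians(fov_degrees) / 2.0)


def mm_per_pixel(fov_degrees: float, distance_m: float, pixels: int) -> float:
    """Scene size covered by a single pixel, in millimetres."""
    return footprint_m(fov_degrees, distance_m) * 1000.0 / pixels


def stream_mbit_s(width: int, height: int, bits_per_pixel: int,
                  fps: int) -> float:
    """Raw, uncompressed data rate of an image stream in Mbit/s."""
    return width * height * bits_per_pixel * fps / 1e6


if __name__ == "__main__":
    # Field of view and depth image size come from the spec table.
    print(f"Horizontal: {mm_per_pixel(58.0, 2.0, 640):.2f} mm per pixel at 2m")
    print(f"Vertical:   {mm_per_pixel(45.0, 2.0, 480):.2f} mm per pixel at 2m")

    # Assumed bit depths: 16-bit depth samples, 24-bit colour samples.
    modes = {
        "VGA depth at 60FPS": stream_mbit_s(640, 480, 16, 60),
        "VGA depth at 30FPS": stream_mbit_s(640, 480, 16, 30),
        "UXGA colour at 30FPS": stream_mbit_s(1600, 1200, 24, 30),
    }
    for name, rate in modes.items():
        verdict = "fits" if rate < USB2_SIGNALLING_MBIT_S else "exceeds"
        print(f"{name}: {rate:.0f} Mbit/s ({verdict} USB 2.0 signalling rate)")
```

Under these assumptions each depth pixel covers roughly 3.5mm at two metres, in the same ballpark as the quoted 3mm spatial resolution, while an uncompressed 1600x1200 colour stream at 30FPS would comfortably exceed what USB 2.0 can carry. That is at least consistent with the idea that the headline frame-rate and resolution figures are best-case numbers that don't all apply at once.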