MS: how you'll talk to AI through Kinect

Life-like, natural conversations coming.

Microsoft has promised that natural, life-like conversations with computer-controlled characters will become a reality through Xbox 360 motion-sensing add-on Kinect.

Microsoft demonstrated Kinect's voice recognition at E3 in June through the Xbox 360's user interface and select games.

Mass Effect 3, for example, allows players to command party members with voice commands. Kinect Sports: Season Two contains more recognisable lines of dialogue than any other game.

Right now, strict commands must be said clearly to instruct game characters to perform actions, but Microsoft told Eurogamer at the Develop 2011 conference in Brighton last week that natural conversations are coming.

Scott Henson, the boss of Microsoft-owned Kinect Sports developer Rare, outlined how a golf game may work in the future in this regard.

"You'll literally say something like, 'you know caddie, I think I need something that helps me with the wind conditions,'" he told Eurogamer.

"Then the caddie will respond with, 'well, it could be either a six iron or a seven iron.' And you say, 'oh, I'd like the seven iron.' It'll be that natural of a conversation."

One Kinect game, now cancelled, that hinted at natural conversations was Project Milo, from Fable developer Lionhead.

Project Milo allowed users to have a realistic relationship with a young boy, who would recognise and react to the tone of your voice and other player expressions.

The cancellation of that game, according to Henson, does not suggest its gameplay was too ambitious for the technology underpinning Kinect.

"In our game it will be, 'change club seven iron,'" he admitted, "but absolutely, without question, the journey we're on is what I just described. That is where we will go. And guess what will be there? Software. Software will be the key that unlocks why that's possible. We already have the microphone there.

"Now we just need to continue to adapt and grow and build our software to make that better."

Comments (32) Latest comment 10 months ago

  • Zomoniac #1 10 months ago

    Don't promise what you can't deliver. This, for example.
  • onezeonx #2 10 months ago

    They are basing it on a conversation with Stephen Hawking... so it's possible :p
  • TheTrueSpin #3 10 months ago

    But yet genuine AI conversations remain a fantasy. We are nowhere near conversational AI and even the dedicated AI conversation programs are unconvincing - and these are text-based. Add to this the complexity of speech recognition and you realise that we have a long, long way to go.

    Project Milo was a marketing stunt, designed to fool the uninformed masses and mainstream media. Anyone that knew anything about video games or computing knew straight away that Project Milo was a thing of fiction. No one was surprised when it was "cancelled".
    Edited by TheTrueSpin at 29/07/11 @ 08:28
  • darkmorgado #4 10 months ago

    I see. So they've managed to beat the Turing test have they? They can now program AI that has an actual understanding of what is being said to it and isn't simply recognising certain verbforms and responding with a script?

    No, I didn't think so.
  • McShifty #5 10 months ago

    If it doesn't understand the words faggot, Jew, nigger and noob then it'll never truly be able to understand an Xbox Live gamer.
  • afray #6 10 months ago

    So they've solved not only natural language processing, but also the turing test? Wow, suck it Sony. j/k
  • orangpelupa #7 10 months ago

    "Project Milo allowed users to have a realistic relationship with a young boy, who would recognise and react to the tone of your voice and other player expressions."

    the wording in that sentence ... can it be altered to be read more good?
    currently that sentence gave creepy feeling...

    Btw, pre-conditioned conversation with psychological tricks and an AI that searches for certain terms might work.
    like in love plus on NDS, but with a lot more variables. (so no need to make the AI really "understand" what human says).

    and not use pre-recorded voice. But using pre recorded voice for the sampling to generate realtime voice like in VOCALOID singing synthesizer or IVONA voice http://www.ivona.com maybe...

    but maybe the ability to "learn" from human will be limited due to ESRB, CERO rating....
    Edited by orangpelupa at 29/07/11 @ 08:57
  • StooMonster #8 10 months ago

    In the future "You'll literally say something like, 'you know caddie, I think I need something that helps me with the wind conditions. Then the caddie will respond with, 'well, it could be either a six iron or a seven iron.' And you say, 'oh, I'd like the seven iron.' It'll be that natural of a conversation."

    In the present "In our game it will be, 'change club seven iron,'"

    In reality, the gamer will press a button because they will say "change club seven iron" and Xbox will respond with "quit game without save?"

    Also, this will be a huge cost to development ... internationalisation already costs a lot of money for all that written text, how much more would it cost for speech recognition for all the world's different languages? Also, would they cut corners so all English speakers simply get the American English version? (Let alone the accents in the UK.)

    Edit: close italic tag.
    Edited by StooMonster at 29/07/11 @ 10:40
  • Dagdriver #9 10 months ago

    fantasy or not. I do not care, and I do not want it.
  • ShiftyGeezer #10 10 months ago

    @darkmorgado - The Turing test isn't a valid measure of intelligence and understanding. It only needs to be able to fool a portion of humans. If that counts as true intelligence, then a magician who can fool a large portion of his audience into believing he's done what he appears to have done (saw woman in half, make Eiffel tower disappear) would be classed as able to do real magic.

    Although you're right that this nonsense from MS is never going to get anywhere near beating the Turing test, the Turing test is a fake grail, and the day it's passed we still won't be anywhere near computers actually understanding what's being said.

    (caveat - whenever anyone makes the claim 'it's not going to happen any time soon', there's a breakthrough in design/theory/modeling and it happens :p)
    Edited by ShiftyGeezer at 29/07/11 @ 09:40
  • ShiftyGeezer #11 10 months ago

    @Orangpelupa - whatever system they use, you won't have *natural* conversations. The player is going to have to stop and repeat themselves at times, or rephrase themselves, or the game will get the wrong idea. In the given example, no doubt the game would be listening for "seven iron" in the player's speech and, as it's club selection time in the game, will assume the player's wanting a club. But if the player asks, "do you think a tea spoon or seven iron is best for stirring a cup of tea?" the game will recommend the seven iron.
  • frunk #12 10 months ago

    Quit talking about what *might* be possible and just develop it.

    "Proof is in the pudding" and all that

    And then serve me up a review where someone thought it worked really well and genuinely added something to the game.

    In reality I expect the review will be along the lines of...

    "The natural voice (tm) interface was an interesting gimmick when we could get it to work, however the canned responses soon undermined any impression of a proper AI. After the 5 minutes of setting it up, and 10 minutes of using it, it will be switched off in favor of the manual controls."
  • flaming.carrot #13 10 months ago

    And how is this unique to Kinect? Surely you could use any microphone for this purpose?
  • Ryze #14 10 months ago

    You've had a microphone at your disposal for ages.

    You had Molyneux make a pedo sim a couple of years ago. If this was going to happen soon, it'd be happening now.

    I can't see it until the end of the next gen at least.
  • ShiftyGeezer #15 10 months ago

    @flaming.carrot : Kinect isn't just the hardware, but the software. The skeleton tracking is enabled by MS's software and not just having a 3D camera. It's possible that MS have found some amazing breakthrough in natural language recognition enabling the tech now. Their work on voice recognition for PC has progressed well, hence Kinect is as good as it is. Still a trillion miles from natural conversations though...
  • Quak #16 10 months ago

    > That is where we will go. And guess what will be there? Software. Software will be the key that unlocks why that's possible. We already have the microphone there.

    Talk about stating the obvious.

    Flying cars. That's where we will go. And guess what will be there? Flying technology. We already have cars.
  • Spong #17 10 months ago

    What's Kinect got to do with it? Why can't the 360's headset be the medium for this so-called "natural conversation"? Is there some kind of microphone/voice recognition technology present in Kinect that isn't present in the headset? I bet there isn't, I bet this is just another pitch to try and sell more bloody Kinects.
  • CamberGreber #18 10 months ago

    Kinect and Voice Rec have nothing to do with each other.

    Every 360 comes with a Microphone.

    The Kinect has no hardware in it dedicated to voice.

    Stop all the Lies MICROSPEIL.
  • IronGiant #19 10 months ago

    Fuck off with this crap. Voice control is yet another gimmick that MS are trying to peddle as being the next big thing, it's clearly not. It's so much easier to press a button
  • andrewsqual #20 10 months ago

    To an RPG NPC, "So did you see Emmerdale last night?", reply "I do not know any rumours at the moment" GENIUS
  • Machetazo #21 10 months ago

    Eyedentify, or forget about it, lol! :p
  • VibratingDonkey #22 10 months ago

    I'm gonna assume the only words Kinect will actually recognize in a sentence like that are "caddie" as the command for Kinect voice recognition to be activated, then "wind". Then algorithms try to interpret what you intended. So instead of trying to have a natural conversation with an inanimate object, you could just say, "caddie, wind. 7 iron". Or some such. Which as far as I'm concerned, would be preferable. Screw natural conversations, I want an efficient UI.

    Voice recognition could be quite useful for UI's if it works consistently. Instead of moving through a cluster of menus you could just say "some word, quick match" and instantly connect to a server. This type of thing ought to become a standard.
  • Nazo #23 10 months ago

    Why would I even want to do that? Never mind that it's not going to be possible for years, how is talking to my console in such a way fun?
  • Xardan #24 10 months ago

    All Kinect needs is one of those relationship sims that the japanese love so much, and it'll be a massive hit over there. Imagine it, you could talk and interact with your imaginary friend in all sorts of creepy ways.
  • Freek #25 10 months ago

    I'd love to play a game like that, but let's be realistic here: the AI needed to pull that off doesn't exist and Xbox 360 isn't going to be the platform to invent it either. The computing power required isn't realistic for a gaming platform.
  • darc #26 10 months ago

    "Software will be the key that unlocks why that's possible. We already have the microphone there."

    Thank you, Captain Obvious. Including a microphone gets you about 0.00000000000000000001% of the way there!

    My cats all have ears. We have fascinating conversations, I can assure you.
  • shortyluke2010 #27 10 months ago

    Would be amazing if this happened, however let's not get too excited.
  • RedSparrows #28 10 months ago

    It'd be nice if people could refer to Milo without 'peado' overtones. Talk about cynicism. Is talking to a child dodgy?

    Also, they're not talking about true AI are they? And they want to market it a spot.

    But please, just moan.
  • kwarive #29 10 months ago

    All you naysayers are missing the subtext:

    In 150 years The Xbox 360,000 will finally beat the Turing test (and not just because the judges are wasted on space-pot like that debacle in 2084). All over the world, moonbase and asteroid habitats people will rejoice as the great philanthropist, genius and latter-day saint Bill Gates will be thawed from his cryogenically frozen state and his consciousness (which somehow managed to survive brain death and degradation due to the Nobel prize - sorry - Bill & Melinda prize winning work of Dr. Voodoo Futuristo Pseudoscience) transferred to become the new operating system of the latest xbox.

    Peace, prosperity and happiness will finally reign for all mankind as the final dregs sign up for an xbox live subscription

    This new Golden Age will last for ten million years before Bill Gates finally ascends to a higher plane of existence. Following his ascension humanity will promptly destroy itself as Romeo upon mistakenly discovering the exit of his fair Juliet from this mortal coil and Bill will have to try again by creating another universe - let there be light!

    I tell ya this Scott Henson is a smart cookie and I've got a feeling he's in line for a BIG promotion within the MS Group.
  • orangpelupa #30 10 months ago

    @shifty

    thats the same with conversation in mass effect, final fantasy, dragon quest, visual novels...
    they work in their own confines.
    in the latest Milo demo on TED, they show it like that. There is text on screen that the player speaks to Kinect. Similar to conversation in Mass Effect (but in ME it's just a click).

    so i think in those kinect game that will "fake" conversation, they need to make the game player is "confined" within the boundaries without letting the gamer feel being confined.

    maybe the feeling is like playing FF X (confined but not feel confined) vs playing FF XIII (confined and designed to feel confining)
  • Ryze #31 10 months ago

    @RedSparrows

    When the game's described as developing a relationship with a young boy, then they leave themselves wide open!

    Just joking! The tech's very promising.

    Also: Duke Nukem Forever could do with some of this tech in order to allow the selection of more than 2 weapons, seeing as they reckon they ran out of buttons!

    "Weapon Shotgun!" - that'd work, but isn't too realistic.
  • TudeScud #32 10 months ago

    The guy is starting to remind me a bit of Steve Ballmer.

    "Software. Software will be the key that unlocks why that's possible."

    Not quite as eloquent as Steve put it though... how did that go again?