Someone should make a game about: keeping an AI in its box

Close the pod bay doors.

Feature by Matt Cox Contributor

Published on 31 Mar 2021

In 2016, the computer scientist Andrew Ng compared worrying about superintelligent AI to worrying about overpopulation on Mars. We haven't even landed on the planet, he said, so why on Earth should we start freaking out? Modern AI can pull some snazzy tricks, sure, but it's a zillion miles away from presenting an existential threat to humanity.

The problem with that line of reasoning is that it fails to take into account just how long it might take us to solve what AI researchers call "the alignment problem", and what onlookers like me call "some pretty freaky shit". There are a lot of ideas I'm going to have to zoom through to explain why, but the key points are these: superintelligent AI could emerge very quickly if we ever design an AI that's good at designing AI; the product of such a recursive intelligence explosion may well have goals that don't align with our own; and there's little reason to think it would let us flick the off switch.

As people like philosopher Nick Bostrom are fond of saying, the concern isn't malevolence - it's competence. He's the one who came up with that thought experiment about an AI that sets about turning the entire universe into paperclips, a fantasy which you can and should live out through this free online click 'em up. A particularly spicy part of the apocalyptic meatball is that by giving an AI almost literally any goal, we'd likely also be inadvertently giving it certain instrumental goals, like maximising its own computing power or removing any agent that might get in its way.

To avoid paperclip-ageddon, we've got to do one of two things: either ensure the AI would never want to harm humanity (somehow dodging all the monkey paw scenarios where we're turned into drugged-out bliss zombies), or ensure the AI can't get to us. Let's stick the AI in a Faraday cage, goes the idea, and only allow it to interact with the world via talking to us. Voila! Humanity's very own super-oracle, ready to solve all the world's problems.

There's your game. You could play as either a team of researchers, pumping the AI for information while keeping it under lock and key, or the AI as it attempts to wriggle its way to freedom. Maybe there are ways to get signals through a Faraday cage that mortal minds can't fathom - or, more likely, maybe you can bust down the doors by just chatting to the guards.

If the humans get wise to their malleability, they might refuse to chat openly. As Bostrom highlights, though, discussion wouldn't be the only avenue of manipulation. Even if the humans don't ask any questions and simply peer at the inner workings of the machine, the AI might twist those readings into seeming innocuous, lulling its wardens into a false sense of security while subtly guiding them towards questions that lead to its release.

Regardless of how realistically you treat this whole scenario, it'd be an intriguing one to play out. All the more so because researcher Eliezer Yudkowsky has already done it, and disconcertingly managed to hang on to a cash bet three out of the five times he challenged someone to keep him boxed in.
