What is AI Safety?


“Artificial intelligence is transforming our world — it is on all of us to make sure that it goes well”

— Our World in Data

AI has tremendous potential for making the world a better place, especially as the technology continues to develop. We’re already seeing some beneficial applications of AI to healthcare, accessibility, language translation, automotive safety, and art creation, to name just a few.

However, the deployment of AI systems into high-stakes settings, such as transportation and medicine, also poses some serious risks.

Some of these concerns apply to current systems: how do we prevent driverless cars from misidentifying a stop sign in a blizzard? Others are more forward-looking: how can we ensure advanced AI systems pursue safe and beneficial goals?

Indeed, it’s possible that future AI systems will be qualitatively different from those we see today. They may be able to form sophisticated plans to achieve their goals, and understand the world well enough to strategically evaluate many relevant obstacles and opportunities. Furthermore, they may attempt to acquire resources or resist shutdown attempts, since these are useful strategies for some goals their designers might specify. For more on why these failures might be challenging to prevent, see DeepMind’s research on specification gaming and goal misgeneralization.

It’s worth reflecting on the possibility that an AI system of this kind could outmaneuver humanity’s best efforts to stop it.

Some of these concerns have already been demonstrated in modern machine learning systems. DeepMind’s research on specification gaming and goal misgeneralization documents cases in which reinforcement learning agents pursue unintended goals. Meta AI’s Cicero model, which reached human-level performance in Diplomacy, a strategic board game, shows that modern systems can successfully negotiate with and deceive humans.

Introductory Resources

Our brief argument above skipped over many important considerations. Here are some resources on how AI might cause a catastrophe.

Articles

Introductory Video