There are two main lines of defense you can employ to prevent schemers from causing catastrophes: https://www.alignmentforum.org/s/PC3yJgdKvk8kzqZyA/p/kcKrE9mzEHrdqtDpE
🟢Alignment: Ensure that your models aren't scheming.
🟢Control: Ensure that even if your models are scheming, you'll be safe, because they are incapable of subverting your safety measures.
The AAAI 2025 Future of AI Research report treats AI Control as a critical component of AI Safety and a necessary complement to alignment — a clear endorsement of control-oriented thinking.
🧠 Underlying Assumption:
The report implicitly accepts that misalignment is plausible and that proactive control mechanisms are required even under scenarios of partial or deceptive alignment. It does not assume that alignment will be reliably solved ex ante.
https://media.licdn.com/dms/document/media/v2/D4E1FAQGYsP65vxlhww/feedshare-document-pdf-analyzed/B4EZXMQK8TGgAY-/0/1742888572158?e=1743638400&v=beta&t=-XMlKVMurf11gGGJsJTkvM1nMmc6OQ73PFcs67bj9u0