Google AI Safety

Google Introduces Frontier Safety Framework to Identify and Mitigate Future AI Risks

May 17, 2024

Google has announced the Frontier Safety Framework, a set of protocols for identifying and mitigating potential harms from future AI systems. The framework aims to stay ahead of these risks by putting mechanisms in place to detect and address them before they materialize.

The Frontier Safety Framework focuses on severe risks posed by advanced AI models, such as those with exceptional autonomy or sophisticated cyber capabilities. It is designed to complement Google's existing AI safety practices and its alignment research, which aims to ensure that AI acts in accordance with human values.

The framework is built around three main components:

Identifying Capabilities: Google will research the ways in which advanced AI models could cause severe harm. It will define "Critical Capability Levels" (CCLs), the minimum level of capability a model must have to pose a severe risk in a given domain. These CCLs guide the evaluation and mitigation approach.
Evaluating Models: Google will periodically test its AI models to detect when they approach a CCL. It will develop "early warning evaluations" designed to raise an alert before a model reaches a CCL.
Mitigation Plans: When a model passes an early warning evaluation, meaning it shows signs of approaching a CCL, Google will apply a mitigation plan. The plan will weigh the model's benefits against its risks, focusing on security and on preventing misuse of critical capabilities.
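
The three components above can be read as a simple threshold check: each domain has a CCL, an early warning threshold sits below it, and crossing the early warning threshold triggers a mitigation plan. The Python sketch below illustrates only that logic. It is not Google's implementation: the numeric scores, the class and function names, and the idea of reducing an evaluation to a single number are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    BELOW_WARNING = "below early warning"
    EARLY_WARNING = "early warning passed: apply mitigation plan"
    CCL_REACHED = "critical capability level reached"


@dataclass(frozen=True)
class CriticalCapabilityLevel:
    """Hypothetical representation of one CCL; all fields are assumptions."""
    domain: str           # e.g. "autonomy" or "cybersecurity"
    ccl_score: float      # assumed score at which the CCL counts as reached
    warning_score: float  # assumed early warning score, must be below ccl_score

    def __post_init__(self):
        # An early warning is only useful if it fires before the CCL itself.
        if not self.warning_score < self.ccl_score:
            raise ValueError("early warning must trigger before the CCL")


def classify(ccl: CriticalCapabilityLevel, eval_score: float) -> Status:
    """Map a (hypothetical) evaluation score onto the framework's stages."""
    if eval_score >= ccl.ccl_score:
        return Status.CCL_REACHED
    if eval_score >= ccl.warning_score:
        return Status.EARLY_WARNING
    return Status.BELOW_WARNING


# Illustrative values only; the framework does not publish numeric thresholds.
autonomy = CriticalCapabilityLevel("autonomy", ccl_score=0.8, warning_score=0.6)
for score in (0.4, 0.65, 0.9):
    print(score, classify(autonomy, score).value)
```

In this sketch the gap between `warning_score` and `ccl_score` plays the role of the safety buffer: the mitigation plan is triggered while the model is still short of the critical level.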

Initially, the framework focuses on four domains: autonomy, biosecurity, cybersecurity, and machine learning R&D. For each domain, Google has outlined specific CCLs and corresponding security and deployment mitigations.

For example, in the domain of autonomy, a critical capability might be the ability of a model to autonomously acquire resources and sustain additional copies of itself. In cybersecurity, a critical capability might be the ability to automate opportunistic cyberattacks.

Other research labs, including OpenAI and Anthropic, have also been investing in AI safety research. OpenAI released its Preparedness Framework in December 2023 and recently outlined key security measures it believes are necessary to safeguard AI technology from misuse. Anthropic is pursuing AI safety research on several fronts, including Mechanistic Interpretability, Scalable Oversight, Testing for Dangerous Failure Modes, and Societal Impacts and Evaluations. Together, these efforts indicate a growing recognition in the AI research community that the risks of advanced AI systems need to be addressed proactively.

Google's framework is exploratory and is expected to evolve as the company learns from its implementation and collaborates with industry, academia, and government. Google aims to have the initial framework fully implemented by early 2025.