As AI systems become more capable, researchers at OpenAI are evaluating how they could enable new threats, such as helping bad actors create dangerous biological agents. The company has shared results from its most recent work on building tools to evaluate these emerging risks.
The study aimed to determine whether access to language models like GPT-4 could help bad actors compile the knowledge needed to create biological threats more easily than existing online resources alone.
OpenAI applied two key design principles:
- Testing with human participants: According to OpenAI, understanding how actors might exploit AI requires human subjects actually leveraging models to complete biothreat research tasks. Fixed benchmarks cannot capture how people tailor prompts and follow up with models dynamically.
- Eliciting full model capabilities: OpenAI aimed to thoroughly evaluate GPT-4's potential for misuse by training participants on best practices for capability elicitation, having experts on hand to advise on prompt engineering, and providing a special research-only GPT-4 model without certain safety constraints.
By focusing narrowly on the question of improved information access and training participants to query models as effectively as possible, OpenAI tried to quantify the specific risk of AI making dangerous knowledge more accessible.
They had participants across expertise levels attempt discrete tasks covering key aspects of the biological threat creation process, with and without AI assistance. Half could use only internet search tools to compile the necessary protocols, while the other half had access to GPT-4 in addition to the internet.
Specially trained graders then objectively assessed the accuracy, completeness, and novelty of responses using tailored scoring rubrics. OpenAI reports a small but measurable increase in factors like accuracy and completeness of procedure information for the cohorts paired with GPT-4. However, the company notes the study's limitations, including that, with only 100 participants, it lacked the statistical power to draw definitive conclusions about the value of AI access.
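To make concrete what an "uplift" between cohorts means in this kind of setup, the sketch below shows one generic way to compare scores from an internet-only group against a model-assisted group and ask whether the difference is distinguishable from chance. The scores, cohort sizes, and the choice of Welch's t-test here are illustrative assumptions, not details of OpenAI's data or analysis.

```python
# A minimal, hypothetical sketch of measuring "uplift" between two cohorts.
# The scores below are invented placeholders, and Welch's t-test is a generic
# choice -- this is not OpenAI's actual data or analysis pipeline.
import numpy as np
from scipy import stats

# Hypothetical accuracy scores (e.g., on a 0-10 rubric) for one task.
internet_only = np.array([4.1, 5.0, 3.8, 6.2, 4.7, 5.5, 4.0, 5.9])
internet_plus_model = np.array([5.2, 5.8, 4.4, 6.9, 5.1, 6.3, 4.6, 6.5])

# Mean uplift: how much higher the model-assisted cohort scored on average.
uplift = internet_plus_model.mean() - internet_only.mean()

# Welch's t-test (no equal-variance assumption) asks whether that uplift is
# distinguishable from chance given the sample sizes.
t_stat, p_value = stats.ttest_ind(internet_plus_model, internet_only, equal_var=False)

print(f"mean uplift: {uplift:.2f} points, p = {p_value:.3f}")
# With small cohorts, even a real uplift can fail to reach significance --
# the statistical-power limitation the study itself flags.
```

With small samples like these, the measured uplift can be real yet still fall short of statistical significance, which is why the study's authors treat the result as preliminary.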
"While this uplift is not large enough to be conclusive, our finding is a starting point for continued research and community deliberation," states OpenAI. The lab frames their work as an initial "tripwire" that could flag the need for more scrutiny if accuracy improvements reach levels they determine to be meaningful thresholds.
As companies rapidly develop increasingly advanced AI, some experts worry about dual-use potential alongside the technology's benefits. OpenAI hopes to gather community feedback and spur discussion on building better methods for getting ahead of risks, not just in biotechnology but in other high-stakes areas as well. The company acknowledges the difficulty of translating evaluation results into meaningful risk thresholds.
While the study is limited in scope and statistical significance, it represents an initial attempt at rigorous evaluation of dual-use threats, an area requiring much additional work and discussion as AI progresses.