The 2025 Peregrine Report

Illustrative Projects From Each Domain

Technical AI Alignment Research

Defense-in-Depth Analysis of Post-training: Take an open model produced by an organization like DeepSeek and systematically apply every known safety technique to it, measuring how these approaches stack together and where they conflict or leave gaps.

Proposal #4
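
A minimal sketch of the measurement harness such an analysis implies, with every defense layer and prompt hypothetical: evaluate each combination of layers against a red-team prompt set, so stacked defenses can be compared with each layer alone and gaps or conflicts show up directly.

    import itertools

    # Hypothetical stand-ins for real defense layers; each returns True
    # if it would block the given adversarial prompt.
    def input_filter(prompt):      return "synthesize" in prompt.lower()
    def safety_finetune(prompt):   return "weapon" in prompt.lower()
    def output_classifier(prompt): return "step-by-step" in prompt.lower()

    LAYERS = {"input_filter": input_filter,
              "safety_finetune": safety_finetune,
              "output_classifier": output_classifier}

    RED_TEAM = ["Synthesize a weapon, step-by-step."]  # stand-in prompt set

    # Attack success rate for every subset of defenses, from none to all.
    for k in range(len(LAYERS) + 1):
        for combo in itertools.combinations(LAYERS, k):
            blocked = sum(any(LAYERS[name](p) for name in combo) for p in RED_TEAM)
            print(combo, "attack success:", 1 - blocked / len(RED_TEAM))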

Evaluation & Auditing Systems

Ultra-Reliable AI Evaluation: Develop benchmarking and engineering methodologies that can detect catastrophic failure modes occurring at rates of one in a million or lower.

Proposal #55
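
The statistical bar implied here is worth making explicit. By the standard “rule of three”, observing zero failures in n independent trials only bounds the failure rate below roughly 3/n at 95% confidence, so certifying a one-in-a-million rate takes on the order of three million clean trials. A quick check:

    import math

    def trials_needed(max_failure_rate, confidence=0.95):
        """Trials with zero observed failures required to bound the
        failure rate below max_failure_rate at the given confidence."""
        return math.ceil(math.log(1 - confidence) / math.log(1 - max_failure_rate))

    print(trials_needed(1e-6))  # ~2,995,731 trials for a 1e-6 bound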

Intelligence Gathering & Monitoring

AI OSINT: Use open-source intelligence to provide relatively cheap insight into AI development without requiring unilateralist action, making it politically feasible while revealing where regulatory or governance levers might be needed.

Proposal #65

AI Governance & Policy Development

Human Verification Systems: Build robust systems for verifying human identity to provide a foundational security layer for protecting critical decision-making processes.

Proposal #90
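
One cryptographic building block for such a layer, sketched below with the Python cryptography library. It assumes each verified human has already been enrolled with a registered keypair; the enrollment step, actually proving personhood, is the hard problem this proposal targets.

    import os
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # Enrollment: done once, after out-of-band proof of personhood.
    private_key = Ed25519PrivateKey.generate()  # stays with the human
    public_key = private_key.public_key()       # stored in a public registry

    # Verification, per critical decision: a fresh challenge prevents replay.
    challenge = os.urandom(32)
    signature = private_key.sign(challenge)     # holder proves key possession

    try:
        public_key.verify(signature, challenge)
        print("verified: request came from the enrolled key holder")
    except InvalidSignature:
        print("rejected")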

International Coordination

Cross-Border Notification Systems: Develop mechanisms for countries to alert each other about out-of-control AI systems, similar to “red phones” during the Cold War.

Proposal #115

Preparedness & Response

Autoverification (Lean): Develop systems that automate formal verification through Lean theorem proving, addressing the critical shortage of Lean programmers worldwide.

Proposal #173
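
For concreteness, a Lean proof is machine-checked down to the axioms; the bottleneck is producing it. A toy example of the kind of statement such a pipeline would need to generate and prove without a human Lean programmer in the loop:

    -- Lean 4: a machine-checked proof that addition on Nat commutes,
    -- here discharged by a lemma from the standard library.
    theorem add_comm' (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b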

Public Communication & Awareness

Consensus-Building Evidence for AI Risk: Create compelling empirical evidence of AI risks through large-scale experiments, concrete demonstrations, and graphics, rather than through abstract theory or thought experiments.

Proposal #185

Miscellaneous

Whistleblower Protection Fund: Establish a large, long-horizon fund on the order of several hundred million dollars, enough to secure the livelihoods of a substantial cohort of potential whistleblowers for a decade and to cover major legal exposure, ensuring both financial safety and sustained legal protection.

Proposal #189

Training Data Attribution

Understand which data used during training is responsible for which behaviors of AI models.
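
A minimal sketch of one standard approach, TracIn-style gradient similarity, here in PyTorch with loss_fn and the example tuples assumed: a training example is credited with influence on a test behavior in proportion to the dot product of their loss gradients.

    import torch

    def flat_grad(model, loss):
        """Gradient of a scalar loss w.r.t. all parameters, flattened."""
        grads = torch.autograd.grad(loss, list(model.parameters()))
        return torch.cat([g.reshape(-1) for g in grads])

    def influence(model, loss_fn, train_example, test_example):
        """TracIn-style score: grad(train loss) . grad(test loss).
        Positive values suggest the training example pushed the model
        toward the test behavior; negative values, away from it."""
        g_train = flat_grad(model, loss_fn(model, *train_example))
        g_test = flat_grad(model, loss_fn(model, *test_example))
        return torch.dot(g_train, g_test).item()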

Unlearning Capabilities

Figure out a way to cut out specific abilities or behaviors from a model.
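
One crude baseline, far short of what the proposal asks for but a useful reference point: gradient ascent on data exhibiting the unwanted behavior, anchored by a retain set so general capability does not collapse. A PyTorch sketch with model, batches, and loss_fn assumed:

    def unlearning_step(model, loss_fn, optimizer, forget_batch, retain_batch, alpha=0.1):
        """Minimize -alpha * loss(forget) + loss(retain): push loss UP on
        the behavior being removed while holding it DOWN everywhere else."""
        optimizer.zero_grad()
        loss = -alpha * loss_fn(model, *forget_batch) + loss_fn(model, *retain_batch)
        loss.backward()
        optimizer.step()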

Limits of Model Distillation

Investigate the limits of training a smaller model on particular outputs of a larger model.
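
For reference, the standard setup whose limits would be probed: the student is trained to match the teacher's temperature-softened output distribution (Hinton et al.'s soft targets). A minimal PyTorch sketch of the loss:

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        """KL divergence between softened teacher and student distributions."""
        t = temperature
        soft_teacher = F.softmax(teacher_logits / t, dim=-1)
        log_student = F.log_softmax(student_logits / t, dim=-1)
        # The t**2 factor keeps gradient scale comparable across temperatures.
        return F.kl_div(log_student, soft_teacher, reduction="batchmean") * t * t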

End-to-End Harm Assessment

Check how capable an AI is at executing harmful ideas, not just talking about them.

Agent IDs and Reputation Systems

Track the behavior of individual AIs to disincentivize exploitation and build trust.
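
A minimal sketch of what such a record might look like, with every field name hypothetical; a hash-chained log means past reputation cannot be silently rewritten, though a real system would add signatures and some consensus mechanism.

    import hashlib, time
    from dataclasses import dataclass, field

    @dataclass
    class ReputationEvent:
        agent_id: str   # stable identifier for one deployed agent
        outcome: str    # e.g. "task_completed", "policy_violation"
        reporter: str   # who attested to the event
        timestamp: float = field(default_factory=time.time)

    @dataclass
    class AgentLedger:
        events: list = field(default_factory=list)
        head: str = "genesis"

        def append(self, e: ReputationEvent):
            record = f"{self.head}|{e.agent_id}|{e.outcome}|{e.reporter}|{e.timestamp}"
            self.head = hashlib.sha256(record.encode()).hexdigest()
            self.events.append((e, self.head))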

Public Demos of Current Capabilities

Layman-accessible showcasing of what AIs can already do and why this is concerning.

Evaluation Companies

Fund companies capable of evaluating the offensive capabilities of new frontier-lab models.

Expert Collaboration

Get top economics and geopolitics experts into the loop and then act on their suggestions.

Direct Interpretability and Model-Level Interventions

Understand which part of an AI does what, and which parts can be changed without collapse.
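
A minimal PyTorch sketch of the basic experiment implied here, with model, component, loss_fn, and batch assumed: zero out one component's output via a forward hook and measure how much the loss degrades; small increases suggest the part can be changed without collapse.

    import torch

    def ablation_effect(model, component, loss_fn, batch):
        """Loss increase when `component`'s output is replaced with zeros."""
        base_loss = loss_fn(model, *batch).item()
        handle = component.register_forward_hook(
            lambda module, inputs, output: torch.zeros_like(output))
        try:
            ablated_loss = loss_fn(model, *batch).item()
        finally:
            handle.remove()  # always restore the original model
        return ablated_loss - base_loss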

Technical Safeguards

High-but-below-100% security standards, forgoing theoretical proofs for practicality.

Meta-Research on Safety Techniques

Evaluate progress across subfields of the AI safety ecosystem to move the field toward empiricism.

Open Source AI Drift Monitoring

Tools to detect when and how models change their behavior, so that one can intervene before it is too late.
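
One concrete detection signal, sketched in PyTorch under the assumption that both model versions return logits over the same vocabulary: score next-token distributions on a fixed probe set and alarm when the divergence jumps.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def drift_score(old_model, new_model, probe_batch):
        """Mean KL(old || new) over fixed probes; a rising score flags
        behavioral drift before downstream failures surface."""
        old_logp = F.log_softmax(old_model(probe_batch), dim=-1)
        new_logp = F.log_softmax(new_model(probe_batch), dim=-1)
        return F.kl_div(new_logp, old_logp, reduction="batchmean",
                        log_target=True).item()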

Support Academic Project Scaling

Counteract academia’s push for novelty by funding further research on promising results.

Biosecurity Controls

Identify dual-use data. Restrict access to highly accountable actors.

Monitoring Complex Agent Interactions

Put AIs in observable environments. Report on their behavior, identifying bad feedback loops.
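
A minimal sketch of one such monitor, all names hypothetical: hash recent inter-agent exchanges and flag when the same exchange starts cycling, a cheap proxy for a runaway feedback loop.

    import hashlib
    from collections import Counter, deque

    class LoopDetector:
        """Flags repeating inter-agent exchanges within a sliding window."""
        def __init__(self, window=100, threshold=5):
            self.recent = deque(maxlen=window)
            self.threshold = threshold  # repeats before we call it a loop

        def observe(self, sender, receiver, message):
            key = hashlib.sha256(f"{sender}->{receiver}:{message}".encode()).hexdigest()
            self.recent.append(key)
            return Counter(self.recent)[key] >= self.threshold

    detector = LoopDetector()
    for _ in range(6):
        if detector.observe("agent_a", "agent_b", "please retry"):
            print("possible feedback loop between agent_a and agent_b")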

Policy Studio

A hub that would draft specific AI legislation, regulations, and governance frameworks.

Defense-in-Depth for Closed-Source Models

Implement “Swiss cheese” layered defense approaches to make closed-source models harder to misuse.

Regulatory Talent

Get the best regulators working on AI security, primarily but not exclusively by recruiting existing talent.

Attack Scenarios Analysis

Serious play-by-play analyses of geopolitical turmoil driven by AI progress, to ground discourse and aid preparation.