Illustrative Projects From Each Domain
Technical AI Alignment Research
Defense-in-Depth Analysis of Post-training: Take an open-weights model produced by an organization like DeepSeek and systematically layer every known safety technique on top of it, measuring how these approaches stack together and where they conflict or leave gaps.
Proposal #4
Evaluation & Auditing Systems
Ultra-Reliable AI Evaluation: Develop benchmarking and engineering methodologies that can identify catastrophic failure modes occurring at rates of one in a million or lower.
Proposal #55
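To make the statistical demand concrete: with zero failures observed in n independent trials, the exact one-sided upper confidence bound on the failure rate follows from solving (1 - p)^n = 1 - confidence for p (the "rule of three" approximates this as 3/n at 95% confidence). A minimal sketch, with function names of my own choosing, showing why certifying a one-in-a-million rate requires roughly three million clean trials:

```python
import math

def upper_bound_failure_rate(n_trials: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper confidence bound on the failure rate,
    given zero observed failures in n_trials independent trials."""
    # Solve (1 - p)^n = 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)

def trials_needed(target_rate: float, confidence: float = 0.95) -> int:
    """Number of zero-failure trials needed to push the bound below target_rate."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - target_rate))
```

For target_rate = 1e-6 this comes out just under three million trials, which is the engineering challenge the proposal points at: evaluation pipelines cheap and automated enough to run at that scale.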
Intelligence Gathering & Monitoring
AI OSINT (open-source intelligence): Provide relatively cheap intelligence on AI development without requiring unilateralist action, making it politically feasible while revealing where regulatory or governance levers might be needed.
Proposal #65
AI Governance & Policy Development
Human Verification Systems: Build robust systems for verifying human identity to provide a foundational security layer for protecting critical decision-making processes.
Proposal #90
International Coordination
Cross-Border Notification Systems: Develop mechanisms for countries to alert each other about out-of-control AI systems, similar to “red phones” during the Cold War.
Proposal #115
Preparedness & Response
Autoverification (Lean): Develop systems that automate formal verification through Lean theorem proving, addressing the critical shortage of Lean programmers worldwide.
Proposal #173
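For a sense of the target artifact: an autoverification pipeline would take an informal mathematical claim and emit a machine-checked Lean proof. A toy illustration of the kind of output meant (Lean 4 syntax, using a lemma from the core library):

```lean
-- Toy example: the kind of statement-plus-proof an
-- autoverification system would generate and machine-check.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The scarce skill the proposal targets is producing such statements and proofs at scale; automating that step would substitute for the missing pool of human Lean programmers.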
Public Communication & Awareness
Consensus-Building Evidence for AI Risk: Create compelling empirical evidence of AI risks through large-scale experiments, concrete demonstrations, and visualizations, rather than through abstract theory or thought experiments.
Proposal #185
Miscellaneous
Whistleblower Protection Fund: Establish a large, long-horizon fund on the order of several hundred million dollars, enough to secure the livelihoods of a substantial cohort of potential whistleblowers for a decade and to cover major legal exposure, ensuring both financial safety and sustained legal protection.
Proposal #189
Training Data Attribution
Understand which data used during training is responsible for what behaviors of AI models.
Unlearning Capabilities
Figure out a way to cut out specific abilities or behaviors from a model.
Limits of Model Distillation
Investigate the limits of training a smaller model on particular outputs of a larger model.
End-to-End Harm Assessment
Assess how capable an AI is at executing harmful ideas, not just talking about them.
Agent IDs and Reputation Systems
Track the behavior of individual AI agents to disincentivize exploitation and build trust.
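A minimal sketch of the core data structure such a reputation system might use, assuming stable agent IDs and behavior reports from observers (all names and the smoothing scheme here are hypothetical illustrations, not from the source):

```python
from dataclasses import dataclass

@dataclass
class AgentRecord:
    """Reputation record for one AI agent, keyed by a stable agent ID."""
    agent_id: str
    interactions: int = 0
    violations: int = 0

    @property
    def trust_score(self) -> float:
        # Laplace-smoothed fraction of clean interactions.
        return (self.interactions - self.violations + 1) / (self.interactions + 2)

class ReputationLedger:
    """Maps agent IDs to behavior records built from incoming reports."""
    def __init__(self) -> None:
        self._records: dict[str, AgentRecord] = {}

    def report(self, agent_id: str, violation: bool = False) -> None:
        rec = self._records.setdefault(agent_id, AgentRecord(agent_id))
        rec.interactions += 1
        rec.violations += int(violation)

    def trust(self, agent_id: str) -> float:
        rec = self._records.get(agent_id)
        # Unknown agents start from a neutral prior.
        return rec.trust_score if rec else 0.5
```

The design choice that matters is the neutral prior for unseen IDs: an agent cannot escape a bad record simply by being new, and trust must be earned through clean interactions.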
Public Demos of Current Capabilities
Layman-accessible demonstrations of what AIs can already do and why this is concerning.
Evaluation Companies
Fund companies capable of auditing the offensive capabilities of new frontier-lab models.
Expert Collaboration
Get top economics and geopolitics experts into the loop and then act on their suggestions.
Direct Interpretability and Model-Level Interventions
Understand which part of an AI does what, and which parts can be changed without collapse.
Technical Safeguards
High-but-below-100% security standards, forgoing theoretical proofs in favor of practicality.
Meta-Research on Safety Techniques
Evaluate progress across subsets of the AI safety diaspora to move the field toward empiricism.
Open Source AI Drift Monitoring
Tools to detect when and how models change their behavior, enabling intervention before it is too late.
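One simple monitoring primitive: compare the empirical distribution of a model's behaviors (e.g., response labels on a fixed probe set) before and after a change, and alert when the shift exceeds a threshold. A sketch using total variation distance; the metric and threshold are illustrative choices, not from the source:

```python
from collections import Counter

def total_variation(p: Counter, q: Counter) -> float:
    """Total variation distance between two empirical label distributions."""
    keys = set(p) | set(q)
    n_p, n_q = sum(p.values()), sum(q.values())
    return 0.5 * sum(abs(p[k] / n_p - q[k] / n_q) for k in keys)

def drift_alert(baseline: list[str], current: list[str], threshold: float = 0.1) -> bool:
    """Flag drift when the behavior distribution moves past the threshold."""
    return total_variation(Counter(baseline), Counter(current)) > threshold
```

For example, a model that moves from refusing half of a probe set to refusing only a fifth of it produces a total variation distance of 0.3 and would trip the default threshold.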
Support Academic Project Scaling
Counteract academia’s push for novelty by funding further research on promising results.
Biosecurity Controls
Identify dual-use data. Restrict access to highly accountable actors.
Monitoring Complex Agent Interactions
Put AIs in observable environments. Report on their behavior, identifying bad feedback loops.
Policy Studio
Hub which would draft specific AI legislation, regulations, and governance frameworks.
Defense-in-Depth for Closed Door Models
Implement “Swiss cheese layered” defense approaches to make closed source models harder to misuse.
Regulatory Talent
Get the best regulators working on AI security, primarily but not exclusively by recruiting existing talent.
Attack Scenarios Analysis
Serious play-by-play scenarios of geopolitical turmoil driven by AI progress, to ground discourse and support preparation.
