The four-axis ranking
We rank humanity’s most important problems on four quantifiable dimensions — quantity of humans affected, severity per capita, current solution quality, and addressable market size — and package each as a proposal in the spirit of Musk’s Hyperloop Alpha. This document is the proposal for ai safety & alignment. Every number below is sourced and tagged with confidence. Every ranking is a conjecture, open to refutation.
Quantity · humans affected
8.1B
highSeverity · WTP / wealth
100%
lowCurrent solutions
1.5 / 10
lowMarket size · TAM
$20.0B
lowWhat we are trying to solve
As AI systems approach and exceed human-level capability across domains, the open problem is whether their goals and behaviors remain under human correction. Unaligned AI is the one x-risk that is accelerating rather than slowing. A Deutschian framing: safety is not a brake on progress, it is an engineering achievement of progress. The work is technical (interpretability, evals, corrigibility) and institutional (governance, deployment protocols).
The gap between the world and the world that is physically possible
Today: Frontier AI training proceeds with limited interpretability of model internals, no proven scalable alignment method, and minimal regulatory verification capacity.
Current solution quality is rated 1.5 / 10 (low confidence) — meaning there is substantial unclaimed ground between what exists and what is possible. estimated — early field; no proven scalable alignment method yet.
Who is already working on this
6 entities are currently working on this problem across public markets, private companies, and research orgs. Each is evidence the market is real; none has obviously solved it.
OpenAI
private · USAOriginally nonprofit research lab, now capped-profit. Safety and superalignment teams alongside capabilities work.
$157.0B
Anthropic
private · USAAI safety company building Claude. Constitutional AI, mechanistic interpretability, and frontier-scale alignment research.
$61.5B
Conjecture
private · UKAlignment-focused AI lab. Runs Conjecture Institute for critical rationalist research on AI safety.
undisclosed
Goodfire
private · USAMechanistic interpretability as a product, tools for editing model internals rather than just observing.
undisclosed
If we solve this, here is the world we get
After · 15 years
Aligned, corrigible frontier AI is the default deployment pattern. Interpretability tools verify models share human-relevant values before deployment. Capability gains do not increase x-risk.
Requests for startups · 3 concrete companies to build
The MRI machine for neural networks
Every frontier lab ships models it cannot read. Deception and goal-misgeneralization are invisible until they are catastrophic. Build the hosted interpretability layer that flags dangerous circuits before deployment.
- why now
- Mechanistic interpretability went from toy circuits to production-scale feature extraction in three years. The labs now want this and cannot all build it in-house.
- shape
- An API + dashboard that ingests model weights or activations and returns a risk report: deceptive features, situational awareness, sandbagging, capability spikes. Sells to labs, evaluators, and eventually regulators.
- success
- No frontier model is deployed without an interpretability sign-off, the way no bridge opens without an inspection.
The adversarial eval grid for agents
Autonomous agents are shipping with benchmark suites built for chatbots. We are grading self-driving cars with a written test. Build the continuously-updated red-team grid that stress-tests agents at the capability frontier.
- why now
- Agentic deployment went mainstream in 2025–26; incident rate is climbing and no standard adversarial harness exists.
- shape
- A hosted eval platform that runs agents through escalating adversarial scenarios — tool misuse, prompt injection, multi-step deception — and issues a capability + safety profile that updates as new attacks are discovered.
- success
- Every deployed agent carries a current, adversarial safety rating, and the rating actually predicts field failures.
Hardware-enforced AI containment
Alignment that lives only in software can be jailbroken or fine-tuned away. Build the trusted-execution + tamper-evident compute layer that makes a model’s deployment envelope physically enforceable.
- why now
- Confidential-computing silicon (TEEs, secure enclaves at GPU scale) finally exists at the performance tier frontier models need.
- shape
- A compute substrate + attestation protocol where a model can only run inside a verified policy envelope; weight exfiltration and unsanctioned fine-tuning are cryptographically detectable.
- success
- Frontier weights cannot be silently stolen or repurposed, and deployment limits are enforced by physics, not promises.
full rubric + framing on the Requests for Startups page.
What the market can pay
The world is already paying $20.0B per year against this problem (projected annual market for alignment R&D, interpretability tooling, evals, and AI governance services by 2030; low confidence).
A successful solution does not need to capture more — it needs to redirect a meaningful slice of existing spend, plus the latent willingness-to-pay implied by the severity score above. The cost ceiling for a real solution is bounded by this number; everything cheaper is dominated, everything more expensive is a non-starter.
What could go wrong, and how we know we are not wrong
Section in progress
Failure modes, ethical considerations, and the conditions under which this whitepaper would be falsified are being authored as the weekly cadence ships. The Deutschian commitment: every claim above is a conjecture; we publish the conditions under which we would update. New whitepaper sections ship with each Monday newsletter drop. Subscribe to get the upgrade, or contribute on GitHub.
Who would back this
Capital allocators with a stated thesis or deployed portfolio in this domain. This is a starting list — Exa Websets enrichment will expand it to direct check-writers per company.
Emergent Ventures
Fast grants. High-variance, unconventional, talent-first.
Thiel Fellowship
$100k to stop out of school and build something important.
Founders Fund
Contrarian hard tech that rebuilds the industrial base.
Lux Capital
Counter-conventional science at the edges of physics and biology.
What the thinkers say
“AI safety is a knowledge problem, not a limit problem. Aligned AGI is achievable through better explanations, not through halting development.”
“Co-founded OpenAI originally because of concerns about unaligned AI. Has continued to treat AI alignment as an existential priority.”
“Emergent Ventures has funded AI-safety projects and unconventional alignment researchers under the "fast grants" model.”
“AGI is named on the good-quest list. Hard-tech builders, not only researchers, need to be at the center of the safety conversation.”
Where this is wrong, tell us
Every number on this page carries a source and a confidence tag. Every section open to refutation. If a citation is wrong, a number is stale, or a conjecture is unfounded — file a correction.
corrections → use the feedback widget in the nav · open issue at github.com/adamtpang/optimism.fun