x-risk frontier · AI & x-risk · humans affected: high · updated 2026-06-09

AI safety & alignment

Ensure increasingly capable AI systems remain corrigible, honest, and aligned with human values.

The scale of it

8.2B · worsening

humans exposed to frontier AI systems

source: Capability is scaling faster than alignment: frontier training compute has grown ~4-5x per year (Epoch AI) while no scalable alignment method is proven.

The capital on it

$600M/yr · ↗ rising · underallocated · 0.0007× fair share

Dedicated alignment/safety research: philanthropy, safety institutes, and disclosed lab safety teams. Excludes undisclosed internal lab spend. For contrast, AI capability investment exceeds $200B/yr, an asymmetry of more than 300:1.
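The asymmetry can be checked directly from the two figures quoted above. This is a minimal sketch: the constant and function names are illustrative, and both dollar amounts are the page's own low-confidence estimates.

```python
# Figures quoted in the text above; both are rough, low-confidence estimates.
SAFETY_SPEND_USD_PER_YEAR = 600e6       # dedicated alignment/safety research
CAPABILITY_SPEND_USD_PER_YEAR = 200e9   # stated lower bound on capability investment


def spend_ratio(capability: float, safety: float) -> float:
    """Return dollars of capability investment per dollar of safety spend."""
    if safety <= 0:
        raise ValueError("safety spend must be positive")
    return capability / safety


ratio = spend_ratio(CAPABILITY_SPEND_USD_PER_YEAR, SAFETY_SPEND_USD_PER_YEAR)
print(f"about {ratio:.0f}:1")  # about 333:1
```

Because $200B/yr is given as a floor and undisclosed lab safety spend is excluded, the true ratio could sit on either side of this figure.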

2020 – 2024

source: Open Philanthropy grants database + public AI-safety-institute budgets (estimate) · confidence low · estimate, improvable by PR

The prize at the limit

$3T · in-the-limit market cap, if the team executes perfectly

The lab that makes superhuman AI reliably safe is also the lab the world trusts to deploy it. Safety is not a cost center here; it is the moat that lets the dominant AI platform exist. The ceiling is a meaningful share of the entire AI market.

comparable: a leading frontier lab (Anthropic / OpenAI trajectory) · confidence low · a ceiling, not a forecast

The trade: demand is high, only $600M/yr of capital is flowing (0.0007× its fair share), and the prize at the limit is $3T. This is a Request for Startups and a Request for Investors at once.

Whitepaper · v0.1 · open to refutation

The summary lives here. The full whitepaper walks through the four-axis ranking, existing alternatives, proposed direction, cost & scale, and suggested investors — in the spirit of Hyperloop Alpha.

Quantity · humans affected

8.1B humans

source: 80,000 Hours AI problem profile

Severity · WTP / wealth

100% · low

share of affected person’s wealth they would pay for a solution

Current solutions

1.5 / 10 · low

quality of existing solutions — low score = high opportunity

Market size · TAM

$20.0B · low

USD / year the world is already paying

Time · OOM to impact

15y · low

order-of-magnitude horizon to civilizational-scale impact

Capital · OOM to solve

$200.0B · low

cumulative R&D + deployment + supply chain across the arc

Priority score

importance × urgency, 0–100

Importance

humans affected × severity, gated by market

Urgency

direction of travel + solution gap
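The page describes how these three scores combine but does not give the exact formula, scales, or weights. The sketch below is one possible reading, not the site's method. Every function name, the 0-to-1 input scaling, the averaging inside `urgency`, and the multiplicative market gate are assumptions made for illustration.

```python
def clamp01(x: float) -> float:
    """Clamp a value into the [0, 1] range."""
    return max(0.0, min(1.0, x))


def importance(humans_share: float, severity: float, market_gate: float) -> float:
    """Humans affected x severity, gated by market (all inputs assumed in [0, 1])."""
    return clamp01(humans_share) * clamp01(severity) * clamp01(market_gate)


def urgency(direction_of_travel: float, solution_gap: float) -> float:
    """Direction of travel + solution gap, averaged so the result stays in [0, 1].

    The averaging is an assumption; the page only says the two are added.
    """
    return (clamp01(direction_of_travel) + clamp01(solution_gap)) / 2


def priority_score(importance_value: float, urgency_value: float) -> float:
    """Importance x urgency, rescaled to the 0-100 range the page quotes."""
    return 100 * clamp01(importance_value) * clamp01(urgency_value)


# Illustrative inputs only. The solution gap is derived from the page's
# "1.5 / 10" current-solutions score; the other values are hypothetical.
solution_gap = 1 - 1.5 / 10
imp = importance(humans_share=1.0, severity=1.0, market_gate=1.0)
urg = urgency(direction_of_travel=1.0, solution_gap=solution_gap)
score = priority_score(imp, urg)
print(f"illustrative priority score: {score:.1f}")
```

The number printed is a property of these assumed inputs and weights, and should not be read as the page's own priority score.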

Neglectedness

3/10

The alignment field is growing fast (hundreds to low thousands of researchers) but is still tiny next to annual capabilities spend, which the estimate above puts at more than $200B.

med

Tractability

4/10

Interpretability and evals went from toy to production scale in three years; real fundable technical work now exists, though no scalable alignment method is proven.

low

Ways to help

Career

Do technical alignment or evals research; 80,000 Hours ranks it a top career path.

Build

Build interpretability, evaluation, and red-teaming tooling that labs and regulators can buy.

Fund

Fund independent alignment work via the Long-Term Future Fund.

Policy

Work on compute governance and deployment-safety standards.

Organizations

Anthropic · frontier lab
Redwood Research · nonprofit research
UK AI Safety Institute · government
METR · evaluations

People to follow

Paul Christiano · alignment researcher
Chris Olah · interpretability, Anthropic
Beth Barnes · evaluations, METR

Three-lens scoring

welfare · Copenhagen BCR · n/a

x-risk · 80k hours ITN

9.5 / 10 · high

utility delta · state-of-art vs physics

90% · low

As AI systems approach and exceed human-level capability across domains, the open problem is whether their goals and behaviors remain under human correction. Unaligned AI is the one x-risk that is accelerating rather than slowing. A Deutschian framing: safety is not a brake on progress, it is an engineering achievement of progress. The work is technical (interpretability, evals, corrigibility) and institutional (governance, deployment protocols).

The success vision · 15-year horizon

If we solve this, here is the world we get.

low

Before · today

Frontier AI training proceeds with limited interpretability of model internals, no proven scalable alignment method, and minimal regulatory verification capacity.

After · 15 years

Aligned, corrigible frontier AI is the default deployment pattern. Interpretability tools verify that models share human-relevant values before deployment. Capability gains do not increase x-risk.

Voices on this quest

4 thinkers

David Deutsch · Physicist & Philosopher · Oxford

“All evils are caused by insufficient knowledge. Problems are inevitable. Problems are soluble.”

AI safety is a knowledge problem, not a limit problem. Aligned AGI is achievable through better explanations, not through halting development.

The Beginning of Infinity, chapter 1

Elon Musk · Engineer & Founder · SpaceX, Tesla, Neuralink, xAI

Co-founded OpenAI originally because of concerns about unaligned AI. Has continued to treat AI alignment as an existential priority.

OpenAI founding announcement (2015); subsequent public statements

Tyler Cowen · Economist & Writer · George Mason, Emergent Ventures

Emergent Ventures has funded AI-safety projects and unconventional alignment researchers under the "fast grants" model.

Emergent Ventures grant cohorts

Trae Stephens · Partner & Co-founder · Founders Fund, Anduril

AGI is named on the good-quest list. Hard-tech builders, not only researchers, need to be at the center of the safety conversation.

Choose Good Quests

Companies on this quest

6 mapped

OpenAI · private

Originally a nonprofit research lab, now capped-profit. Runs safety and superalignment teams alongside capabilities work.

$157.0B · med

Anthropic · private

AI safety company building Claude. Constitutional AI, mechanistic interpretability, and frontier-scale alignment research.

$61.5B · high

Conjecture · private

Alignment-focused AI lab. Runs Conjecture Institute for critical rationalist research on AI safety.

private · no disclosed cap

Redwood Research · nonprofit

Applied alignment research, adversarial evaluation, AI control, and mechanistic interpretability.

private · no disclosed cap

METR (formerly ARC Evals) · nonprofit

Third-party evaluation of frontier AI models for dangerous capabilities. Pre-deployment testing protocols.

private · no disclosed cap

Goodfire · private

Mechanistic interpretability as a product: tools for editing model internals rather than just observing them.

private · no disclosed cap

Capital funding this quest

4 allocators

Emergent Ventures

Grant

Fast grants. High-variance, unconventional, talent-first.

best for: Unproven people with a weird, specific idea and no credential path.

Thiel Fellowship

Fellowship

$100k to stop out of school and build something important.

best for: Under-22 builders with spiky talent and a real project.

Founders Fund

Venture Capital

Contrarian hard tech that rebuilds the industrial base.

best for: Scaling massive hard-tech quests: space, defense, biotech, fusion.

Lux Capital

Venture Capital

Counter-conventional science at the edges of physics and biology.

best for: Technical founders at the frontier who need contrarian conviction capital.

Writing about this right now


essay · Not Boring · Packy McCormick · 7mo ago
What 'aligned by default' would actually mean
A non-technical walk through the alignment debate, the deployment gates, and what would have to be true for civilization to keep agency.
