The MRI machine for neural networks
Every frontier lab ships models it cannot read. Deception and goal-misgeneralization are invisible until they are catastrophic. Build the hosted interpretability layer that flags dangerous circuits before deployment.
Why now
Mechanistic interpretability went from toy circuits to production-scale feature extraction in three years. The labs now want this and cannot all build it in-house.
The shape of it
An API + dashboard that ingests model weights or activations and returns a risk report: deceptive features, situational awareness, sandbagging, capability spikes. Sells to labs, evaluators, and eventually regulators.
Success looks like
No frontier model is deployed without an interpretability sign-off, the way no bridge opens without an inspection.
If alignment is unsolved, nothing else on this list matters; reading model internals is the genuine scientific frontier and there is no downside to seeing clearly.