nanocircuits
nanocircuits is a small mechanistic-interpretability lab built around known-answer circuits, where the ground-truth circuit is known in advance. What matters is not a pretty diagram but two questions: whether a circuit finder beats the strongest cheap structural baseline, and whether the oracle itself is faithful.
- Known-answer circuit recovery with AUROC against ground truth
- Strong structural baselines included
- Oracle faithfulness measured rather than assumed
- Adversarial review issues reflected in the public writeup
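
As a rough illustration of the first two points, the sketch below scores a circuit finder and a structural baseline by AUROC against a known ground-truth edge set. It is not the project's code: the function name `auroc`, the toy edge labels, and both score lists are hypothetical and exist only to show the comparison.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a randomly chosen true-circuit edge
    outscores a randomly chosen non-circuit edge (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six candidate edges, the first three are in the
# planted (known-answer) circuit.
ground_truth = [1, 1, 1, 0, 0, 0]
finder_scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]    # assumed circuit-finder output
baseline_scores = [0.7, 0.3, 0.6, 0.5, 0.4, 0.2]  # assumed structural baseline

finder_auroc = auroc(finder_scores, ground_truth)
baseline_auroc = auroc(baseline_scores, ground_truth)

# The quantity of interest is the margin over the baseline, not the raw AUROC.
print(f"finder={finder_auroc:.3f} baseline={baseline_auroc:.3f} "
      f"margin={finder_auroc - baseline_auroc:.3f}")
```

The pairwise loop is quadratic in the number of edges, which is fine for small known-answer circuits; a rank-based implementation would be the usual choice at larger scale.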
Mech interp, Ground truth, Baselines