nanofeatures
nanofeatures applies the nanocircuits discipline to sparse autoencoder (SAE) features on real models. The result is a boundary: cheap scores can match attribution on single-position tasks, while distributed circuits need a causal, position-resolved readout.
- Gemma-2-2B and GPT-2 SAE-feature studies
- Cheap baseline ladder reported directly
- Distributed-circuit boundary with bootstrap intervals
- Gradient-free, causal, position-resolved control included
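
For readers unfamiliar with bootstrap intervals, here is a minimal sketch of a percentile bootstrap, assuming per-prompt scores are available as a plain list. The function name `bootstrap_ci`, its parameters, and the example numbers are hypothetical and chosen for illustration; this is not the project's own code or data, and the project may use a different bootstrap variant.

```python
import random
from statistics import mean


def bootstrap_ci(values, stat=mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(values).

    Hypothetical helper for illustration; not taken from nanofeatures.
    Returns (point_estimate, lower_bound, upper_bound).
    """
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    # Resample with replacement, recompute the statistic, then sort the results.
    resampled = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return stat(values), resampled[lo_idx], resampled[hi_idx]


# Made-up per-prompt agreement scores, for illustration only.
scores = [0.62, 0.71, 0.55, 0.80, 0.67, 0.74, 0.59, 0.69]
point, lo, hi = bootstrap_ci(scores)
print(f"mean={point:.3f}, 95% CI=[{lo:.3f}, {hi:.3f}]")
```

A wide interval on a distributed-circuit task would signal that a single point estimate should not be read as evidence for or against a cheap score.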
SAE features, Gemma, Attribution