Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals Paper • 2605.26045 • Published 9 days ago • 12