[KUNSH_S1NGH]
cat ~/thoughts/*.md

BL0G_FEED

Notes from the rabbit holes — interpretability, ML, and whatever I'm nerding out on
class.data.load(810)

Long-form, occasionally over-caffeinated write-ups. Click in.

interp.decode(activations)
Jun 7, 2026 · 11 min

Reading a Model's Mind

How Natural Language Autoencoders translate a model's internal state into plain English

Anthropic trained two LLMs to talk to each other in activation-space and accidentally built a tool that translates a model's internal state into plain English. Here's how Natural Language Autoencoders work, why I can't stop thinking about them, and where I'd take them next.

Interpretability · NLA · Autoencoders · RL · Alignment
© 2026 · CYBERNETIC INTERFACE v2.7.3