KUNSH SINGH
NEURAL-ID: 0xF72A4B | SYS.VER: 9.42.X
Passionate computer scientist with expertise in machine learning, data science, and natural language processing. Interested in neurocomputation and non-invasive brain-computer interfaces.
Reading a Model's Mind
Anthropic trained two LLMs to talk to each other in activation-space and accidentally built a tool that translates a model's internal state into plain English. Here's how Natural Language Autoencoders work, why I can't stop thinking about them, and where I'd take them next.
ABOUT_SYSTEM
PROJECT_ARCHIVE
Reading a Model's Mind
Interactive explainer of Anthropic's Natural Language Autoencoders — translating an LLM's internal activations into plain English. Built three custom interactive visualizations: an animated AV→AR autoencoder pipeline, a scrubbable FVE-over-training chart, and a click-a-token activation explorer mirroring the paper's planning-in-poetry result.