A research paper investigating how MLP neuron weights in language models encode information in vocabulary space. By analyzing the relationship between neuron weights and vocabulary representations, the work advances neural network interpretability and sheds light on the internal mechanisms of transformer models.
Disentangling MLP Neuron Weights in Vocabulary Space
MLP neurons in language models systematically encode vocabulary semantics in their weight space, offering interpretability insights into how transformers represent language.
Wednesday, April 8, 2026, 12:00 PM UTC · 2 min read · Source: arXiv cs.CL (Computation & Language)
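To make the idea concrete: a standard way to inspect what an MLP neuron encodes in vocabulary space is to project the neuron's output weight vector through the model's unembedding matrix and read off the tokens it most strongly promotes. Below is a minimal sketch of that kind of analysis, assuming GPT-2 via PyTorch and Hugging Face transformers; the model, layer, and neuron indices are illustrative choices, and the paper's actual method may differ.

```python
# Minimal sketch: project an MLP neuron's output weights into vocabulary
# space ("logit lens" style). Model, layer, and neuron index are
# illustrative assumptions, not values taken from the paper.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

layer, neuron = 10, 42  # hypothetical choices for illustration

with torch.no_grad():
    # In GPT-2, mlp.c_proj.weight has shape (d_mlp, d_model); row `neuron`
    # is the direction that neuron writes into the residual stream.
    value_vector = model.transformer.h[layer].mlp.c_proj.weight[neuron]

    # Project through the (tied) unembedding matrix: one score per token.
    logits = model.lm_head.weight @ value_vector  # shape: (vocab_size,)

    # The neuron's top-promoted vocabulary tokens.
    top = torch.topk(logits, k=10)
    for score, idx in zip(top.values.tolist(), top.indices.tolist()):
        print(f"{tokenizer.convert_ids_to_tokens(idx)!r}: {score:.3f}")
```

If a neuron is semantically coherent in the sense the summary describes, the top-scoring tokens from such a projection tend to form a recognizable cluster (e.g., related words or morphological variants) rather than an arbitrary list.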
Tags: research