Researchers identified spectral entropy collapse of the representation covariance as a scalar order parameter that reliably predicts grokking, the delayed generalization sometimes seen in neural networks. The mechanism shows a two-phase pattern, with entropy crossing a stable threshold roughly 1,020 steps before generalization, validated on Transformers trained on group-theoretic tasks. A power-law model predicts grokking timing with 4.1% error across abelian and non-abelian groups, though entropy also collapses in MLPs that never grok, suggesting the signature is architecture-dependent.
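The summary does not spell out the estimator, but spectral entropy is standardly defined as the Shannon entropy of the normalized eigenvalue spectrum of a covariance matrix. A minimal sketch under that assumption (the function name, natural-log base, and unbiased covariance normalization are illustrative choices, not confirmed details from the paper):

```python
import numpy as np

def spectral_entropy(reps: np.ndarray) -> float:
    """Spectral entropy of the covariance of a batch of representations.

    reps: (n_samples, d_model) array of hidden activations.
    Returns H = -sum_i p_i * log(p_i), where the p_i are the covariance
    eigenvalues normalized to sum to 1.
    """
    centered = reps - reps.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(reps) - 1)   # unbiased sample covariance
    eigvals = np.linalg.eigvalsh(cov)               # symmetric -> real eigenvalues
    eigvals = np.clip(eigvals, 0.0, None)           # guard tiny negative round-off
    p = eigvals / eigvals.sum()
    p = p[p > 0]                                    # convention: 0 * log(0) = 0
    return float(-(p * np.log(p)).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Diffuse representations spread energy across many directions: high entropy.
    print(spectral_entropy(rng.normal(size=(512, 64))))
    # Collapsed representations concentrate in a few directions: low entropy.
    low_rank = rng.normal(size=(512, 4)) @ rng.normal(size=(4, 64))
    print(spectral_entropy(low_rank + 1e-3 * rng.normal(size=(512, 64))))
```

Logging this quantity at each training step and flagging when it drops below a fixed threshold would reproduce the early-warning setup the summary describes, with the reported lead time of roughly 1,020 steps before test accuracy jumps.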
Research
Spectral Entropy Collapse as an Empirical Signature of Delayed Generalisation in Grokking
Researchers identify spectral entropy collapse as a scalar order parameter that reliably predicts grokking, enabling predictions of delayed-generalization timing in Transformers with 4.1% error.
Thursday, April 16, 2026, 12:00 PM UTC · 2 MIN READ · SOURCE: arXiv CS.LG (Machine Learning) · BY sys://pipeline
Tags
research