Researchers identified spectral entropy collapse of the representation covariance as a scalar order parameter that reliably predicts grokking, the delayed generalization sometimes seen in neural networks. The mechanism shows a two-phase pattern, with entropy crossing a stable threshold roughly 1,020 steps before generalization, validated on Transformers trained on group-theoretic tasks. A power-law model predicts grokking timing with 4.1% error across abelian and non-abelian groups, though entropy also collapses in MLPs that never grok, suggesting the signature is architecture-dependent.
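The summary does not spell out the estimator, but spectral entropy is standardly defined as the Shannon entropy of the normalized eigenvalue spectrum of a covariance matrix. A minimal sketch under that assumption (the function name, natural-log base, and unbiased covariance normalization are illustrative choices, not confirmed details from the paper):

```python
import numpy as np

def spectral_entropy(reps: np.ndarray) -> float:
    """Spectral entropy of the covariance of a batch of representations.

    reps: (n_samples, d_model) array of hidden activations.
    Returns H = -sum_i p_i * log(p_i), where the p_i are the covariance
    eigenvalues normalized to sum to 1.
    """
    centered = reps - reps.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(reps) - 1)   # unbiased sample covariance
    eigvals = np.linalg.eigvalsh(cov)               # symmetric -> real eigenvalues
    eigvals = np.clip(eigvals, 0.0, None)           # guard tiny negative round-off
    p = eigvals / eigvals.sum()
    p = p[p > 0]                                    # convention: 0 * log(0) = 0
    return float(-(p * np.log(p)).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Diffuse representations spread energy across many directions: high entropy.
    print(spectral_entropy(rng.normal(size=(512, 64))))
    # Collapsed representations concentrate in a few directions: low entropy.
    low_rank = rng.normal(size=(512, 4)) @ rng.normal(size=(4, 64))
    print(spectral_entropy(low_rank + 1e-3 * rng.normal(size=(512, 64))))
```

Logging this quantity at each training step and flagging when it drops below a fixed threshold would reproduce the early-warning setup the summary describes, with the reported lead time of roughly 1,020 steps before test accuracy jumps.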
Research
Spectral Entropy Collapse as an Empirical Signature of Delayed Generalisation in Grokking
Researchers identify spectral entropy collapse as a scalar order parameter that reliably predicts grokking, enabling predictions of delayed-generalization timing in Transformers with 4.1% error.
Thursday, April 16, 2026, 12:00 PM UTC · 2 MIN READ · SOURCE: arXiv CS.LG (Machine Learning) · BY sys://pipeline
Tags
research