Grokking ExPLAIND: Unifying Model, Data, and Training Attribution to Study Model Behavior
Florian Eichin, Yupei Du, Philipp Mondorf, Barbara Plank, and Michael A. Hedderich. 2025. Grokking ExPLAIND: Unifying Model, Data, and Training Attribution to Study Model Behavior. arXiv:2505.20076.
We introduce ExPLAIND, an interpretability framework for jointly attributing model behavior to model components, data, and training dynamics, and apply it to investigate grokking.