Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, Joshua M. Susskind: Stabilizing Transformer Training by Preventing Attention Entropy Collapse. ICML 2023: 40770-40803