Sicheng Yang, Rongwei Yu: ALVG: Training High-Quality Multi-modal Fusion Modules for Visual Grounding with Attention Loss. ICMR 2025: 1700-1709