Jiayi Zhou, Jiaming Ji, Josef Dai, Yaodong Yang: Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback. AAAI 2025: 27765-27773