SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs

Zhang, Yuyou; Corcodel, Radu; Hori, Chiori; Cherian, Anoop; Zhao, Ding

arXiv:2509.25390 (cs)

[Submitted on 29 Sep 2025 (v1), last revised 28 Feb 2026 (this version, v2)]

Abstract: We present SpinBench, a cognitively grounded diagnostic benchmark for evaluating spatial reasoning in vision language models (VLMs). SpinBench is designed around the core challenge of spatial reasoning: perspective taking, the ability to reason about how scenes and object relations change under viewpoint transformation. Since perspective taking requires multiple cognitive capabilities, such as recognizing objects across views, grounding relative positions, and mentally simulating transformations, SpinBench introduces a set of fine-grained diagnostic categories. Our categories target translation, rotation, object relative pose, and viewpoint change, and are progressively structured so that simpler single-object tasks scaffold toward the most demanding multi-object perspective-taking setting. We evaluate 43 state-of-the-art VLMs, both proprietary and open source. Results reveal systematic weaknesses: strong egocentric bias, poor rotational understanding, and inconsistencies under symmetrical and syntactic reformulations. Scaling analysis shows both smooth improvements and emergent capabilities. While human subjects achieve high accuracy (91.2%), task difficulty as measured by human response time shows strong correlation with VLM accuracy, indicating that SpinBench captures spatial reasoning challenges shared across humans and VLMs. We believe SpinBench provides critical insights into spatial reasoning in VLMs and highlights key gaps in their ability to reason about physical space. Our website can be found at this https URL.

Comments:	ICLR 2026
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2509.25390 [cs.CV]
	(or arXiv:2509.25390v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2509.25390

Submission history

From: Yuyou Zhang
[v1] Mon, 29 Sep 2025 18:48:16 UTC (8,874 KB)
[v2] Sat, 28 Feb 2026 19:33:21 UTC (10,475 KB)
