TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic Segmentation

Yin, Zhaoyuan; Wang, Pichao; Wang, Fan; Xu, Xianzhe; Zhang, Hanling; Li, Hao; Jin, Rong

arXiv:2112.01515 (cs)

[Submitted on 2 Dec 2021 (v1), last revised 22 Jul 2022 (this version, v2)]

Abstract: Unsupervised semantic segmentation aims to obtain high-level semantic representations from low-level visual features without manual annotations. Most existing methods are bottom-up approaches that try to group pixels into regions based on their visual cues or certain predefined rules. As a result, it is difficult for these bottom-up approaches to generate fine-grained semantic segmentation in complicated scenes that contain multiple objects, some of which share a similar visual appearance. In contrast, we propose the first top-down unsupervised semantic segmentation framework for fine-grained segmentation in extremely complicated scenarios. Specifically, we first obtain rich high-level structured semantic concept information from large-scale vision data in a self-supervised manner, and use this information as a prior to discover potential semantic categories present in the target datasets. Secondly, the discovered high-level semantic categories are mapped to low-level pixel features by calculating the class activation map (CAM) with respect to a given discovered semantic representation. Lastly, the obtained CAMs serve as pseudo labels to train the segmentation module and produce the final semantic segmentation. Experimental results on multiple semantic segmentation benchmarks show that our top-down unsupervised segmentation is robust on both object-centric and scene-centric datasets under different semantic granularity levels, and outperforms all current state-of-the-art bottom-up methods. Our code is available at this https URL.
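
The first two steps described in the abstract (discovering categories from self-supervised features, then mapping them back to spatial locations to form pseudo labels) can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the random array stands in for patch features from a self-supervised backbone, plain k-means stands in for category discovery, a cosine-similarity map stands in for the class activation map, and the names `kmeans`, `pseudo_labels`, and `grid_hw` are hypothetical. The third step, training a segmentation module on the pseudo labels, is not shown.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; a stand-in for the category discovery step (assumption)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every point to every center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = points[assign == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def pseudo_labels(patch_features, k, grid_hw):
    """Return per-patch activations and hard pseudo labels.

    patch_features: array of shape (num_images, num_patches, dim).
    grid_hw: (height, width) of the patch grid; height * width == num_patches.
    """
    n, p, d = patch_features.shape
    feats = l2_normalize(patch_features)
    # Step 1 (stand-in): discover k category prototypes across the dataset.
    prototypes = l2_normalize(kmeans(feats.reshape(-1, d), k))
    # Step 2 (stand-in for CAM): cosine similarity of each patch to each prototype.
    activation = feats @ prototypes.T  # shape (n, p, k)
    # Hard pseudo label per patch: the most activated category.
    labels = activation.argmax(axis=-1).reshape(n, *grid_hw)
    return activation, labels


# Toy data: 4 "images", each a 4x4 grid of 8-dimensional patch features.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 16, 8))
activation, labels = pseudo_labels(features, k=3, grid_hw=(4, 4))
```

In a real pipeline the label maps would be upsampled to image resolution and used as training targets for the segmentation module; the choice of hard argmax labels here is only one possible design.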

Comments:	Accepted by ECCV 2022, Oral, open-sourced
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2112.01515 [cs.CV]
	(or arXiv:2112.01515v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2112.01515

Submission history

From: Pichao Wang
[v1] Thu, 2 Dec 2021 18:59:03 UTC (10,577 KB)
[v2] Fri, 22 Jul 2022 23:01:32 UTC (21,555 KB)
