```
.
├── README.md
├── config
├── document
├── figure
├── process
├── requirements.txt
├── run_script.sh
└── tools
```

- config: the prompts to use, parameters to set, etc.
- document: the model's final performance, examiner priority, and position bias.
- figure: figures used in the paper.
- process: the code of AutoBench-V.
- tools: common utilities, such as image-to-base64 conversion and data visualization (see the sketch after this list).
- run_script.sh: specifies the API to use.
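For example, the image-to-base64 conversion mentioned above typically looks like the minimal sketch below. This is a generic illustration, not the repository's actual implementation, and the function name `encode_image` is hypothetical:

```python
import base64

def encode_image(image_path: str) -> str:
    # Hypothetical helper: read an image file and return its contents as a
    # base64 string, e.g. for embedding in a request to a vision-language model.
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")
```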
```bash
pip install -r requirements.txt
./run_script.sh
python pipeline.py
```

Remember to set the parameters user_input and generate_type when running pipeline.py.
There are five options for user_input:

- basic_understanding
- spatial_understanding
- semantic_understanding
- reasoning_capacity
- atmospheric_understanding
For a complete pipeline, run the seven values of generate_type in the following order (see the sketch after this list):

- aspect: generate aspects
- prompts: generate image descriptions
- images: generate images based on the descriptions
- alignment: test the alignment of images and descriptions via VQA
- questions: generate questions to test LVLMs
- adjust: adjust the option distribution of the questions
- answers: answer the questions and score
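The sketch below runs all seven stages in order for one evaluation aspect. It assumes pipeline.py accepts user_input and generate_type as command-line flags; that is an assumption made for illustration, and in the actual repository these parameters may instead be set inside pipeline.py itself:

```python
# Minimal sketch, assuming pipeline.py takes --user_input and
# --generate_type as command-line flags (hypothetical; the real script
# may expect these parameters to be edited in the file instead).
import subprocess

USER_INPUT = "basic_understanding"  # one of the five user_input options above
STAGES = ["aspect", "prompts", "images", "alignment",
          "questions", "adjust", "answers"]  # the seven generate_type values

for stage in STAGES:
    # Each stage consumes the previous stage's output, so run them in order.
    subprocess.run(
        ["python", "pipeline.py",
         "--user_input", USER_INPUT,
         "--generate_type", stage],
        check=True,  # abort the pipeline if any stage fails
    )
```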
```bibtex
@misc{bao2025autobenchvlargevisionlanguagemodels,
      title={AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?},
      author={Han Bao and Yue Huang and Yanbo Wang and Jiayi Ye and Xiangqi Wang and Xiuying Chen and Yue Zhao and Tianyi Zhou and Mohamed Elhoseiny and Xiangliang Zhang},
      year={2025},
      eprint={2410.21259},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.21259},
}
```
If you have any questions or suggestions, or would like to collaborate, please feel free to reach out to us via email at hbao@nd.edu.

