Your other Left! Vision-Language Models Fail to Identify Relative Positions in Medical Images

Daniel Wolf

Universitätsklinikum Ulm

Heiko Hillenhagen

Ulm University

Billurvan Taskin

Ulm University

Alex Bäuerle

Ulm University

Meinrad Beer

Universitätsklinikum Ulm

Michael Götz

Ulm University

Timo Ropinski

Ulm University

International Conference on Medical Image Computing and Computer-Assisted Intervention 2025

Abstract

Clinical decision-making relies heavily on understanding relative positions of anatomical structures and anomalies. Therefore, for Vision-Language Models (VLMs) to be applicable in clinical practice, the ability to accurately determine relative positions on medical images is a fundamental prerequisite. Despite its importance, this capability remains highly underexplored. To address this gap, we evaluate the ability of state-of-the-art VLMs, GPT-4o, Llama3.2, Pixtral, and JanusPro, and find that all models fail at this fundamental task. Inspired by successful approaches in computer vision, we investigate whether visual prompts, such as alphanumeric or colored markers placed on anatomical structures, can enhance performance. While these markers provide moderate improvements, results remain significantly lower on medical images compared to observations made on natural images. Our evaluations suggest that, in medical imaging, VLMs rely more on prior anatomical knowledge than on actual image content for answering relative position questions, often leading to incorrect conclusions. To facilitate further research in this area, we introduce the MIRP – Medical Imaging Relative Positioning – benchmark dataset, designed to systematically evaluate the capability to identify relative positions in medical images. Dataset and code are available at https://wolfda95.github.io/your_other_left/.
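To make the task described in the abstract concrete, the following is a minimal sketch of how a relative-position question between two marked structures might be given a ground-truth answer and how yes/no model answers might be scored. It is not the benchmark's actual code: the `Marker` class, the function names, and the coordinate convention (origin at the top-left, x increasing to the right, y increasing downward) are assumptions made for illustration only. Relations are computed in image space, not relative to the patient.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    """Hypothetical marker on a structure; (x, y) are pixel coordinates."""
    label: str
    x: float
    y: float


def relative_position(a: Marker, b: Marker) -> dict:
    """Return the image-space relations of marker a with respect to marker b.

    Assumes the origin is at the top-left corner of the image, so a smaller
    x means further left and a smaller y means further up.
    """
    return {
        "left": a.x < b.x,
        "right": a.x > b.x,
        "above": a.y < b.y,
        "below": a.y > b.y,
    }


def accuracy(answers: list, truths: list) -> float:
    """Fraction of yes/no answers (as booleans) that match the ground truth."""
    if len(answers) != len(truths):
        raise ValueError("answers and truths must have the same length")
    if not answers:
        return 0.0
    matches = sum(a == t for a, t in zip(answers, truths))
    return matches / len(answers)


# Usage: marker "1" sits to the left of and below marker "2" in the image.
m1 = Marker("1", x=10, y=50)
m2 = Marker("2", x=80, y=20)
truth = relative_position(m1, m2)
```

A question such as "Is marker 1 to the left of marker 2?" would then be checked against `truth["left"]`. Comparing accuracy on unmodified images with accuracy on images where the structures are swapped or mirrored is one way to probe whether a model reads the image or falls back on anatomical priors.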

Bibtex

@inproceedings{wolf2025medvlms,
	title={Your other Left! Vision-Language Models Fail to Identify Relative Positions in Medical Images},
	author={Wolf, Daniel and Hillenhagen, Heiko and Taskin, Billurvan and B{\"a}uerle, Alex and Beer, Meinrad and G{\"o}tz, Michael and Ropinski, Timo},
	booktitle={Proceedings of International Conference on Medical Image Computing and Computer-Assisted Intervention},
	year={2025}
}