MedAtlas: A New Benchmark for Artificial Intelligence in Medical Diagnosis

Saturday 13 September 2025

A team of researchers has created a new benchmark for artificial intelligence (AI) in the medical field, designed to evaluate AI’s ability to perform complex diagnostic tasks. The benchmark, called MedAtlas, is a collection of real-world medical cases that require AI models to analyze multiple imaging modalities and clinical texts.

Medical professionals often face challenging diagnostic scenarios in which they must integrate information from different sources, such as X-rays, MRIs, and CT scans, along with patient histories and lab results. Current AI systems struggle to perform these tasks accurately, which could lead to misdiagnosis and inappropriate treatment.

MedAtlas aims to address this issue by providing a comprehensive framework for evaluating AI’s ability to reason across multiple imaging modalities and clinical texts. The benchmark consists of 12 cases, each with its own set of imaging studies, clinical histories, and questions that require the AI model to perform complex diagnostic tasks.

One case involves a patient who presented with chronic shoulder pain. Radiographs taken at the initial evaluation showed no obvious abnormalities. A follow-up MRI revealed an os acromiale, an unfused segment of the acromion that is often asymptomatic. In this case, however, the MRI also showed signs of acute inflammation.

Another case involves a patient who underwent ureteroscopy and urinary calculus removal but later developed flank pain and a retroperitoneal hematoma. Here the AI model must analyze CT scans and radiographs to determine the underlying cause of the patient’s symptoms.
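To make the case structure described above more concrete (imaging studies, a clinical history, and a set of questions), here is a minimal Python sketch. It is an illustration only, not the benchmark's actual data format: the class names, field names, case identifier, and question text are all hypothetical, and only the clinical details are taken from the shoulder case in this article.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ImagingStudy:
    # Hypothetical record for one imaging study in a case.
    modality: str      # e.g. "radiograph", "MRI", "CT"
    description: str   # free-text note on what the study shows


@dataclass
class BenchmarkCase:
    # Hypothetical container; the real benchmark schema may differ.
    case_id: str
    clinical_history: str
    studies: list[ImagingStudy] = field(default_factory=list)
    questions: list[str] = field(default_factory=list)

    def modalities(self) -> set[str]:
        """Return the distinct imaging modalities used in this case."""
        return {study.modality for study in self.studies}

    def is_multimodal(self) -> bool:
        """True when the case draws on more than one imaging modality."""
        return len(self.modalities()) > 1


# The shoulder case from the article, expressed in this assumed structure.
shoulder_case = BenchmarkCase(
    case_id="example-shoulder",  # made-up identifier
    clinical_history="Chronic shoulder pain.",
    studies=[
        ImagingStudy("radiograph", "No obvious abnormalities."),
        ImagingStudy("MRI", "Os acromiale with signs of acute inflammation."),
    ],
    # Illustrative question, not taken from the benchmark.
    questions=["What is the most likely cause of the patient's pain?"],
)
```

A structure like this makes it easy to check which cases genuinely require reasoning across modalities, for example by filtering on `is_multimodal()`.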

MedAtlas is designed to test AI models’ ability to perform multi-modal reasoning, where they need to integrate information from multiple imaging modalities and clinical texts to arrive at an accurate diagnosis. This requires the AI models to have a deep understanding of medical concepts, anatomy, and pathology.

The researchers hope that MedAtlas will help accelerate the development of trustworthy AI systems for medical decision-making. By providing a standardized benchmark, they aim to encourage the creation of more advanced AI models that can accurately diagnose complex medical conditions.

MedAtlas is an important step towards developing AI systems that can assist medical professionals in making accurate diagnoses and improving patient outcomes. As researchers continue to refine this benchmark, it has the potential to become a valuable tool for evaluating AI’s performance in medical imaging analysis and decision-making.

Cite this article: “MedAtlas: A New Benchmark for Artificial Intelligence in Medical Diagnosis”, The Science Archive, 2025.

Artificial Intelligence, Medical Imaging, Diagnostic Tasks, Benchmark, MedAtlas, Multi-Modal Reasoning, Clinical Texts, Radiographs, MRI, CT Scans

Reference: Ronghao Xu, Zhen Huang, Yangbo Wei, Xiaoqian Zhou, Zikang Xu, Ting Liu, Zihang Jiang, S. Kevin Zhou, “MedAtlas: Evaluating LLMs for Multi-Round, Multi-Task Medical Reasoning Across Diverse Imaging Modalities and Clinical Text” (2025).
