
Can ChatGPT Diagnose Brain Tumors Better than Doctors?

AI large language models versus human radiologists in diagnosing brain tumors.

Posted October 3, 2024 | Reviewed by Lybi Ma


Researchers and neuroscientists are actively evaluating how well artificial intelligence (AI) large language models (LLMs) can find patterns in enormous amounts of complex brain imaging data. A new study recently published in European Radiology compares the AI large language model GPT-4 with human radiologists, with surprising results.

“This study is the first attempt to evaluate GPT-4’s ability to interpret actual clinical radiology reports, rather than from settings like image diagnosis quizzes,” wrote corresponding author Daiju Ueda, an associate professor at Osaka Metropolitan University’s Graduate School of Medicine, along with co-authors Yasuhito Mitsuyama, Hiroyuki Tatekawa, and colleagues.

In the field of artificial intelligence, large language models are machine learning models, specifically deep learning models, that can perform a wide range of natural language processing (NLP) tasks, such as generating, analyzing, classifying, categorizing, and translating text, answering questions, and acting as conversational chatbots.

Examples of large language models include Generative Pretrained Transformer (GPT) by OpenAI, Bidirectional Encoder Representations from Transformers (BERT) by Google, and Robustly Optimized BERT Approach (RoBERTa) by Meta AI, among many others.

“Large language models like GPT-4 have demonstrated potential for diagnosis in radiology,” wrote the researchers.

According to the researchers, what has been missing from evaluations of GPT-4 as a potential tool for radiologists is a study that uses actual radiology reports, whose data tend to be more varied and less structured than the diagnostic quizzes used in earlier studies. The team set out to fill this gap by comparing GPT-4’s diagnostic performance with that of radiologists on real-world clinical radiology reports.

“We zeroed in on MRI reports pertaining to brain tumors, given the pivotal role radiological reports play in determining treatment routes such as surgery, medication, or monitoring; and that pathological outcomes offer a definitive ground truth for brain tumors,” the scientists wrote.

The researchers assembled a team of human radiologists consisting of four general radiologists and three neuroradiologists certified by the Japanese Society of Radiology as specialists in diagnostic imaging of the central nervous system. Preoperative MRI reports of brain tumors from Osaka Metropolitan University Hospital and the National Hospital Organization Osaka Minami Medical Center were translated from Japanese into English by a general radiologist. To ensure translation quality, a board-certified neuroradiologist with eight years of experience who uses English daily verified that the translation was complete and that no information had been lost.

The researchers prompted GPT-4-based ChatGPT to list three possible differential diagnoses and a final diagnosis, ranked in order of likelihood, “from the following head MRI findings,” and then supplied the imaging findings from real-world clinical reports.
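To make the prompt format concrete, here is a minimal Python sketch that joins a fixed instruction to one report’s findings. The instruction text is a paraphrase of the description above, not the study’s verbatim prompt, and `build_prompt` is a hypothetical helper; the sketch only builds the text and does not call ChatGPT or any OpenAI API.

```python
# Minimal sketch of the prompt format described in the article.
# ASSUMPTION: the instruction wording is a paraphrase, NOT the study's
# verbatim prompt; build_prompt is a hypothetical helper name.


def build_prompt(mri_findings: str) -> str:
    """Combine the fixed instruction with one report's imaging findings."""
    instruction = (
        "List three possible differential diagnoses and a final diagnosis, "
        "ranked in order of likelihood, from the following head MRI findings."
    )
    # Instruction first, then the (English-translated) findings,
    # with surrounding whitespace trimmed from the findings text.
    return f"{instruction}\n\n{mri_findings.strip()}"


if __name__ == "__main__":
    # Invented example findings for illustration only (not from the study).
    findings = "  A ring-enhancing mass in the right frontal lobe with surrounding edema.  "
    print(build_prompt(findings))
```

Keeping the instruction fixed and varying only the findings means every report is posed to the model in the same way, which is what makes case-by-case comparison possible.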

In health care, differential diagnosis is the process in which a physician lists the possible conditions or diseases that could account for a patient’s symptoms.

For example, a brain abscess is a swelling in the brain filled with pus, an opaque, thick, typically off-white fluid. According to Medscape, the differential diagnoses for a brain abscess may include bacterial meningitis, brain cancer, fungal infection with Cryptococcus neoformans, cysticercosis (tissue infection by the larvae of the parasitic pork tapeworm), and other causes.

GPT-4’s diagnoses were compared with those of a separate group of five radiologists: three general radiologists and two board-certified neuroradiologists. In all, the team pitted human against machine on 150 radiology reports.

For differential diagnosis, GPT-4’s accuracy was 94%, significantly higher than that of even the best-performing radiologist; the human radiologists’ accuracy ranged from 73% to 89%. For final diagnosis, GPT-4’s accuracy was 73%, compared with 65% to 79% for the radiologists, placing it within the human range.
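One way to see why the differential figures run higher than the final ones is to look at how each might be scored. The article does not spell out the scoring rules, so the following Python sketch assumes a common convention: a differential diagnosis counts as correct if the pathologically confirmed diagnosis appears anywhere among the three ranked candidates, while a final diagnosis counts as correct only if the top-ranked candidate matches. The helper names and toy cases are hypothetical, and exact string matching stands in for the expert judgment a real study would use.

```python
from typing import Iterable, Sequence, Tuple

# ASSUMPTIONS (not from the study): "final" = first-ranked candidate,
# "differential" = any of the top three; matching is case-insensitive
# exact string comparison. All names here are hypothetical.


def _norm(label: str) -> str:
    """Normalize a diagnosis label for comparison."""
    return label.strip().lower()


def differential_correct(ranked: Sequence[str], truth: str) -> bool:
    """Correct if the ground truth appears among the top three candidates."""
    return _norm(truth) in {_norm(c) for c in ranked[:3]}


def final_correct(ranked: Sequence[str], truth: str) -> bool:
    """Correct only if the first-ranked candidate matches the ground truth."""
    return len(ranked) > 0 and _norm(ranked[0]) == _norm(truth)


def accuracy(cases: Iterable[Tuple[Sequence[str], str]], scorer) -> float:
    """Fraction of (ranked candidates, ground truth) cases marked correct."""
    cases = list(cases)
    if not cases:
        raise ValueError("need at least one case")
    return sum(scorer(ranked, truth) for ranked, truth in cases) / len(cases)


if __name__ == "__main__":
    # Toy, invented cases (NOT data from the study).
    toy_cases = [
        (["glioblastoma", "metastasis", "lymphoma"], "glioblastoma"),
        (["meningioma", "schwannoma", "hemangiopericytoma"], "schwannoma"),
        (["metastasis", "abscess", "glioblastoma"], "lymphoma"),
        (["pituitary adenoma", "craniopharyngioma", "meningioma"], "pituitary adenoma"),
    ]
    print(f"differential accuracy: {accuracy(toy_cases, differential_correct):.0%}")
    print(f"final accuracy:        {accuracy(toy_cases, final_correct):.0%}")
```

Under this assumption, any case scored correct for the final diagnosis is also correct for the differential, so differential accuracy can never fall below final accuracy; for the four toy cases the script prints 75% and 50%.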

“This study evaluated GPT-4-based ChatGPT’s diagnostic capabilities using real-world clinical MRI reports from brain tumor cases, revealing that its accuracy in interpreting brain tumors from MRI findings is competitive with radiologists,” the researchers reported.

Based on these findings, the scientists conclude that GPT-4 can serve as an effective second opinion for neuroradiologists making final diagnoses, and as an advisory tool for general radiologists and residents.