Abstract
Background: Accurate radiographic diagnosis requires images obtained with proper technique. Artifacts are unwanted irregularities or densities not produced by the primary X-ray beam and may obscure anatomical details in radiographic images. This retrospective study aimed to evaluate the performance of ChatGPT, Claude AI, and intern dental students in detecting artifacts in panoramic radiographs (PRs).
Methods: Panoramic radiographs of 40 patients obtained between January and December 2024, containing 74 artifacts (motion, mispositioning, airway/soft tissue, and foreign body/metal artifacts), were retrospectively evaluated. The artifact detection performance of ChatGPT-4.0, Claude AI 3.5 Sonnet, and intern dental students was then assessed against the radiologist-defined gold standard.
Results: Dental students demonstrated higher overall accuracy than both AI models in detecting artifacts on PRs. Among the LLMs, Claude AI showed higher accuracy than ChatGPT in detecting motion artifacts (65.0% vs 42.5%), foreign body/metal artifacts (90.0% vs 62.5%), and patient mispositioning (85.0% vs 67.5%), whereas ChatGPT performed better in identifying airway/soft tissue artifacts (87.5% vs 65.0%).
Conclusions: ChatGPT and Claude AI demonstrated lower performance than dental students in detecting artifacts in panoramic radiographs. These findings suggest that human evaluation remains essential in radiographic interpretation, and further development of LLMs is needed for reliable clinical application.
Recommended Citation
Erol, E. Ç., Aktuna Belgin, C., Serindere, G., & Gunduz, K. Comparison of ChatGPT, Claude AI, and Dental Students in the Detection of Artifacts on Panoramic Radiography. J Dent Indones. 2026;33(1):46–51.