Diagnostic Test Accuracy of Artificial Intelligence in Early Diagnosis of Osteoarthritis: A Systematic Review and Meta-Analysis of 45,588 Knees

Mohamed Abdelsalam¹, Hadeer Hafez², Omar Sameh Nabil El Sedafy¹, Nourhan Abouelella³, Ahmed Abdulhafeez Hamza³, Omnia Samy El-Sayed⁴, Mohamed Reda Awad⁵, Gihan Omar³ and Hazem E. Mohammed⁶, ¹Misr University for Science and Technology, 6 October, Al Jizah, Egypt, ²6th October University, 6 October, Al Jizah, Egypt, ³Faculty of Medicine, Misr University for Science and Technology, 6 October, Al Jizah, Egypt, ⁴Misr University for Science and Technology, Nasr City, Al Qahirah, Egypt, ⁵Al Azhar University, Cairo, Egypt, Giza, Al Jizah, Egypt, ⁶Faculty of Medicine, Assiut University, Assiut, Egypt, Assyut, Asyut, Egypt

Meeting: ACR Convergence 2025

Keywords: Imaging, meta-analysis, Osteoarthritis, radiography, X-ray

Session Information

Date: Sunday, October 26, 2025

Title: (0306–0336) Osteoarthritis – Clinical Poster I

Session Type: Poster Session A

Session Time: 10:30AM-12:30PM

Background/Purpose: The rapid advancement of artificial intelligence (AI) opens new opportunities in the field of rheumatology. Combined with better imaging, AI may detect early osteoarthritic changes that would otherwise be missed and help physicians diagnose early-stage knee osteoarthritis (KOA). Early diagnosis and timely intervention ultimately result in more favorable outcomes for patients. This study aims to assess the diagnostic accuracy of AI in detecting and classifying radiographic Kellgren–Lawrence (KL) grades for KOA.

Methods: We conducted a systematic search of the PubMed, Web of Science, and Scopus databases, covering all available literature up to March 1, 2025. Following the PRISMA guidelines (Figure 1), we screened the eligible studies and evaluated their methodological quality, selecting only those deemed to be of high quality and excluding studies without outcome data or of insufficient quality. We performed a meta-analysis to estimate pooled sensitivity, pooled specificity, and diagnostic likelihood ratios (LR+/LR-). All analyses were conducted using R version 4.4.2 (RStudio).

Results: A total of 14 studies were included in the systematic review and meta-analysis, encompassing 45,588 radiographs. AI showed robust diagnostic performance across all KL grades. For KL Grade 0, specificity was 0.954 (95% CI: 0.877–0.984) and sensitivity was 0.829 (95% CI: 0.488–0.961), with a pre-test probability (PP) of 30%, a positive post-test probability (PTP +ve) of 88%, and a negative post-test probability (PTP -ve) of 7%. For KL Grade 1, specificity remained high at 0.956 (95% CI: 0.880–0.984), with a modest sensitivity of 0.680 (95% CI: 0.392–0.875) (PP: 20%, PTP +ve: 79%, PTP -ve: 8%). For KL Grade 2, specificity was 0.937 (95% CI: 0.883–0.967) and sensitivity was 0.850 (95% CI: 0.733–0.921) (PP: 25%, PTP +ve: 82%, PTP -ve: 5%). KL Grade 3 yielded a specificity of 0.977 (95% CI: 0.937–0.992) and a sensitivity of 0.906 (95% CI: 0.793–0.961) (PP: 15%, PTP +ve: 87%, PTP -ve: 2%). For KL Grade 4, specificity reached 0.995 (95% CI: 0.984–0.999) and sensitivity 0.938 (95% CI: 0.800–0.983) (PP: 10%, PTP +ve: 96%, PTP -ve: 1%).
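
The post-test probabilities reported above can be related to the pooled sensitivity and specificity through likelihood ratios and the odds form of Bayes' theorem. The following Python sketch is illustrative only: the abstract does not describe how the authors computed these values, and the function and variable names (`likelihood_ratios`, `post_test_probability`, `REPORTED`, `summarize`) are hypothetical. Using the rounded pooled estimates and pre-test probabilities from the Results, it reproduces the reported post-test probabilities to within about one percentage point; the small differences are presumably due to rounding of the published estimates.

```python
def likelihood_ratios(se, sp):
    """Return (LR+, LR-, DOR) from sensitivity and specificity."""
    lr_pos = se / (1.0 - sp)          # positive likelihood ratio
    lr_neg = (1.0 - se) / sp          # negative likelihood ratio
    return lr_pos, lr_neg, lr_pos / lr_neg  # DOR = LR+ / LR-


def post_test_probability(pre_test, lr):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test / (1.0 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


# (sensitivity, specificity, pre-test probability) as reported in the Results.
REPORTED = {
    "KL0": (0.829, 0.954, 0.30),
    "KL1": (0.680, 0.956, 0.20),
    "KL2": (0.850, 0.937, 0.25),
    "KL3": (0.906, 0.977, 0.15),
    "KL4": (0.938, 0.995, 0.10),
}


def summarize(reported=REPORTED):
    """Derive LR+, LR-, DOR, and post-test probabilities for each KL grade."""
    rows = {}
    for grade, (se, sp, pre) in reported.items():
        lr_pos, lr_neg, dor = likelihood_ratios(se, sp)
        rows[grade] = {
            "lr_pos": lr_pos,
            "lr_neg": lr_neg,
            "dor": dor,
            "ptp_pos": post_test_probability(pre, lr_pos),
            "ptp_neg": post_test_probability(pre, lr_neg),
        }
    return rows


if __name__ == "__main__":
    for grade, r in summarize().items():
        print(f"{grade}: LR+={r['lr_pos']:.2f} LR-={r['lr_neg']:.3f} "
              f"PTP+={r['ptp_pos']:.1%} PTP-={r['ptp_neg']:.1%}")
```

Because these inputs are rounded point estimates, the derived likelihood ratios and diagnostic odds ratios are approximations and carry no confidence intervals; they should not be read as the pooled values in Table 1.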

Conclusion: This is the first systematic review and meta-analysis to evaluate the use of AI in the early diagnosis of KOA. AI demonstrates consistently good sensitivity and specificity across all KL grades. These findings support the use of AI as a valuable assistant for rheumatologists in the early diagnosis of KOA. However, further research is needed to develop, train, and validate a unified model capable of accurate KOA diagnosis, especially in the early stages of the disease.

Figure (1) PRISMA Flow Chart Illustrating Search Strategy and Selection of Included Studies

Figure (2) (A) and (B): Forest plots of sensitivity (Se) and specificity (Sp) for KL0 and KL1. Studies included: Panwar et al. 2024, Thomas et al. 2020, Yayli et al. 2025, and Bany Muhammad et al. 2021. (C), (D), and (E): Forest plots of Se and Sp for KL2-4. Studies included: Panwar et al. 2024, Thomas et al. 2020, Yayli et al. 2025, Bany Muhammad et al. 2021, and Yoon et al. 2023. AI showed robust diagnostic performance across all grades.

CI, confidence interval; KL, Kellgren-Lawrence; TP, true positives; TN, true negatives; FP, false positives; FN, false negatives; Se, sensitivity; Sp, specificity.

Table (1): Pooled Diagnostic Performance Metrics of AI for OA Stratified by KL Grade. The table summarizes sensitivity (Se), specificity (Sp), area under the curve (AUC), likelihood ratios (LR), diagnostic odds ratios (DOR), and post-test probabilities for AI-based classification of OA severity across KL grades (0–4). AI was superior to the index test in most metrics (pooled Se, Sp, AUC, positive likelihood ratio (LR+), and post-test probability) and notably had a significant post-test probability in all KL grades, with DOR increasing progressively with higher KL grades.

CI, confidence interval; KL, Kellgren-Lawrence.

Disclosures: M. Abdelsalam: None; H. Hafez: None; O. El Sedafy: None; N. Abouelella: None; A. Hamza: None; O. El-Sayed: None; M. Awad: None; G. Omar: None; H. Mohammed: None.

To cite this abstract in AMA style:

Abdelsalam M, Hafez H, El Sedafy O, Abouelella N, Hamza A, El-Sayed O, Awad M, Omar G, Mohammed H. Diagnostic Test Accuracy of Artificial Intelligence in Early Diagnosis of Osteoarthritis: A Systematic Review and Meta-Analysis of 45,588 Knees [abstract]. Arthritis Rheumatol. 2025; 77 (suppl 9). https://acrabstracts.org/abstract/diagnostic-test-accuracy-of-artificial-intelligence-in-early-diagnosis-of-osteoarthritis-a-systematic-review-and-meta-analysis-of-45588-knees/. Accessed .
