Session Information
Session Type: Poster Session C
Session Time: 10:30AM-12:30PM
Background/Purpose: To evaluate how closely an artificial intelligence (AI) model designed to grade greyscale and Doppler synovitis severity and osteophyte severity in hand joints agrees with human expert raters, using a consensus score as the gold standard.
Methods: Ultrasound images of metacarpophalangeal (MCP), proximal interphalangeal (PIP), distal interphalangeal (DIP), and interphalangeal (IP) joints were collected from patients with hand pain. Rheumatologists, all EULAR-certified ultrasound instructors, scored the images for synovial hypertrophy (SH) (5 raters), Doppler activity (3 raters), and osteophyte severity (4 raters) on a scale from 0 to 3 using the Global OMERACT-EULAR Synovitis Score (GLOESS) and the corresponding osteophyte scoring system. The AI model was trained, validated, and tested on 7314 images. The disease classifications of the AI model were tested against the raters on 1280 ultrasound images to assess SH, 840 ultrasound videos to assess Doppler activity and 351 ultrasound images to assess osteophytes. The agreement with the consensus was calculated as the AI’s average agreement with all raters. Performance metrics, including Cohen’s Kappa, Percent Exact Agreement (PEA), Percent Close Agreement (PCA), sensitivity, specificity, Positive Predictive Value (PPV), and Negative Predictive Value (NPV), were calculated with 95% confidence intervals (CI).
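As an illustration of the agreement metrics named in the Methods, the following Python sketch computes Cohen's Kappa, PEA, PCA, sensitivity, and specificity for one pair of score lists, then averages a metric over several raters. It is not the authors' code and rests on assumptions the abstract does not confirm: Kappa is taken as unweighted, "close agreement" is taken to mean scores within one grade of each other, and sensitivity and specificity are computed after binarizing grades at a hypothetical threshold (grade 1 or higher counts as positive). All function names and the example scores are invented for illustration.

```python
from collections import Counter
from statistics import mean


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two equal-length lists of grades (assumed variant)."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of the two raters' marginal proportions per grade.
    expected = sum(count_a[g] * count_b[g] for g in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both lists contain one identical grade only
    return (observed - expected) / (1 - expected)


def percent_exact_agreement(a, b):
    """Share of items (in %) given the identical grade."""
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)


def percent_close_agreement(a, b, tolerance=1):
    """Share of items (in %) whose grades differ by at most `tolerance` (assumed: 1)."""
    return 100 * sum(abs(x - y) <= tolerance for x, y in zip(a, b)) / len(a)


def sensitivity_specificity(pred, ref, threshold=1):
    """Binarize grades at a hypothetical threshold, then return (sensitivity, specificity) in %."""
    tp = sum(p >= threshold and r >= threshold for p, r in zip(pred, ref))
    fn = sum(p < threshold and r >= threshold for p, r in zip(pred, ref))
    tn = sum(p < threshold and r < threshold for p, r in zip(pred, ref))
    fp = sum(p >= threshold and r < threshold for p, r in zip(pred, ref))
    return 100 * tp / (tp + fn), 100 * tn / (tn + fp)


def average_agreement(ai, raters, metric):
    """Average of `metric(ai, rater)` over all raters, mirroring the averaging described in the Methods."""
    return mean(metric(ai, r) for r in raters)


# Invented example grades (0 to 3) for eight images; not data from the study.
ai_scores = [0, 1, 2, 3, 1, 0, 2, 1]
rater_1 = [0, 1, 2, 2, 1, 0, 3, 0]
rater_2 = [0, 2, 2, 3, 0, 0, 2, 1]

print("Mean PEA:", average_agreement(ai_scores, [rater_1, rater_2], percent_exact_agreement))
print("Mean PCA:", average_agreement(ai_scores, [rater_1, rater_2], percent_close_agreement))
print("Mean kappa:", round(average_agreement(ai_scores, [rater_1, rater_2], cohen_kappa), 3))
```

Because every metric takes the same pair of score lists, `average_agreement` can be reused for each of them; `sensitivity_specificity` returns a pair and would need to be averaged component by component.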
Results: As illustrated in Figure 1, the AI and human raters achieved comparable results across all metrics.
SH: The AI vs. consensus showed a Kappa of 0.39 (95% CI: 0.35–0.44), PEA of 51.77% (95% CI: 48.83–54.70%), PCA of 91.03% (95% CI: 89.21–92.63%), sensitivity of 46.19% (95% CI: 39.13–53.32%), and specificity of 90.43% (95% CI: 88.35–92.25%).
Doppler Activity: The AI vs. consensus had a Kappa of 0.61 (95% CI: 0.54–0.67), PEA of 80.49% (95% CI: 77.51–83.22%), PCA of 97.13% (95% CI: 95.69–98.18%), sensitivity of 67.31% (95% CI: 51.86–80.24%), and specificity of 96.29% (95% CI: 94.65–97.52%).
Osteophyte Grading: The AI vs. consensus showed a Kappa of 0.55 (95% CI: 0.46–0.63), PEA of 70.69% (95% CI: 65.57–75.45%), PCA of 96.28% (95% CI: 93.70–98.01%), sensitivity of 56.43% (95% CI: 31.56–73.36%), and specificity of 95.36% (95% CI: 92.44–97.36%).
Together with the overlapping 95% confidence intervals shown in Figure 1, these results indicate that the AI performs comparably to experienced human raters on every metric.
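The abstract does not state how the 95% confidence intervals were derived. As a hedged sketch, the Python below shows one common choice for proportion-type metrics such as PEA or sensitivity, the Wilson score interval, together with a simple overlap check of the kind the comparison above relies on. The function names are hypothetical, and the intervals passed to the overlap check are the Kappa intervals reported above for the AI, used only to exercise the helper; the human-rater intervals appear only in Figure 1.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


def intervals_overlap(a, b):
    """True if the closed intervals a=(low, high) and b=(low, high) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Invented counts: 50 exact agreements out of 100 images.
low, high = wilson_interval(50, 100)
print(f"Wilson 95% CI: {100 * low:.2f}% to {100 * high:.2f}%")

# Kappa intervals reported above for the AI (Doppler vs. osteophyte, then SH vs. Doppler).
print(intervals_overlap((0.54, 0.67), (0.46, 0.63)))
print(intervals_overlap((0.35, 0.44), (0.54, 0.67)))
```

Overlapping intervals are a coarse indication of comparability, not a formal equivalence test.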
Conclusion: The AI model performed at the level of expert human raters in assessing synovial hypertrophy, Doppler activity, and osteophyte severity in hand joints. This suggests that AI can be a reliable tool for evaluating joint ultrasound images, potentially aiding clinical decision-making by providing consistent and standardized assessments.
To cite this abstract in AMA style:
Weber A, Ammitzbøll Danielsen M, Aplin Frederiksen B, Berner Hammer H, Schultz Overgaard B, Terslev L, Rajeeth Savarimuthu T, Just S. Performance of an Artificial Intelligence Model Compared to Multiple Human Experts in Scoring Synovitis Severity and Osteophyte Severity on Joint Ultrasound Images [abstract]. Arthritis Rheumatol. 2024; 76 (suppl 9). https://acrabstracts.org/abstract/performance-of-an-artificial-intelligence-model-compared-to-multiple-human-experts-in-scoring-synovitis-severity-and-osteophyte-severity-on-joint-ultrasound-images/. Accessed .
ACR Meeting Abstracts - https://acrabstracts.org/abstract/performance-of-an-artificial-intelligence-model-compared-to-multiple-human-experts-in-scoring-synovitis-severity-and-osteophyte-severity-on-joint-ultrasound-images/