Session Information
Session Type: Poster Session C
Session Time: 10:30AM-12:30PM
Background/Purpose: In research, imaging findings are often assessed by multiple readers, and individual readers’ scores are combined into aggregate scores to determine the presence of lesions. In longitudinal studies, the focus shifts to change in lesions. Several strategies for aggregating change scores are available, but they may yield different results. We aimed to establish a methodological framework for aggregating change scores, using the example of syndesmophyte progression in axial spondyloarthritis (axSpA).
Methods: Data from the Sensitive Imaging in Ankylosing Spondylitis cohort were used, including patients with axSpA and established spinal damage. Syndesmophytes were assessed on vertebral corners (hereafter ‘level’) on conventional radiography (CR) and low-dose computed tomography (ldCT) of the spine at baseline and 2 years. CR images were scored by 3 central readers and ldCT images by 2, all blinded to time order. Consensus was determined by majority reader agreement. New syndesmophytes (change score) were assessed using 2 strategies. Strategy 1 calculates the change score per reader, then determines consensus on the change per level. Strategy 2 derives consensus change scores from consensus status scores (syndesmophyte present/absent) (Box 1). The means and standard deviations (SD) of the different aggregate change scores were compared. For analyses at the patient level, the total number of new syndesmophytes was averaged across readers, provided that patients had ≤25% scores missing per spinal segment at both timepoints.
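The two strategies can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the 0/1 coding of absent/present, the handling of ties as "no consensus, scored absent", and the three-reader score pattern at the bottom are all hypothetical.

```python
from typing import Sequence


def majority(votes: Sequence[int]) -> int:
    """Return 1 if more than half of the readers scored 1, else 0.

    Assumption: a tie (possible with 2 readers) counts as 0.
    """
    return int(2 * sum(votes) > len(votes))


def new_lesion(baseline: int, follow_up: int) -> int:
    """A new syndesmophyte: absent (0) at baseline, present (1) at follow-up."""
    return int(baseline == 0 and follow_up == 1)


def strategy_1(baseline: Sequence[int], follow_up: Sequence[int]) -> int:
    """Change score per reader first, then consensus on the change."""
    per_reader_change = [new_lesion(b, f) for b, f in zip(baseline, follow_up)]
    return majority(per_reader_change)


def strategy_2(baseline: Sequence[int], follow_up: Sequence[int]) -> int:
    """Consensus status per timepoint first, then change between consensus scores."""
    return new_lesion(majority(baseline), majority(follow_up))


# Hypothetical scores for one level, readers A, B, C (illustrative only).
baseline = [1, 0, 0]   # only reader A sees a syndesmophyte at baseline
follow_up = [1, 0, 1]  # readers A and C see one at 2 years

print(strategy_1(baseline, follow_up))  # 0: only reader C scored a change
print(strategy_2(baseline, follow_up))  # 1: consensus goes from absent to present
```

In this pattern only one reader scores a change, yet the consensus status flips between timepoints, so the two strategies disagree at that level.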
Results: Complete data from 52 patients were used. In example 3 (Figure 1a), the two strategies yielded different results. When consensus change scores are derived from change scores of individual readers (strategy 1), there is no new syndesmophyte in the aggregated score, because only reader C detected a new syndesmophyte. In contrast, when consensus status scores are determined first (strategy 2), indicating no syndesmophyte at baseline but one at 2 years, the consensus change score suggests a new syndesmophyte. Strategy 2 is more sensitive, identifying 1.2 times more new syndesmophytes on CR (mean [SD]: 0.86 [1.00] vs 0.69 [0.95]) and 3.1 times more on ldCT (7.23 [7.46] vs 2.35 [4.44]). However, this is accompanied by higher variability (Figure 1b). Only strategy 1 represents “true consensus” among readers, but requires site-level agreement, which compromises sensitivity to change at the patient level. At the patient level, the total number of new syndesmophytes should be calculated as the average of the total number across all readers (CR: 0.66 [1.10]; ldCT: 5.55 [6.65]).
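The patient-level calculation can be sketched as follows. This is a hedged illustration only: the data layout (one dictionary per reader, mapping a spinal segment to baseline and follow-up scores), the use of `None` for a missing score, the function names, and the reading of the ≤25% rule as "applied per reader, per segment, per timepoint" are assumptions, and the example scores are invented.

```python
from typing import Dict, Optional, Sequence, Tuple

Score = Optional[int]  # 1 = present, 0 = absent, None = missing
Segment = Tuple[Sequence[Score], Sequence[Score]]  # (baseline, follow-up)


def segment_is_evaluable(scores: Sequence[Score], max_missing: float = 0.25) -> bool:
    """True if at most `max_missing` of the scores in a segment are missing."""
    missing = sum(s is None for s in scores)
    return missing / len(scores) <= max_missing


def reader_total_new(baseline: Sequence[Score], follow_up: Sequence[Score]) -> int:
    """Count levels scored absent at baseline and present at follow-up."""
    return sum(1 for b, f in zip(baseline, follow_up) if b == 0 and f == 1)


def patient_level_new(readers: Sequence[Dict[str, Segment]]) -> Optional[float]:
    """Average the per-reader totals of new syndesmophytes for one patient.

    Returns None if any segment exceeds the missing-data threshold
    at either timepoint (assumed interpretation of the rule).
    """
    totals = []
    for reader in readers:
        total = 0
        for baseline, follow_up in reader.values():
            if not (segment_is_evaluable(baseline) and segment_is_evaluable(follow_up)):
                return None
            total += reader_total_new(baseline, follow_up)
        totals.append(total)
    return sum(totals) / len(totals)


# Invented scores: two readers, two segments, four levels each.
reader_a = {
    "cervical": ([0, 0, 1, 0], [1, 0, 1, 0]),
    "thoracic": ([0, 0, 0, 0], [0, 1, 1, 0]),
}
reader_b = {
    "cervical": ([0, 0, 1, 0], [0, 0, 1, 0]),
    "thoracic": ([0, None, 0, 0], [0, 1, 1, 0]),
}

print(patient_level_new([reader_a, reader_b]))  # 2.0 (reader totals 3 and 1)
```

Averaging totals per reader avoids requiring agreement at each individual level, which is the property the abstract highlights for patient-level reporting.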
Conclusion: Consensus change scores should be derived from individual readers’ change scores. Strategy 1 approaches true change best, while strategy 2 artificially inflates change. At the patient level, the average total number of new syndesmophytes across readers should be reported. While we focus on syndesmophyte progression in axSpA, this framework is broadly applicable to imaging findings across multiple diseases. Advanced statistical analyses beyond descriptive purposes should preferably be performed on individual readers’ scores, using multilevel models that properly account for variability in reader assessment.
Box 1. Explanation of two strategies for defining a consensus change score in multi-reader imaging assessments
Figure 1. Comparison of strategies for assessing new syndesmophytes in patients with axial spondyloarthritis after 2 years, illustrating a broad imaging framework for consensus change scores
To cite this abstract in AMA style:
Bento da Silva A, Ramiro S, van Gaalen F, Landewé R, van Lunteren M, de Bruin L, Ayan G, Baraliakos X, Reijnierse M, Braun J, Van Der Heijde D, de Hooge M. How Calculating Consensus Change Scores Can Go Wrong: Lessons from Multi-reader Imaging Assessments in Axial Spondyloarthritis [abstract]. Arthritis Rheumatol. 2025; 77 (suppl 9). https://acrabstracts.org/abstract/how-calculating-consensus-change-scores-can-go-wrong-lessons-from-multi-reader-imaging-assessments-in-axial-spondyloarthritis/. Accessed .
ACR Meeting Abstracts - https://acrabstracts.org/abstract/how-calculating-consensus-change-scores-can-go-wrong-lessons-from-multi-reader-imaging-assessments-in-axial-spondyloarthritis/