Session Information
Session Type: Poster Session C
Session Time: 10:30AM-12:30PM
Background/Purpose: Imaging outcomes are often evaluated with longitudinal analyses based on scores from multiple readers. However, the input into the analysis can vary, from the scores of the individual readers to a consensus or average across readers. To assess whether this choice at the analysis stage can result in different estimates of change, we performed longitudinal analyses on different imaging outcomes in patients with early axial SpA (axSpA), using both individual reader scores and consensus scores/averages of readers.
Methods: Patients with chronic back pain (≥3 months; ≤2 years; onset < 45 years) from the SPondyloArthritis Caught Early cohort with a rheumatologist’s diagnosis of axSpA at the 2-year follow-up were included. MRIs and radiographs of the sacroiliac joints and spine were obtained at baseline, 3 months, 1 and 2 years of follow-up and afterwards scored for inflammatory and structural lesions by 3 central readers, except for the MRI spine (2 readers). Patients included in this analysis had ≥1 score from ≥1 reader at 2-year follow-up. Each outcome was analyzed per reader and also according to the consensus (≥2 readers) or average across readers. Change over time was analyzed with generalized estimating equations, with ‘time’ as explanatory variable. For the analysis per individual reader, a multilevel analysis was performed, taking each individual reader into account. For the analysis of the consensus/average scores, each outcome was modelled according to its consensus/average score across readers.
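The two kinds of input described above can be illustrated with a minimal Python sketch. All scores, function names, and the 4-patient layout are hypothetical, and the sketch compares raw proportions at two timepoints rather than fitting the generalized estimating equations used in the study; it only shows how pooling individual reader scores and collapsing them to a consensus (≥2 readers) can yield different yearly changes for a dichotomous outcome.

```python
from statistics import mean

def consensus_positive(reader_scores, min_readers=2):
    """Return 1 if at least `min_readers` readers scored the lesion as present."""
    available = [s for s in reader_scores if s is not None]
    return int(sum(available) >= min_readers)

def average_score(reader_scores):
    """Average a continuous score across the readers who provided one."""
    available = [s for s in reader_scores if s is not None]
    return mean(available) if available else None

def pooled_proportion(patients):
    """Proportion positive when every individual reader score counts separately."""
    return mean(s for readers in patients for s in readers if s is not None)

def consensus_proportion(patients):
    """Proportion positive after collapsing each patient to a consensus score."""
    return mean(consensus_positive(readers) for readers in patients)

# Hypothetical dichotomous scores: key = time in years,
# value = one list of 3 reader scores (1 = lesion present) per patient.
scores = {
    0: [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 0]],
    2: [[1, 0, 0], [1, 1, 0], [1, 1, 1], [0, 1, 0]],
}

years = 2
pooled_change = (pooled_proportion(scores[2]) - pooled_proportion(scores[0])) / years
consensus_change = (consensus_proportion(scores[2]) - consensus_proportion(scores[0])) / years

print(f"Yearly change, individual reader scores: {pooled_change:.3f}")
print(f"Yearly change, consensus scores:         {consensus_change:.3f}")
```

In this made-up example, single-reader positives contribute to the pooled estimate but are discarded by the consensus rule, so the two yearly changes differ (about 0.167 versus 0.125). A continuous score averaged with `average_score` loses no such information, which is in line with the smaller discrepancies reported for continuous outcomes.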
Results: In total, we analyzed 279 patients with axSpA (mean age 31 (SD 8) years; 53% males). As shown in Table 1, change estimates for continuous scores were very similar whether based on individual reader scores or on average scores across readers. However, larger discrepancies were found when comparing change estimates for dichotomous variables, expressed as yearly percentage change (Table 2). Overall, the difference between the analytical choices could go in both directions. Spinal change was underestimated using the consensus scores of dichotomous variables, while change scores of the dichotomous variables in the SIJ were mostly overestimated, especially for variables including erosions. In dichotomous variables, percentage differences between change scores calculated according to the two analytical methods ranged from -35.4 to 37.5%.
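The abstract does not state how the percentage difference between the two methods was computed. One plausible reading, shown here purely as an assumption, takes the individual-reader estimate as the reference, so that a negative value means the consensus analysis underestimates change. The function name and the example numbers are hypothetical.

```python
def percentage_difference(consensus_change, individual_change):
    """Relative difference (%) of the consensus estimate versus the
    individual-reader estimate. The choice of reference is an assumption."""
    if individual_change == 0:
        raise ValueError("reference change estimate must be non-zero")
    return (consensus_change - individual_change) / abs(individual_change) * 100

# Hypothetical yearly changes (percentage points per year), not study data.
underestimate = percentage_difference(consensus_change=3.0, individual_change=4.0)
overestimate = percentage_difference(consensus_change=5.0, individual_change=4.0)
print(underestimate, overestimate)
```

With these made-up inputs the function returns -25.0 and 25.0, mirroring the two directions of discrepancy described above.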
Conclusion: The seemingly simple decision of how to incorporate imaging outcomes from multiple readers into a model of change over time can substantially influence the final estimates, especially when dealing with dichotomous outcomes. To minimize the risk of introducing bias at the analytical stage, multilevel analyses that account for individual reader scores are recommended, rather than relying on pre-calculated consensus scores. This analysis also shows the vulnerability of dichotomous outcomes, reinforcing the preference for continuous measures as outcomes in analyses.
To cite this abstract in AMA style:
de Bruin L, van Gaalen F, de Hooge M, van Lunteren M, Marques M, Reijnierse M, Ramonda R, Berg I, Turesson C, Landewé R, Van Der Heijde D, Ramiro S. Longitudinal analysis on imaging outcomes: should we use the individual scores from multiple readers or rather the consensus or average of readers? [abstract]. Arthritis Rheumatol. 2025; 77 (suppl 9). https://acrabstracts.org/abstract/longitudinal-analysis-on-imaging-outcomes-should-we-use-the-individual-scores-from-multiple-readers-or-rather-the-consensus-or-average-of-readers/. Accessed .
ACR Meeting Abstracts - https://acrabstracts.org/abstract/longitudinal-analysis-on-imaging-outcomes-should-we-use-the-individual-scores-from-multiple-readers-or-rather-the-consensus-or-average-of-readers/