Longitudinal analysis on imaging outcomes: should we use the individual scores from multiple readers or rather the consensus or average of readers?

Liese de Bruin¹, Floris A. van Gaalen¹, Manouk de Hooge², Miranda van Lunteren¹, Mary Lucy Marques³, Monique Reijnierse⁴, Roberta Ramonda⁵, Inger Jorid Berg⁶, Carl Turesson⁷, Robert Landewé⁸, Désirée Van Der Heijde¹ and Sofia Ramiro⁹, ¹Department of Rheumatology, Leiden University Medical Center, Leiden, Netherlands, ²Department of Rheumatology, Ghent University Hospital, Ghent, Belgium, ³Department of Rheumatology, Leiden University Medical Center, Leiden, Netherlands; and Coimbra Local Health Unit, Coimbra, Portugal, ⁴Department of Radiology, Leiden University Medical Center, Leiden, Netherlands, ⁵Rheumatology Unit-DIMED-University of Padova ITALY, Padova, Padua, Italy, ⁶Center for treatment of Rheumatic and Musculoskeletal Diseases (REMEDY), Diakonhjemmet Hospital, Oslo, Norway, ⁷Lund University, Malmö, Sweden, ⁸Department of Rheumatology, Amsterdam University Medical Center, Amsterdam, Netherlands; and Zuyderland Medical Center, Heerlen, Netherlands, ⁹Leiden University Medical Center, Bunde, Netherlands

Meeting: ACR Convergence 2025

Keywords: Imaging, longitudinal studies, spondyloarthritis, Statistical methods

Session Information

Date: Tuesday, October 28, 2025

Title: (1936–1971) Imaging of Rheumatic Diseases Poster

Session Type: Poster Session C

Session Time: 10:30AM-12:30PM

Background/Purpose: Imaging outcomes are often evaluated using longitudinal analysis based on scores from multiple readers. However, the input into the analysis can vary, from the scores of multiple individual readers to a consensus or average across readers. To assess whether this choice at the analysis stage can result in different estimates of change, we performed longitudinal analyses of different imaging outcomes in patients with early axial SpA (axSpA), using both individual reader scores and consensus scores/averages of readers.

Methods: Patients with chronic back pain (≥3 months; ≤2 years; onset < 45 years) from the SPondyloArthritis Caught Early cohort with a rheumatologist’s diagnosis of axSpA at the 2-year follow-up were included. MRIs and radiographs of the sacroiliac joints and spine were obtained at baseline and at 3 months, 1 year and 2 years of follow-up, and were subsequently scored for inflammatory and structural lesions by 3 central readers, except for the MRI spine (2 readers). Patients included in this analysis had ≥1 score from ≥1 reader at 2-year follow-up. Each outcome was analyzed per reader and also according to the consensus (≥2 readers) or average across readers. Change over time was analyzed with generalized estimating equations, with ‘time’ as explanatory variable. For the analysis per individual reader, a multilevel analysis was performed, taking each individual reader into account. For the analysis of the consensus/average scores, each outcome was modelled according to its consensus/average score across readers.
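The two kinds of analysis input described above can be sketched as follows. This is a minimal illustration only, not the authors' code: the records, the field layout, and the helper name `collapse_readers` are hypothetical, and the ≥2-reader rule is applied here as a simple count of positive readings.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format records, one row per patient, visit and reader:
# (patient, visit_in_years, reader, continuous_score, lesion_present)
records = [
    ("p1", 0.0, "r1", 4.0, 1), ("p1", 0.0, "r2", 2.0, 0), ("p1", 0.0, "r3", 3.0, 1),
    ("p1", 2.0, "r1", 6.0, 1), ("p1", 2.0, "r2", 5.0, 1), ("p1", 2.0, "r3", 4.0, 0),
    ("p2", 0.0, "r1", 0.0, 0), ("p2", 0.0, "r2", 1.0, 1), ("p2", 0.0, "r3", 0.0, 0),
]

def collapse_readers(rows, min_agree=2):
    """Collapse reader-level rows to one (average, consensus) pair per patient visit.

    The average is taken over the continuous scores; the consensus is 1 when
    at least `min_agree` readers scored the lesion as present.
    """
    grouped = defaultdict(list)
    for patient, visit, _reader, score, present in rows:
        grouped[(patient, visit)].append((score, present))

    collapsed = {}
    for key, readings in grouped.items():
        average = mean(score for score, _ in readings)
        consensus = int(sum(present for _, present in readings) >= min_agree)
        collapsed[key] = (average, consensus)
    return collapsed

# Input for the consensus/average analysis: one row per patient visit.
collapsed = collapse_readers(records)
# Input for the per-reader (multilevel) analysis: `records` itself, unchanged.
```

In this sketch the reader-level rows keep every reading, so a model can account for the reader as a level, whereas the collapsed rows discard the between-reader variation before any model is fitted.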

Results: In total we analyzed 279 patients with axSpA (mean age 31 (SD 8) years; 53% males). As shown in Table 1, change estimates for continuous scores were very similar based on individual reader scores and using average scores across readers. However, larger discrepancies were found when comparing change estimates for dichotomous variables, expressed as yearly percentage change (Table 2). Overall, the difference between the analytical choices could go in both directions. Spinal change was underestimated using the consensus scores of dichotomous variables, while change scores of the dichotomous variables in SIJ were mostly overestimated, especially of variables including erosions. For dichotomous variables, percentage differences between the change scores calculated according to the two analytical methods ranged from -35.4% to 37.5%.
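A toy calculation may help show why the two inputs can agree for continuous scores yet diverge for dichotomous ones. The data below are invented for illustration and the calculation uses raw proportions and means rather than the generalized estimating equations used in the study, so it only sketches one possible mechanism: averaging is linear, while a ≥2-reader consensus is a threshold.

```python
from statistics import mean

# Hypothetical 0/1 lesion scores: one tuple per patient, one entry per reader.
baseline = [(1, 0, 0), (1, 0, 0), (0, 0, 0), (1, 1, 0)]
followup = [(1, 1, 0), (1, 1, 0), (0, 0, 1), (1, 1, 0)]

def reader_level_prevalence(scores):
    """Proportion of positive readings, pooling every reader's score."""
    return mean(s for patient in scores for s in patient)

def consensus_prevalence(scores, min_agree=2):
    """Proportion of patients scored positive by at least `min_agree` readers."""
    return mean(int(sum(patient) >= min_agree) for patient in scores)

change_reader_level = reader_level_prevalence(followup) - reader_level_prevalence(baseline)
change_consensus = consensus_prevalence(followup) - consensus_prevalence(baseline)

# Hypothetical continuous scores in the same layout.
cont_baseline = [(4.0, 2.0, 3.0), (1.0, 0.0, 2.0)]
cont_followup = [(6.0, 5.0, 4.0), (2.0, 1.0, 3.0)]

def pooled_mean(scores):
    """Mean over all individual readings."""
    return mean(s for patient in scores for s in patient)

def mean_of_averages(scores):
    """Mean over patients of the reader-averaged score."""
    return mean(mean(patient) for patient in scores)

cont_change_pooled = pooled_mean(cont_followup) - pooled_mean(cont_baseline)
cont_change_averaged = mean_of_averages(cont_followup) - mean_of_averages(cont_baseline)

print(f"dichotomous change, reader level: {change_reader_level:.2f}")
print(f"dichotomous change, consensus:    {change_consensus:.2f}")
print(f"continuous change, reader level:  {cont_change_pooled:.2f}")
print(f"continuous change, averaged:      {cont_change_averaged:.2f}")
```

With these invented numbers the dichotomous change is about 0.25 at reader level and 0.50 under consensus, while both continuous changes are 1.5. The equality for continuous scores relies on every patient having the same number of readings, which is an assumption of the toy data.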

Conclusion: The seemingly simple decision of how to incorporate imaging outcomes from multiple readers into a model of change over time can significantly influence the final estimates, especially when dealing with dichotomous outcomes. To minimize the risk of introducing bias at the analytical stage, multilevel analyses that account for individual reader scores are recommended, rather than relying on pre-calculated consensus scores. This analysis also shows the vulnerability of dichotomous outcomes, reinforcing the preference for continuous measures as outcomes in analyses.

Disclosures: L. de Bruin: None; F. van Gaalen: AbbVie, 2, BMS, 2, Eli Lilly, 2, Jacobus Stichting, 5, MSD, 2, Novartis, 2, 5, Stichting ASAS, 5, Stichting vrienden van Sole Mio, 5, UCB, 5; M. de Hooge: UCB pharma, 2; M. van Lunteren: None; M. Marques: Novartis, 2, 6; M. Reijnierse: None; R. Ramonda: None; I. Berg: None; C. Turesson: AbbVie/Abbott, 1, 6, Nordic Drugs, 2, 6, Novartis, 1, 6; R. Landewé: AbbVie/Abbott, 2, Bristol-Myers Squibb(BMS), 2, Eli Lilly, 2, Janssen, 2, Joint Imaging BV, 12, Director, Novartis, 2, Pfizer, 2, Rheumatology Consultancy BV, 12, Director, UCB, 2; D. Van Der Heijde: AbbVie, 2, Alfasigma, 2, Annals of the Rheumatic Diseases, 12, Associate editor, ArgenX, 2, Bristol Myers Squibb, 2, Eli Lilly and Company, 2, Grey-Wolf Therapeutics, 2, Imaging Rheumatology BV, 12, Director, Janssen, 2, Journal of Rheumatology, 12, Editorial board member, Novartis, 2, Pfizer, 2, RMD Open, 12, Editorial board member, Takeda, 2, UCB, 2; S. Ramiro: AbbVie, 2, 5, Eli Lilly, 2, 5, Galapagos/Alfasigma, 2, 5, Janssen, 2, MSD, 2, 5, Novartis, 2, 5, Pfizer, 2, 5, Sanofi, 2, 5, UCB, 2, 5.

To cite this abstract in AMA style:

de Bruin L, van Gaalen F, de Hooge M, van Lunteren M, Marques M, Reijnierse M, Ramonda R, Berg I, Turesson C, Landewé R, Van Der Heijde D, Ramiro S. Longitudinal analysis on imaging outcomes: should we use the individual scores from multiple readers or rather the consensus or average of readers? [abstract]. Arthritis Rheumatol. 2025; 77 (suppl 9). https://acrabstracts.org/abstract/longitudinal-analysis-on-imaging-outcomes-should-we-use-the-individual-scores-from-multiple-readers-or-rather-the-consensus-or-average-of-readers/. Accessed .

