Predicting Osteoporosis Using Routine Clinical Data: A Machine Learning Approach

Shiza Sarfraz and Hassam Ali, East Carolina University Brody School of Medicine, Greenville, NC

Meeting: ACR Convergence 2025

Keywords: Aging, Bone density, osteoporosis

Session Information

Date: Tuesday, October 28, 2025

Title: Abstracts: Osteoporosis & Metabolic Bone Disease – Basic & Clinical Science (2591–2596)

Session Type: Abstract Session

Session Time: 2:15PM-2:30PM

Background/Purpose: Dual-energy X-ray absorptiometry (DXA) is the gold standard for diagnosing osteoporosis but is underutilized due to access, cost, and referral barriers. We aimed to develop and evaluate a machine learning model to predict DXA-defined osteoporosis using only non-imaging clinical and lifestyle variables available in routine care.

Methods: Data were drawn from the 2021–2022 National Health and Nutrition Examination Survey (NHANES). Adults ≥20 years with complete DXA data (femoral and spine bone mineral density [BMD]) and clinical/laboratory variables were included. Osteoporosis was defined as minimum BMD < 0.7 g/cm² at either site. Predictors included demographics (age, sex, race/ethnicity), anthropometrics (BMI, waist circumference), lifestyle (smoking, alcohol, physical activity), diet (calcium, vitamin D intake), biomarkers (serum vitamin D, CRP, cholesterol), and weight change. DXA variables were excluded from predictors. A random forest classifier was trained on a 70% split and evaluated on a 30% held-out test set. Logistic regression using the same inputs served as a comparator.
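The workflow described in the Methods can be illustrated with a minimal sketch, assuming scikit-learn and purely synthetic data. Nothing here is the authors' code or NHANES data: the three simulated predictors, the simulated BMD values, the stratified split, the feature scaling for logistic regression, and all hyperparameters are assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-ins for a few non-imaging predictors (NOT NHANES data).
age = rng.uniform(20, 80, n)
bmi = rng.normal(28, 5, n)
vit_d = rng.normal(65, 20, n)
X = np.column_stack([age, bmi, vit_d])

# Synthetic femoral and spine BMD (g/cm^2), loosely tied to age and BMI.
trend = 1.05 - 0.004 * (age - 20) + 0.005 * (bmi - 28)
femur_bmd = trend + rng.normal(0, 0.1, n)
spine_bmd = trend + rng.normal(0, 0.1, n)

# Outcome as defined in the abstract: minimum BMD < 0.7 g/cm^2 at either site.
# BMD is used only to build the label, never as a predictor.
y = (np.minimum(femur_bmd, spine_bmd) < 0.7).astype(int)

# 70/30 split; stratifying on the outcome is an assumption, not stated in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Random forest and a logistic regression comparator on the same inputs.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X_train, y_train)

auc_rf = roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1])
auc_lr = roc_auc_score(y_test, lr.predict_proba(X_test)[:, 1])
print(f"Random forest AUC: {auc_rf:.2f}, logistic regression AUC: {auc_lr:.2f}")
```

Because the data are simulated, the printed AUC values say nothing about the performance reported in this abstract; the sketch only shows the shape of the pipeline.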

Results: Among 2,083 eligible adults, 90 (4.3%) met the DXA-defined osteoporosis threshold (minimum BMD < 0.7 g/cm² at spine or femur). The random forest model, trained without DXA data, achieved an AUC of 0.99 on the test set (n=625), with sensitivity of 81.5%, specificity of 100%, and F1-score of 0.90. Model accuracy was 99.2%, and all predicted positives were true positives (PPV: 100%). In contrast, logistic regression using the same features yielded an AUC of 0.87, but failed to reliably detect positive cases (sensitivity: 11.1%, F1-score: 0.18). The random forest model identified age, BMI, serum vitamin D, CRP, smoking history, physical activity, and calcium intake as the most influential predictors (Figure 2). Feature importance analysis revealed strong nonlinear effects of inflammatory and nutritional biomarkers, alongside classical osteoporosis risk factors. Notably, the model detected subtle combinations of risk even in participants without overt clinical suspicion. ROC and confusion matrix plots demonstrated excellent discrimination (Figure 1).
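The reported metrics are related to one another through the confusion matrix, and a short worked example makes that relationship explicit. The counts below are hypothetical: the abstract does not report them, and they were chosen only because they are arithmetically consistent with the stated test-set size (n=625), sensitivity, specificity, PPV, accuracy, and F1-score for the random forest model. The function name `classification_metrics` is likewise an illustrative assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall: share of true cases detected
    specificity = tn / (tn + fp)          # share of non-cases correctly cleared
    ppv = tp / (tp + fp)                  # precision: share of positives that are true
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts (an assumption, not reported in the abstract):
# 27 true cases in a test set of 625, with 22 detected and no false positives.
metrics = classification_metrics(tp=22, fp=0, fn=5, tn=598)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, sensitivity is 22/27 (about 81.5%), accuracy is 620/625 (99.2%), and F1 is 44/49 (about 0.90). The example also shows why accuracy stays high at low prevalence even when several cases are missed.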

Conclusion: A random forest model using non-imaging data accurately identified individuals with DXA-defined osteoporosis, outperforming traditional regression approaches. This approach may assist in screening and triage, particularly in settings where DXA access is limited or underused.

Figure 1.

Model Performance in Predicting DXA-Defined Osteoporosis. ROC curve (left) shows excellent discrimination (AUC = 0.99), and the confusion matrix (right) confirms strong sensitivity and perfect specificity using non-DXA features.

Figure 2.

Top Predictors of DXA-Defined Osteoporosis. Feature importance plot from the random forest model highlights age, BMI, vitamin D, CRP, smoking, and physical activity as leading contributors.

Disclosures: S. Sarfraz: None; H. Ali: None.

To cite this abstract in AMA style:

Sarfraz S, Ali H. Predicting Osteoporosis Using Routine Clinical Data: A Machine Learning Approach [abstract]. Arthritis Rheumatol. 2025; 77 (suppl 9). https://acrabstracts.org/abstract/predicting-osteoporosis-using-routine-clinical-data-a-machine-learning-approach/. Accessed .

ACR Meeting Abstracts - https://acrabstracts.org/abstract/predicting-osteoporosis-using-routine-clinical-data-a-machine-learning-approach/