Using Artificial Intelligence to Analyze Multilingual Qualitative Data in Lupus Pregnancy Research: A Proof of Concept with Large Language Models

Romina Boers¹, Grace Terry² and Bella Mehta³, ¹Weill Cornell Medicine, New York, ²Weill Cornell Medicine, New York, ³Hospital for Special Surgery, Weill Cornell Medicine, Jersey City, NJ

Meeting: ACR Convergence 2025

Keywords: pregnancy, Qualitative Research, quality of life, Systemic lupus erythematosus (SLE), Women's health

Session Information

Date: Sunday, October 26, 2025

Title: Abstracts: Patient Outcomes, Preferences, & Attitudes (0789–0794)

Session Type: Abstract Session

Session Time: 2:15PM-2:30PM

Background/Purpose: Women with systemic lupus erythematosus (SLE), particularly those of childbearing age, face heightened risks during pregnancy, including disease flares, adverse maternal-fetal outcomes, and emotional stress. Understanding how these women perceive pregnancy is essential for delivering personalized and empathetic care. While qualitative research is ideal for capturing such patient perspectives, analyzing interview data is labor-intensive and often excludes non-English speakers. We aimed to evaluate the use of large language models (LLMs) in translating, analyzing, and summarizing qualitative interviews on pregnancy experiences in women with SLE, and to quantitatively compare LLM-generated themes with those identified by human researchers.

Methods: We used semi-structured interviews originally conducted in Portuguese by Rodrigues et al. (1) with 26 women with SLE to explore their prenatal care experiences. Transcripts were translated into English using LLMs, and three prompting strategies (zero-shot, few-shot, and chain-of-thought) were tested across four state-of-the-art LLMs (GPT o3-mini-high, Claude 3.7 Sonnet, Gemini 2.0 Flash, and DeepSeek R1) to generate key themes and supporting quotes (Figure 1). Each output was evaluated using an LLM-as-a-judge framework on five metrics: theme accuracy, theme coverage, semantic similarity, factual consistency, and coherence, using the published reference themes as ground truth.
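The prompting strategies and judge rubric described in the Methods could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the prompt wording, the function names (`build_prompt`, `build_judge_prompt`, `overall_score`), and the unweighted mean used to combine the five metrics are all assumptions, and no model API is called.

```python
from statistics import mean

# The five evaluation metrics named in the Methods.
METRICS = (
    "theme accuracy",
    "theme coverage",
    "semantic similarity",
    "factual consistency",
    "coherence",
)

# Hypothetical task instruction; the abstract does not give the actual wording.
TASK = (
    "Identify the key themes in the interview transcript below and give "
    "one supporting quote for each theme."
)


def build_prompt(strategy, transcript, examples=()):
    """Assemble a prompt for one of the three strategies (illustrative wording)."""
    if strategy == "zero-shot":
        # Task instruction only, no worked examples.
        parts = [TASK]
    elif strategy == "few-shot":
        # Task instruction plus (excerpt, theme) demonstration pairs.
        shots = [f"Example excerpt: {ex}\nExample theme: {th}" for ex, th in examples]
        parts = [TASK, *shots]
    elif strategy == "chain-of-thought":
        # Task instruction plus a request for explicit intermediate reasoning.
        parts = [TASK, "Reason step by step before listing the final themes."]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    parts.append(f"Transcript:\n{transcript}")
    return "\n\n".join(parts)


def build_judge_prompt(candidate_themes, reference_themes):
    """Assemble an LLM-as-a-judge prompt scoring each metric out of 10."""
    rubric = "\n".join(f"- {m} (score out of 10)" for m in METRICS)
    return (
        "Score the candidate themes against the reference themes on each "
        f"metric:\n{rubric}\n\n"
        f"Reference themes:\n{reference_themes}\n\n"
        f"Candidate themes:\n{candidate_themes}"
    )


def overall_score(scores):
    """Unweighted mean of the five metric scores (an assumed aggregation)."""
    missing = [m for m in METRICS if m not in scores]
    if missing:
        raise ValueError(f"missing metrics: {missing}")
    return mean(scores[m] for m in METRICS)
```

In this sketch, each of the four models would receive the output of `build_prompt` for each strategy, and a judge model would receive `build_judge_prompt` with the published reference themes as ground truth. How the judge's scores were actually combined is not stated in the abstract.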

Results: LLM-derived themes included: (1) diagnostic uncertainty, (2) emotional and psychological burden, (3) pregnancy as a source of hope, (4) fear of harming the baby, (5) importance of support systems, (6) everyday adaptations, and (7) personal growth. These themes aligned well with those from the original qualitative study (Table 1). Few-shot prompting with GPT-4 yielded the highest overall performance across all five evaluation metrics, with strong theme accuracy (7.5/10), coverage (7.0/10), and coherence (9.0/10) (Table 2).

Conclusion: LLMs show strong potential for supporting qualitative research in rheumatology by facilitating multilingual data processing, theme extraction, and instrument development. This approach can significantly reduce the labor required for traditional qualitative analysis while preserving depth and rigor, especially in studies addressing sensitive and complex experiences such as lupus pregnancy. Lastly, these methods help amplify the voices of patients who are often underrepresented in research due to language barriers, making this a valuable tool for promoting equity.

Reference: (1) Larissa Rodrigues, Vera Lucia Pereira Alves, Maria Margarida Fialho Sim-Sim, Fernanda Garanhani Surita. Perceptions of women with systemic lupus erythematosus undergoing high-risk prenatal care: A qualitative study. Midwifery, Volume 87, 2020, 102715. https://doi.org/10.1016/j.midw.2020.102715

Figure 1. Workflow for LLM-Assisted Multilingual Qualitative Analysis in Lupus Pregnancy Research

Table 1. Comparison of themes from the reference qualitative study with themes generated by ChatGPT 4.0. The left column shows human-derived themes from Rodrigues et al. (2020), while the right column presents corresponding LLM-derived themes. This alignment indicates that LLMs can identify relevant themes from qualitative semi-structured interviews.

Table 2. Performance Scores Across Prompting Methods

Disclosures: R. Boers: None; G. Terry: None; B. Mehta: Amgen, 1, Horizon, 1.

To cite this abstract in AMA style:

Boers R, Terry G, Mehta B. Using Artificial Intelligence to Analyze Multilingual Qualitative Data in Lupus Pregnancy Research: A Proof of Concept with Large Language Models [abstract]. Arthritis Rheumatol. 2025; 77 (suppl 9). https://acrabstracts.org/abstract/using-artificial-intelligence-to-analyze-multilingual-qualitative-data-in-lupus-pregnancy-research-a-proof-of-concept-with-large-language-models/. Accessed .

ACR Meeting Abstracts - https://acrabstracts.org/abstract/using-artificial-intelligence-to-analyze-multilingual-qualitative-data-in-lupus-pregnancy-research-a-proof-of-concept-with-large-language-models/