Abstract
Importance: This study compares the responses of four AI models to common nephrology-related questions encountered in clinical settings.

Objective: To evaluate generative AI models for enhancing nephrology patient communication and education.

Design and Setting: In a study of generative AI in nephrology conducted from December 8–12, 2023, and October 21–23, 2024, IT engineers evaluated GPT-4, GPT-4o, Gemini 1.0 Ultra, and PaLM 2 for nephrology patient communication and education. Each model was queried with 21 nephrology questions and three renal biopsy reports, and the queries were repeated to assess consistency.

Intervention(s) or Exposure(s): None.

Main Outcome(s) and Measure(s): Fifteen nephrologists and one nephrology researcher assessed the responses for Appropriateness, Helpfulness, Consistency, and human-like empathy using a 1–4 rating scale. Statistical analysis used Shapiro–Wilk and Mann–Whitney U tests with Holm correction, and responses were also compared using the TF-IDF, BERTScore, and ROUGE metrics. In total, the four models were compared across 24 nephrology-related queries (21 questions and three renal biopsy reports).

Results: GPT-4o consistently achieved high scores in Appropriateness (3.39 ± 0.7) and Helpfulness (3.24 ± 0.73), while PaLM 2 achieved the highest Consistency score (3.0 ± 0.86). For empathy, GPT-4 achieved the highest overall score (80.73%), excelling in patient-centric scenarios, followed by GPT-4o (76.56%). PaLM 2 showed competitive empathy in specific cases, despite scoring lower in Consistency and Appropriateness. For kidney-related queries, GPT-4o led on the relevance metrics, with the highest BERTScore (0.57) and ROUGE-1 (single-word overlap; 0.54). For renal biopsy reports, Gemini 1.0 Ultra generated the most coherent responses, with the highest TF-IDF similarity (0.56) and ROUGE-L (longest common subsequence; 0.47). All 101 references provided by GPT-4 were accurate.
Conclusions and Relevance: GPT-4o emerged as the most accurate and consistent model across most evaluation categories, while GPT-4 demonstrated superior empathy and balanced performance. PaLM 2 and Gemini 1.0 Ultra showed strengths in specific areas, highlighting the potential for tailored applications of generative AI in nephrology clinical practice.
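The ROUGE-1 (single-word overlap) and ROUGE-L (longest common subsequence) scores reported in the Results can be illustrated with a minimal sketch. It assumes F1-style scoring, lowercase whitespace tokenization, and made-up example sentences; the study's actual implementation (for example, a specific ROUGE package, tokenizer, or stemming settings) is not described in this abstract.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (an assumed, simplified tokenizer)."""
    return text.lower().split()


def f1(overlap: int, cand_len: int, ref_len: int) -> float:
    """Harmonic mean of precision (overlap/cand_len) and recall (overlap/ref_len)."""
    if overlap == 0 or cand_len == 0 or ref_len == 0:
        return 0.0
    precision = overlap / cand_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge_1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: clipped count of shared single words, ignoring word order."""
    cand, ref = tokenize(candidate), tokenize(reference)
    overlap = sum((Counter(cand) & Counter(ref)).values())
    return f1(overlap, len(cand), len(ref))


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F1: words shared in the same order (not necessarily contiguous)."""
    cand, ref = tokenize(candidate), tokenize(reference)
    return f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    # Hypothetical reference answer and model response, for illustration only.
    reference = "the biopsy shows focal segmental glomerulosclerosis"
    candidate = "shows biopsy segmental glomerulosclerosis"
    print(round(rouge_1(candidate, reference), 3))  # → 0.8
    print(round(rouge_l(candidate, reference), 3))  # → 0.6
```

Because the candidate swaps "shows" and "biopsy", all four of its words still match the reference (ROUGE-1 = 0.8), but only three of them appear in the reference's order (ROUGE-L = 0.6). This is why the two variants can rank models differently.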
| Field | Value |
|---|---|
| Original language | English |
| Article number | 20552076251342067 |
| Journal | Digital Health |
| Volume | 11 |
| Publication status | Published - 2025 |
Keywords
- Gemini 1.0 Ultra
- Generative AI
- GPT-4
- GPT-4o
- nephrology
- PaLM 2
ASJC Scopus subject areas
- Health Policy
- Health Informatics
- Computer Science Applications
- Health Information Management