Generative AI Decision Support in Kenyan Primary Care Trial

06/29/2026

Key Takeaways

The intervention arm showed no statistically significant difference in expert-adjudicated treatment failure within 14 days.
Clinicians using the tool produced better diagnostic documentation and treatment plans in reviewed encounters.
Independent review identified no intervention-related safety signal, and exploratory analyses showed lower antibiotic-related spending.

In a pragmatic cluster-randomized trial across 16 Kenyan primary care facilities, an EMR-embedded generative AI decision support tool did not significantly change the prespecified 14-day treatment-failure endpoint. Treatment failure occurred in 2.2% of intervention encounters and 2.0% of control encounters. Clinicians in the control group used the same record system without the AI feature. The trial also found stronger documentation and treatment planning during routine visits, but those process gains were not matched by a measurable short-term difference in the primary clinical outcome.

This pragmatic, multicenter, parallel-group cluster-randomized controlled trial took place in 16 Penda Health primary care facilities in Nairobi and Kiambu counties, Kenya. Randomization occurred at the clinical-officer level, with patients nested within clinicians and facilities; 103 clinical officers were assigned, 52 to intervention and 51 to control. The intervention used AI Consult 2.0, an LLM-based decision support feature embedded in the EMR, while control clinicians used the same EMR with the feature disabled. Researchers screened 17,626 patients, enrolled 9,691 between 22 April and 16 July 2025, and included 9,347 encounters in the primary analysis after 90 losses to follow-up and 254 exclusions related to protocol nonadherence or potential exposure or allocation misclassification. Follow-up telephone assessments occurred on days 3 and 14, and the primary follow-up window extended through 14 days after the index visit.
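The participant flow described above can be checked with simple arithmetic. The following is a minimal sketch that uses only the counts reported in this article; the variable names are illustrative and do not come from the trial.

```python
# Participant-flow counts as reported in the article.
screened = 17_626
enrolled = 9_691
lost_to_follow_up = 90
excluded = 254  # protocol nonadherence or possible exposure/allocation misclassification

# Encounters remaining for the primary analysis.
analyzed = enrolled - lost_to_follow_up - excluded

# Shares of the screened and enrolled populations that moved to the next stage.
enrolled_share = enrolled / screened
retained_share = analyzed / enrolled

print(f"Primary-analysis encounters: {analyzed}")
print(f"Enrolled among screened: {enrolled_share:.1%}")
print(f"Analyzed among enrolled: {retained_share:.1%}")
```

The computed total of 9,347 matches the sum of the two arm denominators reported for the primary endpoint (4,693 and 4,654).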

For the primary endpoint, 102 of 4,693 encounters in the intervention arm met treatment-failure criteria, compared with 94 of 4,654 in the control arm. The adjusted odds ratio was 0.77, with a 95% confidence interval from 0.55 to 1.08 and a P value of 0.13. An additional analysis that adjusted for encounter-specific variables yielded a similar estimate, with an adjusted odds ratio of 0.72, a 95% confidence interval from 0.50 to 1.03, and P = 0.07. Within the reported 14-day window, the estimate remained imprecise and did not show a statistically significant between-group difference.
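For readers who want to relate these figures to one another, the sketch below computes the crude event rates from the reported counts and approximates the two-sided P value implied by each odds ratio and its confidence interval. It assumes a normal (Wald) approximation on the log-odds scale, which is a standard back-calculation and not necessarily the method the investigators used; the function names are hypothetical. The crude rates ignore clustering and covariate adjustment, so they are not expected to reproduce the adjusted odds ratios.

```python
import math


def crude_rate(events: int, total: int) -> float:
    """Unadjusted proportion of encounters meeting the endpoint."""
    return events / total


def p_from_ratio_ci(estimate: float, lower: float, upper: float,
                    z_crit: float = 1.96) -> float:
    """Approximate two-sided P value from a ratio estimate and its 95% CI.

    Assumes the log of the ratio is roughly normal, so the CI width on the
    log scale equals 2 * z_crit standard errors.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


intervention_rate = crude_rate(102, 4_693)
control_rate = crude_rate(94, 4_654)
p_primary = p_from_ratio_ci(0.77, 0.55, 1.08)
p_additional = p_from_ratio_ci(0.72, 0.50, 1.03)

print(f"Crude rates: {intervention_rate:.1%} vs {control_rate:.1%}")
print(f"Approximate P, primary analysis: {p_primary:.2f}")
print(f"Approximate P, additional analysis: {p_additional:.2f}")
```

Under this approximation, the back-calculated values agree with the reported P values of 0.13 and 0.07 to two decimal places, which suggests the reported intervals and P values are internally consistent.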

Among 2,000 reviewed encounters, clinicians in the intervention arm were more likely to record an appropriate diagnosis, a comprehensive note, and an appropriate treatment plan. The adjusted odds ratio was 1.74 for appropriate diagnosis, with a 95% confidence interval from 1.28 to 2.36, and 1.68 for comprehensive notes, with a 95% confidence interval from 1.24 to 2.27. Appropriate treatment planning also improved, with an adjusted odds ratio of 1.71 and a 95% confidence interval from 1.25 to 2.34. All three comparisons had P < 0.001. Correct antibiotic use did not clearly differ between groups, and patient satisfaction was similar, with a median score of 4.0 in both arms. The clearest measurable differences were in documentation quality and treatment planning.

No serious adverse events were judged related to the intervention, and independent review did not identify a safety signal. Across sites, 33 serious adverse events occurred, comprising 27 hospitalizations and 6 deaths, but reviewers found appropriate management and no causal link to the tool. In exploratory post hoc analyses, antibiotic spending averaged US$3.85 per patient in control and US$3.71 in intervention, for an adjusted mean difference of -US$0.15, with a 95% confidence interval from -0.25 to -0.04. The mean per-patient LLM cost was US$0.04, with a 95% confidence interval from 0.04 to 0.04. Over 14 days, the intervention was associated with process gains and lower antibiotic-related spending in exploratory analysis, but not with a statistically significant change in the prespecified clinical endpoint.
