MARS-ED Trial: Machine Learning Risk Stratification In The ED

05/26/2026

Key Takeaways

Prognostic discrimination with RISKINDEX was at least as high as the clinician intuition questions and higher than the standard scores examined in this trial.
Treatment plans changed in 1 of 644 intervention patients, and admission, length of stay, and ICU use were similar between groups.
No intervention-related adverse events were reported, and in a subgroup clinicians gave the score a low added-value rating.

In the MARS-ED trial, a machine-learning score showed strong prognostic discrimination, with RISKINDEX reaching an AUROC of 0.84 (95% CI 0.78–0.90), but it had little effect on bedside decisions. Adults evaluated by internal medicine specialists at Maastricht University Medical Center+ were randomized to standard care with or without access to the score. Physicians saw the score during routine emergency department care after their initial assessment and preliminary plan. Management rarely changed, and admission patterns, length of stay, and ICU use were similar between groups.

This investigator-initiated, open-label, randomized, non-inferiority trial was conducted in a single emergency department in the Netherlands. Adults aged 18 years or older were eligible if they were evaluated by an internal medicine specialist and had at least four laboratory tests available. Using computer-generated permuted blocks, 1303 participants (median age 69 years) were assigned to standard care plus score access (644 patients) or standard care alone (659 patients). The score incorporated routine laboratory values, age, and sex, and physicians viewed it only after completing their assessment and forming a preliminary plan. The primary outcomes were the accuracy of 31-day mortality prediction and clinical impact.
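For readers unfamiliar with permuted-block randomization, the idea can be sketched as follows. This is an illustration only, not the trial's actual procedure: the block size of 4, the 1:1 allocation ratio, the fixed seed, and the function name `permuted_block_assignments` are all assumptions that the summary above does not specify.

```python
import random


def permuted_block_assignments(n_participants, block_size=4, seed=0):
    """Assign arms in shuffled blocks so the arms stay balanced over time.

    block_size, seed, and the 1:1 ratio are illustrative assumptions.
    """
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for 1:1 allocation")
    rng = random.Random(seed)
    half = block_size // 2
    assignments = []
    while len(assignments) < n_participants:
        # Each block holds equal numbers of both arms in random order.
        block = ["intervention"] * half + ["control"] * half
        rng.shuffle(block)
        assignments.extend(block)
    # The final block may be cut short if enrollment stops mid-block.
    return assignments[:n_participants]


arms = permuted_block_assignments(1303)
print(arms.count("intervention"), arms.count("control"))
```

Within every complete block the two arms are equal in size, so the running imbalance never exceeds half a block. Real trials often vary the block size or stratify, which this sketch does not attempt.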

Thirty-one-day mortality was 6.9%, with 90 deaths overall and 45 in each group. RISKINDEX achieved an AUROC of 0.84 (95% CI 0.78–0.90), compared with 0.74, 0.76, and 0.73 for the clinician intuition questions on concern, severity, and surprise, respectively. Precision-recall performance also favored the score, with an AUPRC of 0.33 versus 0.20 to 0.22 for the intuition questions. RISKINDEX also exceeded NEWS, APACHE II, and SOFA, whose AUROCs ranged from 0.65 to 0.75.
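As a reminder of what an AUROC measures, the sketch below computes it directly from its definition: the probability that a randomly chosen patient who died received a higher score than a randomly chosen survivor, with ties counted as one half. The function name `auroc` and the toy scores and labels are made up for illustration and are not trial data.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (event, non-event) pairs ranked correctly.

    labels use 1 for the event (here, death within 31 days) and 0 otherwise.
    """
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0   # event ranked above non-event
            elif e == n:
                concordant += 0.5   # ties count as half
    return concordant / (len(events) * len(non_events))


# Hypothetical toy data: 3 events and 5 non-events.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
print(round(auroc(toy_scores, toy_labels), 3))  # 14 of 15 pairs: 0.933
```

A value of 0.5 corresponds to chance ranking and 1.0 to perfect separation. The AUPRC has no such fixed baseline: a score with no discriminating ability is expected to land near the event prevalence, which was 6.9% in this cohort.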

Across the cohort, 62.5% of patients were admitted, but access to RISKINDEX changed treatment plans in only 1 of 644 intervention patients (0.16%). Hospital admission, length of stay, and ICU admission were similar between groups. The score matched physician expectations in 46.8% of cases, was higher than expected in 26.6%, and lower than expected in 24.6%. In a subgroup, physicians gave the score a median added-value rating of 2 (IQR 1–4). Less experienced clinicians had lower intuition performance. Even so, the score's prognostic advantage did not produce measurable workflow impact.

No adverse events related to the intervention were reported. Blinding was not possible because physicians needed to view the score during care. The study was not powered for most clinical endpoints, and its single-center convenience sample may limit generalizability. Overall, better mortality prediction performance was not accompanied by detectable clinical or operational benefit in this randomized emergency department trial.
