Emerging AI Tools in Psychiatry: Detecting Depression via Audio Analysis

An LLM-based model detected major depressive disorder from brief WhatsApp voice messages with a reported accuracy above 91% in women. If replicated and prospectively validated, this approach could support earlier identification of high-risk patients.
This study departs from clinic-bound voice-biomarker research by sampling natural messaging speech rather than structured recordings. Investigators analyzed short WhatsApp audio messages from adults and compared recordings from participants with clinician-diagnosed depression against those from non-depressed participants.
Participants were outpatients and community volunteers, and the primary endpoint was clinician-diagnosed major depressive disorder. Models were trained on 86 participants and tested on an independent set of 74, with held-out evaluation across submodels. The top-performing LLM exceeded 91% accuracy in women. The results suggest high screening sensitivity in women but a nontrivial false-positive rate, so a positive result would still require confirmatory clinical assessment.
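To see why a high accuracy figure can coexist with a nontrivial false-positive rate, it may help to work through the arithmetic. The sketch below is illustrative only: the `ConfusionCounts` type, the `screening_metrics` helper, and the counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary screening test (hypothetical container)."""
    tp: int  # depressed, flagged as depressed
    fp: int  # not depressed, flagged as depressed
    tn: int  # not depressed, flagged as not depressed
    fn: int  # depressed, flagged as not depressed


def screening_metrics(c: ConfusionCounts) -> dict:
    """Derive standard screening metrics from confusion-matrix counts."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "false_positive_rate": c.fp / (c.fp + c.tn),
    }


# Made-up counts for illustration; NOT data from the study.
example = ConfusionCounts(tp=19, fp=3, tn=17, fn=1)
metrics = screening_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these invented counts, accuracy is 0.90 and sensitivity is 0.95, yet 15% of non-depressed participants are still flagged. That is the kind of pattern that makes confirmatory clinical assessment necessary after a positive screen.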
Accuracy was higher in female participants than in male participants, plausibly reflecting greater female representation in the training data, sex-linked differences in speech patterns, and cultural or language factors. Key limitations include a skewed sample composition, confinement to Brazilian Portuguese, incomplete clinical-interview confirmation for some cases, and potential overfitting to female voice features. These constraints argue against immediate clinical adoption until the model is validated in broader, more balanced samples.
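A sex gap like this one is usually surfaced by reporting accuracy per subgroup instead of one pooled figure. The following is a minimal sketch of that idea, assuming predictions are available as `(group, true_label, predicted_label)` tuples. The `accuracy_by_group` helper and the records are hypothetical and do not reproduce the study's data or pipeline.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return accuracy per subgroup.

    records: iterable of (group, true_label, predicted_label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {group: correct[group] / total[group] for group in total}


# Made-up records for illustration; NOT data from the study.
# Label 1 = clinician-diagnosed depression, 0 = no depression.
records = [
    ("female", 1, 1), ("female", 0, 0), ("female", 1, 1), ("female", 0, 0),
    ("male", 1, 0), ("male", 0, 0), ("male", 1, 1), ("male", 0, 1),
]
by_group = accuracy_by_group(records)

for group, acc in by_group.items():
    print(f"{group}: {acc:.2f}")
```

Pooled accuracy over these invented records would be 0.75, which hides the fact that one subgroup is classified far worse than the other. Equity testing of the kind called for here would report each subgroup separately.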
These results warrant cautious interpretation: prospective validation across diverse populations, equity testing, and governance frameworks are needed before such tools could be considered for routine clinical use.