Limits of AI for Skin Cancer Diagnosis in Real-World Settings

Key Takeaways
- A modern foundation model outperformed physicians with less than 3 years of experience and matched those with 3 to 10 years of experience in skin lesion diagnosis.
- Expert dermatologists with more than 10 years of experience achieved the highest diagnostic accuracy and outperformed all AI models.
The study's 1,117-case dataset included clinical and dermoscopic images along with associated metadata and was designed to reflect the variability encountered in routine practice, including rare and atypical lesions. Researchers compared the performance of a first-generation convolutional neural network (CNN), two foundation models (PanDerm unimodal and PanDerm multimodal), and physician readers with varying levels of dermatology experience.
Diagnostic performance varied across both AI models and physician experience levels. Expert physicians with more than 10 years of experience achieved the highest mean diagnostic accuracy at 74.2%, outperforming all AI systems evaluated in the study. The PanDerm unimodal foundation model achieved 72.2% accuracy, exceeding the performance of physicians with less than 3 years of experience, who achieved 68.2% accuracy on average. The multimodal foundation model achieved 66.3% accuracy, while the CNN achieved 56.7%, a level below that of all physician groups included in the analysis.
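For readers who want to see the reported figures side by side, the comparison above can be sketched in a few lines of Python. This is only an illustrative tabulation of the accuracies quoted in this article, not the study's analysis code; the names `reported_accuracy`, `rank_readers`, and `gap_to_best` are made up for this sketch, and the 3-to-10-year physician group is left out because no accuracy figure is given for it here.

```python
# Mean diagnostic accuracy (%) as quoted in the article.
# The 3-to-10-year physician group is omitted: no figure is reported for it here.
reported_accuracy = {
    "Physicians, >10 years": 74.2,
    "PanDerm unimodal": 72.2,
    "Physicians, <3 years": 68.2,
    "PanDerm multimodal": 66.3,
    "CNN": 56.7,
}


def rank_readers(accuracy):
    """Return (name, accuracy) pairs sorted from highest to lowest accuracy."""
    return sorted(accuracy.items(), key=lambda item: item[1], reverse=True)


def gap_to_best(accuracy):
    """Return each reader's shortfall from the top score, in percentage points."""
    best = max(accuracy.values())
    return {name: round(best - value, 1) for name, value in accuracy.items()}


if __name__ == "__main__":
    gaps = gap_to_best(reported_accuracy)
    for name, value in rank_readers(reported_accuracy):
        print(f"{name:<24} {value:5.1f}%  ({gaps[name]:.1f} points below top)")
```

Sorting the figures this way makes the ordering explicit: only the unimodal foundation model sits above the least experienced physician group, while the multimodal model and the CNN fall below it.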
Overall, the best-performing foundation model (PanDerm unimodal) surpassed physicians with less than 3 years of experience and matched physicians with 3 to 10 years of experience, but remained inferior to dermatologists with more than 10 years of experience.