New Research Shows Potential Shortcomings in the Use of AI to Read Mammograms

Artificial intelligence continues to find new uses as the technology develops, including in health care as a diagnostic aid. A new study suggests, though, that there's more work to be done when it's applied to breast cancer screening.

Research recently published in Radiology, the journal of the Radiological Society of North America, investigated how an AI algorithm approved by the U.S. Food and Drug Administration interpreted negative digital breast tomosynthesis, or 3D mammography, screenings. The goal was to see whether patient characteristics such as race or age affected how well the algorithm performed.


Dr. Derek Nguyen, the study’s first author and assistant professor at Duke University, says, “AI has become a resource for radiologists to improve their efficiency and accuracy in reading screening mammograms while mitigating reader burnout. However, the impact of patient characteristics on AI performance has not been well studied.”

Dr. Nguyen and his team used negative 3D mammography screenings from just under 5,000 patients screened between 2016 and 2019. All remained cancer-free for the following two years, confirming that their negative results were accurate. The algorithm reviewed each of these screenings and assigned scores for the certainty of malignancy and the risk of subsequent cancer.

The results showed that the algorithm performed worse for some demographic groups than for others. False positives were more likely in Black patients and less likely in Asian patients, compared with white patients. The algorithm was also less reliable for patients aged 61 to 80 than for their younger counterparts, and extremely dense breasts were more likely to produce a false positive.


Though the researchers note that other studies have suggested AI algorithms may help radiologists detect breast cancer, the current study shows that a patient's characteristics can affect how reliable that help is.

Dr. Nguyen says, “There are few demographically diverse databases for AI algorithm training, and the FDA does not require diverse datasets for validation. Because of the differences among patient populations, it’s important to investigate whether AI software can accommodate and perform at the same level for different patient ages, races and ethnicities.”

He adds, “This study is important because it highlights that any AI software purchased by a healthcare institution may not perform equally across all patient ages, races/ethnicities and breast densities. Moving forward, I think AI software upgrades should focus on ensuring demographic diversity.”

You can read the whole study in Radiology.
