Use of machine learning models to predict Barrett esophagus and esophageal adenocarcinoma risk

Dec. 12, 2023

Over the past 40 years, the incidence of esophageal adenocarcinoma (EAC) in the Western hemisphere has increased approximately sixfold. Because EAC is often diagnosed after the onset of obstructive symptoms, its mortality rate is high.

Barrett esophagus (BE) is the precursor of most EACs. Patients with BE have chronic reflux that induces metaplastic changes in the distal esophageal epithelium. Risk factors for BE include age > 50 years, male sex, white race, chronic reflux, obesity, smoking, and family history of BE or EAC.

Most gastroenterology societies suggest screening for BE in individuals with multiple risk factors, followed by endoscopic surveillance and endoscopic treatment of dysplasia. Despite these recommendations, BE screening rates remain low.

"We now have access to novel, minimally invasive nonendoscopic tools for Barrett's esophagus screening," says gastroenterologist Prasad G. Iyer, M.D., M.Sc., a clinician and researcher at Mayo Clinic in Rochester, Minnesota. "However, there is a critical need for developing Barrett's esophagus and esophageal cancer risk assessment tools that are more accurate and can be easily implemented using electronic health record data."

To address this challenge, Dr. Iyer and a team of researchers sought to develop and test a machine learning-powered risk prediction algorithm and tool for BE and EAC based on variables obtained from a deidentified large database of electronic health records (EHRs). The results from this research were published in Clinical and Translational Gastroenterology in 2023, with Dr. Iyer serving as lead author.

Study methods

Dr. Iyer and colleagues used an ensemble transformer-based machine learning (ML) model architecture developed on a deidentified EHR database of 6 million Mayo Clinic patients to create predictive models that determine BE and EAC risk at least a year before diagnosis. Additional features of the risk prediction tool include the following:

  • Automatic incorporation of data points available in the EHR and expansion of the current risk factor pool.
  • Integration of the tool within the EHR so that the EHR prompts the care team to consider BE screening when clinically appropriate.

The researchers identified 8,476 individuals with BE, 1,539 with EAC and 252,276 in the control group using International Classification of Diseases (ICD) codes and augmented curation (natural language processing) techniques applied to clinical, endoscopy, laboratory and pathology notes. Patients with BE or EAC were propensity score matched to five independent, randomly selected control groups. The predictive models were then developed using an ensemble transformer-based ML model architecture.
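
As an illustration of the matching step described above, the following is a minimal sketch of greedy 1:1 nearest-neighbor matching on a propensity score. It is not the study's procedure: the function name `match_nearest`, the caliper of 0.05 and the example scores are all hypothetical, and the sketch shows a single matching pass rather than the five independent control groups used in the study.

```python
def match_nearest(case_scores, control_scores, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on propensity score, without replacement.

    Returns (case_index, control_index) pairs. A case is left unmatched when
    no unused control lies within the caliper (an assumed tolerance).
    """
    available = set(range(len(control_scores)))
    pairs = []
    for i, score in enumerate(case_scores):
        if not available:
            break
        # Closest still-unused control; ties are broken by the lower index.
        j = min(available, key=lambda k: (abs(control_scores[k] - score), k))
        if abs(control_scores[j] - score) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


# Hypothetical propensity scores, not study data.
cases = [0.80, 0.55, 0.30]
controls = [0.10, 0.32, 0.52, 0.79, 0.95]
pairs = match_nearest(cases, controls)
print(pairs)
```

Each case is paired with the unused control whose score is closest, so the matched groups resemble each other on the characteristics that feed the propensity score.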

Results

  • The BE risk prediction model had an overall sensitivity, specificity and area under the receiver operating characteristic curve (AUROC) of 76%, 76% and 0.84, respectively.
  • The EAC risk prediction model had an overall sensitivity, specificity and AUROC of 84%, 70% and 0.84, respectively.
  • The model identified conventional risk factors for BE and EAC, and additional novel factors such as coronary artery disease, serum triglycerides and electrolytes.
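
For readers less familiar with the metrics reported above, the sketch below shows how sensitivity, specificity and AUROC can be computed from model risk scores and known outcomes. The labels, scores and the 0.5 threshold are hypothetical toy values, not study data, and the function names are illustrative only.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold count as positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(labels, scores):
    """AUROC as the probability that a random case outscores a random control."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a case and a control count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = disease present, 0 = control.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
auc = auroc(labels, scores)
print(sens, spec, auc)  # 0.75 0.75 0.8125
```

Sensitivity and specificity depend on the chosen threshold, whereas AUROC summarizes discrimination across all thresholds.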

"Our work has demonstrated the feasibility of creating a more accurate BE and EAC risk assessment tool from the EHR using machine learning," explains Dr. Iyer. "This model could be integrated into the EHR and combined with a nonendoscopic screening tool deployed in primary care."

Dr. Iyer notes that more research is needed to clarify the threshold at which screening should be recommended, along with additional details.

"Testing the ML model in patients and assessing the performance of the model in identifying its positive and negative predictive value are important next steps," says Dr. Iyer. "We will be working with Dr. John Kisiel on these next steps."

John B. Kisiel, M.D., is a gastroenterologist and researcher at Mayo Clinic in Rochester, Minnesota.
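
The positive and negative predictive values mentioned above depend on how common the disease is in the screened population, not only on sensitivity and specificity. The sketch below applies Bayes' rule to the reported BE model figures (76% sensitivity, 76% specificity). The 5% prevalence is a hypothetical value chosen for illustration, not a figure from the study, and the function name is illustrative.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence              # true positives per person screened
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    return tp / (tp + fp), tn / (tn + fn)


# Sensitivity and specificity are taken from the article; prevalence is assumed.
ppv, npv = predictive_values(0.76, 0.76, prevalence=0.05)
print(round(ppv, 3), round(npv, 3))  # 0.143 0.984
```

Under this assumed prevalence, most positive flags would be false positives while a negative result would be highly reassuring, which illustrates why predictive values need to be measured in the population where the tool would be used.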

For more information

Iyer PG, et al. Development of electronic health record-based machine learning models to predict Barrett's esophagus and esophageal adenocarcinoma risk. Clinical and Translational Gastroenterology. 2023;14:e00637.
