Construction of a risk prediction model for chronic atrophic gastritis based on machine learning algorithms
Affiliations:

1. Department of Gastroenterology, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi Medical Center, Nanjing Medical University; 2. School of Public Health, Nanjing Medical University

Fund Project:

Major Project of Wuxi City (Z202208), Cohort Project of Nanjing Medical University Wuxi Medical Center (WMCC202302), Major Project of Nanjing Medical University Wuxi Medical Center (WMCM202304).

    Abstract:

    Objective: Chronic atrophic gastritis (CAG) is a precancerous condition of gastric cancer (GC), and early diagnosis of CAG is critical for preventing GC. This study aimed to establish and validate a machine learning-based model to predict the likelihood of developing CAG. Methods: This retrospective study enrolled 1,268 participants from a GC screening cohort. Data were collected through questionnaires, serological tests, upper gastrointestinal endoscopy, and pathological biopsy. Candidate CAG features were first screened using univariate logistic regression, the Boruta algorithm, and least absolute shrinkage and selection operator (LASSO) regression. Eight machine learning algorithms (logistic regression, support vector machine [SVM], gradient boosting machine [GBM], neural network, XGBoost, AdaBoost, LightGBM, and CatBoost) were then trained with 5-fold cross-validation to develop CAG prediction models. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), calibration curves, decision curves, sensitivity, and specificity. Finally, SHapley Additive exPlanations (SHAP) analysis was used to interpret each feature's contribution to the model's decisions, improving the interpretability of the CAG predictions. Results: The three feature-selection methods (univariate logistic regression, the Boruta algorithm, and LASSO regression) jointly identified Helicobacter pylori (HP) infection, pepsinogen ratio (PGR), age, smoking history, sex, and family history of GC as important predictors of CAG. In the test set, the logistic regression model achieved an AUC of 0.805 (0.762–0.849), a sensitivity of 79%, and a specificity of 70.8%, performing no worse than the other seven machine learning models. SHAP analysis identified HP infection, PGR, and age as the core factors driving the models' CAG predictions. Conclusion: Machine learning models based on demographic and clinical factors can predict the probability of CAG with good discrimination (test-set AUC 0.805). HP infection, PGR, and age are the core predictive factors, while smoking history, sex, and family history of GC may increase the risk of developing CAG. This approach could facilitate the early detection and diagnosis of CAG in clinical practice.

    Keywords: Chronic atrophic gastritis; Precancerous condition; Screening; Machine learning algorithms; Predictive model
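The abstract names its evaluation metrics (AUC, sensitivity, specificity, decision curves) and its 5-fold cross-validation, but gives no computational details. The dependency-free Python sketch below is only an illustration of how these quantities are conventionally defined, not the authors' code: AUC is computed as the probability that a random positive case outranks a random negative case, and decision-curve net benefit uses the standard definition TP/n − FP/n × pt/(1 − pt). All function names are hypothetical, the labels and scores are made-up toy values rather than study data, the 0.5 cut-off is an assumption (the study's cut-off is not reported), and the round-robin stratified split is just one simple way to build folds.

```python
"""Toy sketch (hypothetical helpers, made-up data) of standard binary-classifier
evaluation metrics: AUC, sensitivity/specificity, decision-curve net benefit,
plus a simple stratified k-fold split. Not the study's implementation."""

from typing import List, Sequence, Tuple


def auc_mann_whitney(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC = P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_at(y_true: Sequence[int], y_score: Sequence[float],
                 threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), calling a case positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(y_true, y_score, threshold) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP) at the given cut-off."""
    tp, fp, tn, fn = confusion_at(y_true, y_score, threshold)
    return tp / (tp + fn), tn / (tn + fp)


def net_benefit(y_true, y_score, pt: float) -> float:
    """Decision-curve net benefit at threshold probability pt (0 < pt < 1)."""
    tp, fp, _, _ = confusion_at(y_true, y_score, pt)
    n = len(y_true)
    return tp / n - fp / n * pt / (1 - pt)


def stratified_kfold_indices(y_true: Sequence[int], k: int = 5) -> List[List[int]]:
    """Deal each class's indices round-robin into k folds so class ratios stay similar."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(y_true)):
        class_idx = [i for i, y in enumerate(y_true) if y == cls]
        for j, i in enumerate(class_idx):
            folds[j % k].append(i)
    return folds


if __name__ == "__main__":
    # Made-up toy labels (1 = CAG, 0 = no CAG) and predicted probabilities.
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.6, 0.4, 0.5, 0.3, 0.2, 0.1]
    print("AUC:", auc_mann_whitney(y, scores))
    print("Sensitivity, specificity at 0.5:", sensitivity_specificity(y, scores, 0.5))
    print("Net benefit at pt = 0.2:", net_benefit(y, scores, 0.2))
    print("Folds (k=2):", stratified_kfold_indices(y, k=2))
```

In a real analysis these would normally come from a tested library rather than hand-written loops; the point here is only to make each metric's definition concrete. Sweeping `pt` over a range and plotting `net_benefit` against it gives the decision curve.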

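Cite this article:

HAN Jinglue, YAN Caiwang, HU Feifan, et al. Construction of a risk prediction model for chronic atrophic gastritis based on machine learning algorithms[J]. Chinese Journal of Digestive Endoscopy, 2026, 43(2).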

History
  • Received: 2025-01-21
  • Last revised: 2026-02-09
  • Accepted: 2025-04-28
  • Published online: 2026-02-12

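Correspondence address: Editorial Office of Chinese Journal of Digestive Endoscopy, No. 3 Zizhulin, Gulou District, Nanjing; Postal code: 210003

Chinese Journal of Digestive Endoscopy ® 2026. All rights reserved.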