Development and validation of a recognition and classification system for portal hypertensive gastropathy based on deep learning

Affiliations:

1. Department of Gastroenterology, Renmin Hospital of Wuhan University; 2. Wuhan EndoAngel Medical Technology Co., Ltd.; 3. Department of Gastroenterology, General Hospital of Southern Theater Command of PLA; 4. Department of Gastroenterology, First Affiliated Hospital of Hunan Traditional Chinese Medical College; 5. Department of Gastroenterology, Zhushan County People's Hospital; 6. Digestive Endoscopy Center, Renmin Hospital of Wuhan University

Fund project:

National Key Research and Development Program of China (2023YFC2507400)

Abstract:

Objective To develop a deep learning-based system for real-time recognition and grading of portal hypertensive gastropathy (PHG), and to evaluate how well it assists junior endoscopists in judging the endoscopic features of PHG.

Methods A total of 2 848 gastroscopy images from 832 patients with liver cirrhosis were retrospectively collected from the digestive endoscopy center databases of Renmin Hospital of Wuhan University, Wuhan Hospital of Traditional Chinese and Western Medicine, and the Second Hospital of Jingzhou, covering January 2015 to October 2023. The system follows 3 endoscopic features of the Baveno Ⅱ scoring system, with one model developed for each: gastric antral vascular ectasia (GAVE), mosaic-like pattern (MLP), and red marks (RM). The class labels were: (1) GAVE model: 0 absent, 1 present; (2) MLP model: 0 absent, 1 mild, 2 severe; (3) RM model: 0 absent, 1 isolated, 2 confluent. The classification of endoscopic PHG features by 3 expert endoscopists served as the gold standard. The models were trained with yolov8-m, and the images were split into training, validation, and test sets at a ratio of 8∶1∶1. The test set was used to evaluate both model performance and the effect of model assistance on junior endoscopists' diagnosis of endoscopic PHG features. Accuracy, recall, precision, specificity, and the Kappa coefficient were calculated.

Results The accuracy, recall, and specificity of the GAVE model were 96.0% (48/50), 87.5% (7/8), and 97.6% (41/42), respectively; its accuracy did not differ significantly from the gold standard (χ²=316.226, P=1.000). The precision for GAVE1 and GAVE0 was 87.5% (7/8) and 97.6% (41/42), respectively. The accuracy of the MLP model was 84.1% (132/157), with no significant difference from the gold standard (χ²=3.286, P=0.193). Precision and recall were 88.2% (15/17) and 75.0% (15/20) for MLP2, 77.9% (60/77) and 88.2% (60/68) for MLP1, and 90.5% (57/63) and 82.6% (57/69) for MLP0. The accuracy of the RM model was 87.9% (123/140), with no significant difference from the gold standard (χ²=2.891, P=0.409). Precision and recall were 94.7% (18/19) and 78.3% (18/23) for RM2, 72.2% (26/36) and 81.3% (26/32) for RM1, and 92.9% (79/85) and 92.9% (79/85) for RM0. The mean accuracy of the 3 junior endoscopists rose from 95.3% to 99.3% with GAVE model assistance, from 83.9% to 91.9% with MLP model assistance, and from 81.9% to 83.1% with RM model assistance. Agreement between the 3 junior endoscopists as a whole and the gold standard was extremely strong both before and after GAVE model assistance (overall Kappa=1.000 in both cases); it rose from moderate before MLP model assistance (overall Kappa=0.601) to extremely strong after it (overall Kappa=0.964); and it was strong both before (overall Kappa=0.792) and after (overall Kappa=0.798) RM model assistance.

Conclusion The system accurately identifies and grades the endoscopic features of PHG on gastroscopy and can improve the performance of junior endoscopists in diagnosing PHG.
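The percentages in the Results follow directly from the counts given in parentheses, so they can be rechecked with a few lines of arithmetic. The sketch below is an illustration only, not the authors' evaluation code: the function names `summarize` and `pct` and the `REPORTED` table are hypothetical, and the counts are simply copied from the fractions in the abstract (numerator = correctly classified images, recall denominator = gold-standard total, precision denominator = model-predicted total). It assumes percentages are rounded half up to one decimal, which is what reproduces 81.3% for 26/32.

```python
from decimal import Decimal, ROUND_HALF_UP
from fractions import Fraction


def summarize(correct, actual, predicted):
    """Accuracy and per-class (precision, recall) from count summaries.

    correct[i]   : images of class i that the model labelled as class i
    actual[i]    : images whose gold-standard label is class i
    predicted[i] : images the model labelled as class i
    """
    if sum(actual) != sum(predicted):
        raise ValueError("actual and predicted totals must cover the same images")
    accuracy = Fraction(sum(correct), sum(actual))
    per_class = [(Fraction(c, p), Fraction(c, a))
                 for c, a, p in zip(correct, actual, predicted)]
    return accuracy, per_class


def pct(x):
    """Format a Fraction as a percentage with one decimal, rounding half up."""
    value = Decimal(x.numerator * 100) / Decimal(x.denominator)
    return f"{value.quantize(Decimal('0.1'), rounding=ROUND_HALF_UP)}%"


# Counts copied from the fractions reported in the abstract; class order 0, 1, 2.
# Each entry is (correct, actual, predicted).
REPORTED = {
    "GAVE": ([41, 7], [42, 8], [42, 8]),
    "MLP": ([57, 60, 15], [69, 68, 20], [63, 77, 17]),
    "RM": ([79, 26, 18], [85, 32, 23], [85, 36, 19]),
}

if __name__ == "__main__":
    for name, counts in REPORTED.items():
        accuracy, per_class = summarize(*counts)
        print(f"{name}: accuracy {pct(accuracy)}")
        for label, (precision, recall) in enumerate(per_class):
            print(f"  {name}{label}: precision {pct(precision)}, recall {pct(recall)}")
```

For the binary GAVE model, the recall of class 0 is the reported specificity (41/42) and the recall of class 1 is the reported recall (7/8). In all three models the gold-standard totals and the predicted totals sum to the same test-set size (50, 157, and 140 images), so the reported fractions are internally consistent.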

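Cite this article

Gu Haowen, Yang Jie, Xiao Yong, et al. Development and validation of a recognition and classification system for portal hypertensive gastropathy based on deep learning[J]. Chinese Journal of Digestive Endoscopy, 2025, 42(10): 789-795.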

History
  • Received: 2024-05-08
  • Last revised: 2025-10-17
  • Accepted: 2024-05-11
  • Published online: 2025-10-20

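Correspondence address: Editorial Office of the Chinese Journal of Digestive Endoscopy, 3 Zizhulin, Gulou District, Nanjing 210003

Chinese Journal of Digestive Endoscopy ® 2026. All rights reserved.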