Construction and validation of a multi-label classification artificial intelligence model for assessing the completeness of gastroscopy
Author:
Affiliation:

1. Department of Gastroenterology and Hepatology, Tianjin Medical University General Hospital; 2. Tianjin Yujin Artificial Intelligence Medical Technology Co., Ltd.; 3. Department of Gastrointestinal Endoscopy, The Sixth Affiliated Hospital of Sun Yat-sen University; 4. Department of Gastroenterology, Tianjin Medical University General Hospital

Fund Project:

National Natural Science Foundation of China (82370545); Tianjin Science and Technology Plan Project (24ZYCGSY00900)

    Abstract:

    Objective To evaluate the value of a multi-label classification model (Endosmart) built on the ResNeSt-50 network for recognizing anatomical sites during gastroscopy.

    Methods A total of 10,172 gastroscopic images covering 24 gastric sites and carrying 13,617 labels were collected from the Airport Hospital of Tianjin Medical University General Hospital and Tianjin Nankai Hospital between January 2018 and July 2022. Of these, 8,501 images were used to train Endosmart by deep learning, so that a single input image yields simultaneous predictions for multiple anatomical sites, and 1,671 images were used for internal validation. External validation used 100 gastroscopy video clips collected at Tianjin Medical University General Hospital between August 2022 and December 2024. Precision, recall, and F1-score of Endosmart for gastric site recognition were assessed in both the internal image validation set and the external video validation set. In the video validation set, the model was also compared with the average performance of 4 mid-career endoscopists.

    Results In the internal image validation set, Endosmart achieved an average precision of 94.2% (95%CI: 92.6%-95.5%), a recall of 90.3% (95%CI: 88.7%-91.7%), and an F1-score of 86.2% (95%CI: 84.2%-88.2%) for gastric site recognition. In the external video validation set, the precision, recall, and macro-average F1-score of Endosmart were 95.5% (95%CI: 94.0%-96.9%), 93.7% (95%CI: 92.4%-95.0%), and 94.5% (95%CI: 93.3%-95.6%), respectively. The 4 mid-career endoscopists achieved an average precision of 95.5% (95%CI: 93.9%-96.9%), a recall of 90.7% (95%CI: 89.1%-91.8%), and a macro-average F1-score of 91.7% (95%CI: 90.3%-93.9%). The per-video micro-average F1-score was 94.6% (95%CI: 93.5%-95.6%) for Endosmart and 92.1% (95%CI: 90.6%-93.5%) on average for the 4 endoscopists, and the difference in recognition performance was statistically significant (P<0.001).

    Conclusion The ResNeSt-50-based Endosmart multi-label classification model performs well in recognizing anatomical sites in gastroscopic images and videos, and outperforms mid-career endoscopists. The model can serve as an auxiliary tool during gastroscopy to monitor whether all sites have been examined, reduce the blind-spot rate, and improve examination quality.
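The abstract reports both macro-average and micro-average F1-scores and frames the model as a way to flag unexamined sites. The sketch below illustrates how those quantities are commonly computed for multi-label predictions; it is not the authors' implementation. The site names, the functions `prf`, `multilabel_f1`, and `blind_spots`, and the choice to score a label with no support as F1 = 0 are all assumptions made for illustration (the study uses 24 sites that the abstract does not list).

```python
from typing import Dict, List, Set

# Hypothetical site list for illustration only; the study defines 24 gastric sites.
REQUIRED_SITES: Set[str] = {"cardia", "fundus", "body", "angulus", "antrum", "pylorus"}


def prf(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Precision, recall and F1 from raw counts (0.0 when a ratio is undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def multilabel_f1(y_true: List[Set[str]], y_pred: List[Set[str]],
                  labels: Set[str]) -> Dict[str, float]:
    """Macro F1 averages per-label F1; micro F1 pools counts over all labels."""
    counts = {lab: [0, 0, 0] for lab in labels}  # [tp, fp, fn] per label
    for truth, pred in zip(y_true, y_pred):
        for lab in labels:
            if lab in pred and lab in truth:
                counts[lab][0] += 1
            elif lab in pred:
                counts[lab][1] += 1
            elif lab in truth:
                counts[lab][2] += 1
    # Assumption: a label that never occurs contributes F1 = 0 to the macro mean.
    macro = sum(prf(*c)["f1"] for c in counts.values()) / len(labels)
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    return {"macro_f1": macro, "micro_f1": prf(tp, fp, fn)["f1"]}


def blind_spots(frame_predictions: List[Set[str]], required: Set[str]) -> Set[str]:
    """Sites never predicted in any frame of one examination."""
    seen = set().union(*frame_predictions)
    return required - seen


# Toy data: each set holds the sites labelled (or predicted) for one image.
y_true = [{"cardia", "fundus"}, {"antrum"}, {"body"}]
y_pred = [{"cardia"}, {"antrum", "pylorus"}, {"body"}]

scores = multilabel_f1(y_true, y_pred, REQUIRED_SITES)
missed = blind_spots(y_pred, REQUIRED_SITES)
print(scores, sorted(missed))
```

Micro-averaging weights every label occurrence equally, so frequent sites dominate, while macro-averaging weights every site equally, which is why the two figures in the abstract can differ for the same predictions.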

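Cite this article

Lin Lin, Yao Shuangzhe, Song Yan, et al. Construction and validation of a multi-label classification artificial intelligence model for assessing the completeness of gastroscopy [J]. Chinese Journal of Digestive Endoscopy, 2026, 43(5): 365-373.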

History
  • Received: 2025-03-25
  • Last revised: 2026-05-09
  • Accepted: 2025-06-03
  • Published online: 2026-05-13