Abstract: Objective: Chronic atrophic gastritis (CAG) is a precancerous condition for gastric cancer (GC), and screening for CAG is critical for preventing GC. This study aims to establish and validate a machine learning-based model to predict the likelihood of CAG. Methods: This retrospective study enrolled 1,268 participants from a GC screening cohort. Data were collected through questionnaires, serological tests, upper gastrointestinal endoscopy, and pathological biopsies. Feature selection was performed using univariate logistic regression, the Boruta algorithm, and LASSO regression. Eight machine learning algorithms (Logistic Regression, SVM, GBM, Neural Network, XGBoost, AdaBoost, LightGBM, and CatBoost) were trained using 5-fold cross-validation. Model performance was evaluated using multiple metrics, including AUC, calibration curves, decision curves, specificity, and sensitivity. SHAP analysis was applied to interpret feature contributions and the model’s decision logic. Results: The three feature selection methods jointly identified Helicobacter pylori (HP) infection, pepsinogen ratio (PGR), age, smoking history, sex, and family history of GC as key predictors of CAG. The Logistic Regression model achieved an AUC of 0.805 (0.762–0.849) in the test set, with a sensitivity of 79% and a specificity of 70.8%, performing comparably to the other seven machine learning models. SHAP analysis further highlighted HP infection, PGR, and age as the most influential features in predicting CAG. Conclusion: Machine learning algorithms based on demographic and clinical factors can accurately predict the probability of CAG. HP infection, PGR, and age are the core predictive factors, while smoking history, sex, and family history of GC may increase the risk of developing CAG.
This approach could facilitate early detection and diagnosis of CAG in clinical practice.
Keywords: Chronic atrophic gastritis; Precancerous condition; Screening; Machine learning algorithms; Predictive model
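
The discrimination metrics named in the abstract (sensitivity, specificity, and AUC) can be illustrated with a minimal sketch. This is not the study's code: the function names, the 0.5 decision threshold, and the toy labels and scores below are all hypothetical, chosen only to show how each metric is derived from predicted probabilities and biopsy-confirmed labels.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_score: Sequence[float],
                            threshold: float = 0.5) -> Tuple[float, float]:
    """Return (sensitivity, specificity), calling scores >= threshold positive."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1  # CAG correctly flagged
        elif label == 1:
            fn += 1  # CAG missed
        elif predicted == 0:
            tn += 1  # non-CAG correctly cleared
        else:
            fp += 1  # non-CAG wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = CAG on biopsy, 0 = no CAG.
# Scores are made-up predicted probabilities, not study results.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
auc = roc_auc(labels, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two kinds of metric are usually reported together.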