Research Article | Volume 58, P74-81, March 2023

Developing an explainable machine learning model to predict the mechanical ventilation duration of patients with ARDS in intensive care units

  • Zichen Wang (1)
    Affiliations
    Department of Intensive Care Unit, The First Affiliated Hospital of Jinan University, Guangzhou, Guangdong Province, China
    Department of Public Health, University of California, Irvine, Irvine, California, United States
  • Luming Zhang (1)
    Affiliations
    Department of Intensive Care Unit, The First Affiliated Hospital of Jinan University, Guangzhou, Guangdong Province, China
    Department of Clinical Research, The First Affiliated Hospital of Jinan University, Guangzhou, Guangdong Province, China
  • Tao Huang
    Affiliations
    Department of Clinical Research, The First Affiliated Hospital of Jinan University, Guangzhou, Guangdong Province, China
  • Rui Yang
    Affiliations
    Department of Clinical Research, The First Affiliated Hospital of Jinan University, Guangzhou, Guangdong Province, China
    School of Public Health, Xi'an Jiaotong University Health Science Center, Xi'an, Shaanxi, China
  • Hongtao Cheng
    Affiliations
    School of Nursing, Jinan University, Guangzhou, China
  • Hao Wang
    Affiliations
    Department of Statistics, Iowa State University, Ames, Iowa, United States
  • Haiyan Yin
    Correspondence
    Corresponding author.
    Affiliations
    Department of Intensive Care Unit, The First Affiliated Hospital of Jinan University, Guangzhou, Guangdong Province, China
  • Jun Lyu
    Correspondence
    Corresponding author.
    Affiliations
    Department of Clinical Research, The First Affiliated Hospital of Jinan University, Guangzhou, Guangdong Province, China
    Guangdong Provincial Key Laboratory of Traditional Chinese Medicine Informatization, Guangzhou, Guangdong, China
  • Author Footnotes
    1 Zichen Wang and Luming Zhang contributed equally to the study.
Published: November 21, 2022 | DOI: https://doi.org/10.1016/j.hrtlng.2022.11.005

      Highlights

      • Machine learning algorithms can accurately predict MV duration for ARDS patients in intensive care units.
      • The XGBoost model showed the best performance in external test datasets from the United States and the Netherlands.
      • LIME, SHAP, and Breakdown were used for model explanation, making the model easier for clinicians to understand.

      Abstract

      Background

      Acute respiratory distress syndrome (ARDS) is common in intensive care units and carries a high mortality rate, and mechanical ventilation (MV) is the most important related treatment. Early prediction of MV duration benefits patient risk stratification and supports care strategies.

      Objective

      To develop an explainable model for predicting MV duration in patients with ARDS using a machine learning (ML) approach.

      Method

      A total of 1,148, 1,697, and 29 ARDS patients admitted to intensive care units (ICUs) in the MIMIC-IV, eICU-CRD, and AmsterdamUMCdb databases, respectively, were included in the study. Features at MV initiation from the MIMIC-IV dataset were used to train prediction models based on seven supervised machine learning algorithms. After 5-fold cross-validation for hyperparameter tuning, the hyperparameter-optimized model of each algorithm was tested on external datasets extracted from eICU-CRD and AmsterdamUMCdb. Finally, three machine learning explanation methods were applied for model explanation.

      Result

      The XGBoost model showed the most stable and accurate performance across the two testing datasets (RMSE = 5.57 and 5.46 days in eICU-CRD and AmsterdamUMCdb, respectively) and was selected as the optimal model. The model explanations based on SHAP, LIME, and DALEX were consistent: vasopressor use, pH, and SOFA score had the largest effect on MV duration prediction.

      Conclusion

      ML models with features at MV initiation can accurately predict MV duration in patients with ARDS in ICUs. Among seven algorithms, the XGB model showed the best performance (RMSE = 5.57 and 5.46 in two external datasets). LIME, SHAP, and Breakdown methods showed good performance as XAI methods.

      Introduction

      Acute respiratory distress syndrome (ARDS) is a diffuse lung disease caused by inflammatory damage to pulmonary capillary endothelial and alveolar epithelial cells during severe infection, shock, trauma, and burns, which can lead to acute hypoxic respiratory insufficiency or failure.

      Rubenfeld GD, Caldwell E, Peabody E, Weaver J, Martin DP, Neff M, et al. Incidence and outcomes of acute lung injury. N Engl J Med. 2005. Available from: www.nejm.org

      ,
      • Nieman GF
      • Andrews P
      • Satalin J
      • Wilcox K
      • Kollisch-Singule M
      • Madden M
      • et al.
      Acute lung injury: how to stabilize a broken lung.
      Globally, 30–47% of patients in intensive care units (ICUs) are diagnosed with ARDS, and the mortality rate ranges from 35% to 46%.
      • Bellani G
      • Laffey JG
      • Pham T
      • Fan E
      • Brochard L
      • Esteban A
      • et al.
      Epidemiology, patterns of care, and mortality for patients with acute respiratory distress syndrome in intensive care units in 50 countries.
      In addition to the treatment of primary disease, the primary goal for patients with ARDS is to correct hypoxemia, in which mechanical ventilation (MV) is the most important method of respiratory support.
      • Bein T
      • Grasso S
      • Moerer O
      • Quintel M
      • Guerin C
      • Deja M
      • et al.
      The standard of care of patients with ARDS: ventilatory settings and rescue therapies for refractory hypoxemia.
      ,

      Papazian L, Aubron C, Brochard L, Chiche JD, Combes A, Dreyfuss D, Forel JM, Guérin C, Jaber S, Mekontso-Dessap A, Mercat A, Richard JC, Roux D, Vieillard-Baron A, Faure H. Formal guidelines: management of acute respiratory distress syndrome. Ann Intensive Care. 2019 Jun 13;9(1):69. https://doi.org/10.1186/s13613-019-0540-9. PMID: 31197492; PMCID: PMC6565761.

      Although treatment interventions are beneficial to patients with ARDS, prolonged MV not only extends the ICU stay and increases treatment costs but also increases the risk of pneumonia caused by opportunistic pathogens, resulting in a poor prognosis.
      • Ayzac L
      • Girard R
      • Baboi L
      • Beuret P
      • Rabilloud M
      • Richard JC
      • et al.
      Ventilator-associated pneumonia in ARDS patients: the impact of prone positioning. A secondary analysis of the PROSEVA trial.
      ,
      • Bice T
      • Cox CE
      • Carson SS.
      Cost and health care utilization in ARDS–different from other critical illness?.
      Moreover, if the ventilator is used improperly, ventilator-induced lung injury will further worsen the lung condition of ARDS patients and may lead to systemic organ failure.
      • Curley GF
      • Laffey JG
      • Zhang H
      • Slutsky AS.
      Biotrauma and ventilator-induced lung injury: clinical implications.
      Early prediction of MV duration is also essential for clinical decisions and care strategies, since it affects the timing of tracheostomy,

      Terragni PP, Antonelli M, Fumagalli R, Faggiano C, Berardino M, Pallavicini FB, et al. Early vs late tracheotomy for prevention of Pneumonia in mechanically ventilated adult ICU patients a randomized controlled trial [Internet]. Available from: https://jamanetwork.com/

      initiation of nutrition,
      • Kreymann KG
      • Berger MM
      • Deutz NEP
      • Hiesmayr M
      • Jolliet P
      • Kazandjiev G
      • et al.
      ESPEN guidelines on enteral nutrition: intensive care.
      intensive glycemic control use,
      • van den Berghe G
      • Wilmer A
      • Milants I
      • Wouters PJ
      • Bouckaert B
      • Bruyninckx F
      • et al.
      Intensive insulin therapy in mixed medical/surgical intensive care units: benefit versus harm.
      or transfer to other long-term ventilation units.
      • Carpenè N
      • Vagheggini G
      • Panait E
      • Gabbrielli L
      • Ambrosino N.
      A proposal of a new model for long-term weaning: respiratory intensive care unit and weaning center.
      Intensivists therefore tend to predict MV duration for risk stratification and ICU management. However, current evidence does not support the accuracy of intensivists' early predictions of MV duration,
      • Figueroa-Casas JB
      • Connery SM
      • Montoya R
      • Dwivedi AK
      • Lee S.
      Accuracy of early prediction of duration of mechanical ventilation by intensivists.
      indicating the importance of developing accurate and objective tools for predicting MV duration. With growing computing power, machine learning (ML), a subset of artificial intelligence that combines statistical analysis with computer science, is widely used in critical care and has shown impressive performance.
      • Gutierrez G.
      Artificial intelligence in the intensive care unit.
      We therefore aimed to collect the early features of patients with ARDS in ICUs and develop models based on multiple ML algorithms to predict MV duration.

      Method

      Data source and setting

      All data were extracted using Structured Query Language from the Medical Information Mart for Intensive Care (MIMIC)-IV database (version 1.0), the eICU Collaborative Research Database (eICU-CRD, version 2.0), and AmsterdamUMCdb (version 1.0.2). MIMIC-IV is a single-center database that contains over 40,000 ICU admissions from 2008 to 2019 at the Beth Israel Deaconess Medical Center. Similarly, eICU-CRD contains electronic medical records of over 200,000 patients admitted to ICUs in 208 hospitals in the United States between 2014 and 2015.
      • Wu WT
      • Li YJ
      • Feng AZ
      • Li L
      • Huang T
      • Xu AD
      • Lyu J.
      Data mining in clinical big data: the frequently used databases, steps, and methodological models.
      ,
      • Yang J
      • Li Y
      • Liu Q
      • Li L
      • Feng A
      • Wang T
      • et al.
      Brief introduction of medical database and data mining technology in big data era.
      AmsterdamUMCdb is the first European public critical care database and includes over 20,000 admissions to a mixed surgical-medical critical care center at Amsterdam University Medical Centers. All patients in the above-mentioned databases were de-identified in accordance with the Health Insurance Portability and Accountability Act (HIPAA) and the European General Data Protection Regulation (GDPR)

      Johnson AEW, Pollard TJ, Shen L, Lehman LWH, Feng M, Ghassemi M, et al. MIMIC-III, a freely accessible critical care database. Scientific Data.

      • Pollard TJ
      • Johnson AEW
      • Raffa JD
      • Celi LA
      • Mark RG
      • Badawi O.
      The eICU collaborative research database, a freely available multi-center database for critical care research.
      • Thoral PJ
      • Peppink JM
      • Driessen RH
      • Sijbrands EJG
      • Kompanje EJO
      • Kaplan L
      • et al.
      Sharing ICU patient data responsibly under the Society of Critical Care Medicine/European Society of Intensive Care Medicine Joint Data Science Collaboration: the Amsterdam university medical centers database (AmsterdamUMCdb) example∗.
      . The author completed the relevant online health data safety training and access applications before accessing the three databases.

      Study population and feature selection

      The diagnostic criteria for ARDS in the present study were based on International Classification of Diseases (ICD)-9/10 codes, adjusted according to the Berlin definition: (1) partial pressure of oxygen (PaO2)/fraction of inspired oxygen (FiO2) ratio <300 mmHg, (2) positive end-expiratory pressure (PEEP) ≥5 cm H2O, and (3) bilateral infiltrates on chest radiograph.
      • The ARDS Definition Task Force
      Acute respiratory distress syndrome: the Berlin definition.
      Bilateral infiltrates were confirmed by searching the free-text notes of radiology reports for the keywords ‘edema’ OR (‘bilateral’ AND ‘infiltrate’).
      • Pollard TJ
      • Johnson AEW
      • Raffa JD
      • Celi LA
      • Mark RG
      • Badawi O.
      The eICU collaborative research database, a freely available multi-center database for critical care research.
      ,
      The Lancet Respiratory Medicine
      Opening the black box of machine learning.
      ,

      Lundberg S, Lee SI. A Unified Approach to Interpreting Model Predictions. 2017 May 22; Available from: http://arxiv.org/abs/1705.07874
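      The screening rule described above (PaO2/FiO2 ratio, PEEP threshold, and a keyword search of radiology notes) can be sketched as follows. This is a minimal illustration, not the authors' SQL extraction: the function names, the argument names, and the convention of passing FiO2 as a fraction are assumptions, and plain substring matching does not handle negated phrases such as "no edema".

```python
def has_bilateral_infiltrates(report_text):
    """Keyword rule from the text: 'edema' OR ('bilateral' AND 'infiltrate')."""
    text = report_text.lower()
    return "edema" in text or ("bilateral" in text and "infiltrate" in text)


def meets_screening_criteria(pao2_mmhg, fio2_fraction, peep_cmh2o, report_text):
    """Apply the three conditions listed in the text to one (hypothetical) record."""
    pf_ratio = pao2_mmhg / fio2_fraction  # FiO2 given as a fraction, e.g. 0.6
    return (
        pf_ratio < 300
        and peep_cmh2o >= 5
        and has_bilateral_infiltrates(report_text)
    )


if __name__ == "__main__":
    # Illustrative record only; the values are made up.
    print(meets_screening_criteria(80.0, 0.6, 8.0, "Bilateral infiltrates noted."))
```

      In practice the ICD code filter would be applied first, and the free-text rule would only be used to confirm the radiographic criterion.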

      As a result, 1,193, 1,697, and 29 ARDS patients who met the diagnostic criteria were identified in MIMIC-IV, eICU-CRD, and AmsterdamUMCdb, respectively. After excluding patients aged <18 years (N=9 from eICU-CRD) and patients with missing MV records (N=45 from MIMIC-IV and N=461 from eICU-CRD), 1,148, 1,697, and 29 ARDS patients were eventually included.
      Feature selection was based on previous research and the experience of our clinical experts
      • The ARDS Definition Task Force
      Acute respiratory distress syndrome: the Berlin definition.
      • Leisman DE
      • Harhay MO
      • Lederer DJ
      • Abramson M
      • Adjei AA
      • Bakker J
      • Ballas ZK
      • Barreiro E
      • Bell SC
      • Bellomo R
      • Bernstein JA
      • Branson RD
      • Brusasco V
      • Chalmers JD
      • Chokroverty S
      • Citerio G
      • Collop NA
      • Cooke CR
      • Crapo JD
      • Donaldson G
      • Fitzgerald DA
      • Grainger E
      • Hale L
      • Herth FJ
      • Kochanek PM
      • Marks G
      • Moorman JR
      • Ost DE
      • Schatz M
      • Sheikh A
      • Smyth AR
      • Stewart I
      • Stewart PW
      • Swenson ER
      • Szymusiak R
      • Teboul JL
      • Vincent JL
      • Wedzicha JA
      • Maslove DM.
      Development and Reporting of Prediction Models: Guidance for Authors From Editors of Respiratory, Sleep, and Critical Care Journals.
      The Lancet Respiratory Medicine
      Opening the black box of machine learning.
      • Ribeiro MT
      • Singh S
      • Guestrin C.
      “Why should I trust you?” Explaining the predictions of any classifier.
      . All features were extracted at MV initiation, and the missing rates of the extracted features in the three datasets are shown in eFig. 1. Features with a missing rate over 30% in any dataset (mean airway pressure, tidal volume, and temperature) or features not available in all datasets (APS-III, APACHE-IV, and APACHE-II) were dropped from the potential features. To reduce the complexity of the prediction model and avoid overfitting, we used Lasso regression to filter the features.
      • Leisman DE
      • Harhay MO
      • Lederer DJ
      • Abramson M
      • Adjei AA
      • Bakker J
      • Ballas ZK
      • Barreiro E
      • Bell SC
      • Bellomo R
      • Bernstein JA
      • Branson RD
      • Brusasco V
      • Chalmers JD
      • Chokroverty S
      • Citerio G
      • Collop NA
      • Cooke CR
      • Crapo JD
      • Donaldson G
      • Fitzgerald DA
      • Grainger E
      • Hale L
      • Herth FJ
      • Kochanek PM
      • Marks G
      • Moorman JR
      • Ost DE
      • Schatz M
      • Sheikh A
      • Smyth AR
      • Stewart I
      • Stewart PW
      • Swenson ER
      • Szymusiak R
      • Teboul JL
      • Vincent JL
      • Wedzicha JA
      • Maslove DM.
      Development and Reporting of Prediction Models: Guidance for Authors From Editors of Respiratory, Sleep, and Critical Care Journals.
      Interestingly, all features were retained at the λ with the minimum mean cross-validated error (eFig. 2), which indicates that all fitted features were essential for predicting the dependent variable (MV duration) (eTable 1). Finally, age, weight, Sequential Organ Failure Assessment (SOFA) score, PaO2, FiO2, partial pressure of carbon dioxide (PCO2), PEEP, pH, heart rate, mean arterial pressure (MAP), vasopressor use, and renal replacement therapy (RRT) use were included.
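      The pre-filter applied before Lasso (drop a feature if its missing rate exceeds 30% in any dataset, or if it is not available in every dataset) can be sketched as below. The data layout (one dict of columns per dataset, with None for missing values), the function names, and the toy values are assumptions made for illustration; the Lasso step itself was done with the R package ‘glmnet’ and is not reproduced here.

```python
def missing_rate(values):
    """Fraction of entries that are missing (represented here as None)."""
    return sum(v is None for v in values) / len(values)


def drop_sparse_features(datasets, threshold=0.30):
    """Keep features present in every dataset with missing rate <= threshold in all."""
    common = set.intersection(*(set(columns) for columns in datasets.values()))
    kept = []
    for feature in sorted(common):
        rates = [missing_rate(columns[feature]) for columns in datasets.values()]
        if all(rate <= threshold for rate in rates):
            kept.append(feature)
    return kept


if __name__ == "__main__":
    # Toy data only: 'temp' is too sparse in "a", 'apache' is absent from "b".
    toy = {
        "a": {"ph": [7.3, None, 7.4, 7.2], "temp": [None, None, 37.0, 36.5],
              "apache": [1, 2, 3, 4]},
        "b": {"ph": [7.4, 7.3, 7.3, 7.5], "temp": [36.6, 37.1, 36.9, 37.0]},
    }
    print(drop_sparse_features(toy))
```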
      Fig. 1 The flow chart of machine learning model construction.
      Fig. 2 The distribution of mechanical ventilation duration among the three databases. A: Density plot of MV duration in the MIMIC-IV, eICU-CRD, and AmsterdamUMCdb datasets; B: Violin plot of MV duration in the MIMIC-IV, eICU-CRD, and AmsterdamUMCdb datasets. The box plots represent quantiles of MV duration and the colored shading represents the distribution density. The MV duration axis is on a log10 scale.
      Table 1 The clinical characteristics of patients in the three databases.

      | Characteristic | MIMIC-IV (N=1,148) | eICU-CRD (N=1,697) | AmsterdamUMCdb (N=29) | p-value |
      |---|---|---|---|---|
      | MV Duration (Day) | 4.7 (2.4, 9.6) | 2.0 (2.0, 2.0) | 9.5 (6.8, 12.4) | <0.001 |
      | Age (Year) | 63 (51, 73) | 59 (41, 70) | 9.5 (6.8, 12.4) | <0.001 |
      | Weight (Kg) | 81.0 (67.7, 97.6) | 83.2 (67.7, 102.3) | 77.0 (66.0, 86.0) | <0.001 |
      | SOFA Score | 9 | 8 | 10 | <0.001 |
      | PEEP (cm H2O) | 5 | 5 | 8 | <0.001 |
      | SpO2 (%) | 97.0 (94.0, 100.0) | 96.0 (93.0, 99.0) | 93.0 (90.0, 97.0) | <0.001 |
      | PaO2 (mm Hg) | 83.0 (57.0, 170.3) | 81.7 (66.1, 144.0) | 85.0 (66.0, 107.0) | 0.683 |
      | FiO2 (%) | 70.0 (50.0, 100.0) | 60 (40.1, 100.0) | 30.0 (20.0, 40.0) | <0.001 |
      | PaCO2 (mm Hg) | 43.0 (36.0, 52.0) | 42.0 (35.8, 51.0) | 41.0 (34.0, 48.0) | 0.031 |
      | pH | 7.3 (7.3, 7.4) | 7.4 (7.3, 7.4) | 7.4 (7.2, 7.5) | <0.001 |
      | Heart Rate (/min) | 93.0 (80.0, 109.0) | 96.0 (81.0, 112.0) | 105.0 (99.0, 124.0) | <0.001 |
      | Mean Arterial Pressure (mm Hg) | 74.5 (65.0, 85.0) | 78.0 (68.0, 92.0) | 85.0 (76.0, 117) | <0.001 |
      | Vasopressor Use | | | | <0.001 |
      | No | 821 (71.5) | 1189 (70.1) | 7 (24.1) | |
      | Yes | 327 (28.5) | 508 (29.9) | 22 (75.9) | |
      | Renal Replacement Therapy | | | | 0.017 |
      | No | 917 (79.9) | 1423 (83.4) | 22 (75.9) | |
      | Yes | 231 (20.1) | 274 (16.6) | 7 (24.1) | |

      Features and characteristics are presented as median (interquartile range) for continuous variables and as count (percentage) for categorical variables.
      p-values for continuous features were calculated with the Kruskal-Wallis test, and p-values for categorical features were calculated with the Chi-square test.

      Machine learning models construction and hyperparameter tuning

      The MIMIC-IV dataset (n=1,148) was assigned as the training cohort because of its higher data integrity, and the eICU-CRD (n=1,697) and AmsterdamUMCdb (n=29) datasets were selected as external testing cohorts without any data overlap. Missing values were replaced with the average value in each dataset. Seven supervised ML algorithms [Support Vector Machine (Linear Kernel) (SVM-L), Support Vector Machine (Radial Basis Function Kernel) (SVM-R), Decision Tree (DT), Random Forest (RF), Extreme Gradient Boosting (XGBoost) (XGB), Neural Network (NNET), and k-Nearest Neighbors (KNN)] were selected to build models on the training cohort. The primary measure of prediction performance for the ML regression models was the root-mean-square error (RMSE) (Equation 1).
      $$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N}\left(\mathrm{Predicted}_i - \mathrm{Actual}_i\right)^2}{N}} \tag{1}$$
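      Reading Equation 1 as the usual root-mean-square error (the square root of the mean squared difference between predicted and actual MV duration), a minimal computation looks like this. The function name and the sample numbers are illustrative only.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences (in days here)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Made-up predicted vs. actual MV durations for three patients.
    print(rmse([4.0, 6.5, 9.0], [5.0, 6.0, 12.0]))
```

      Because errors are squared before averaging, a few patients with very long ventilation times can dominate the RMSE, which is worth keeping in mind when comparing cohorts with different duration distributions.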


      Hyperparameters for each algorithm were tuned by 5-fold cross-validated grid search: the training dataset was divided into 5 parts, and in each of 5 runs, 4 parts were used for training and the remaining part was used for testing (eFig. 3). For each ML algorithm, the hyperparameters with the best cross-validated predictive performance (lowest RMSE) were used to fit the model. The performance of the different algorithm-based models was then compared in the testing cohorts, and the models were explained (Fig. 1).
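      The 5-fold cross-validated grid search can be illustrated with a small, dependency-free sketch. It is not the authors' ‘caret’ workflow: a one-dimensional k-nearest-neighbours regressor stands in for the real models, and the function names, the grid, and the fold assignment are assumptions chosen to keep the example short.

```python
import math
import random


def rmse(predicted, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def knn_predict(train_x, train_y, x, k):
    """Stand-in model: mean target of the k training points nearest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def k_fold_indices(n, n_folds=5, seed=0):
    """Shuffle the row indices and split them into n_folds disjoint parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[fold::n_folds] for fold in range(n_folds)]


def grid_search_cv(xs, ys, k_grid, n_folds=5, seed=0):
    """Return the k with the lowest mean held-out RMSE, plus all mean RMSEs."""
    folds = k_fold_indices(len(xs), n_folds, seed)
    mean_rmse = {}
    for k in k_grid:
        fold_scores = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(xs)) if i not in held]
            train_x = [xs[i] for i in train]
            train_y = [ys[i] for i in train]
            predictions = [knn_predict(train_x, train_y, xs[i], k) for i in held_out]
            fold_scores.append(rmse(predictions, [ys[i] for i in held_out]))
        mean_rmse[k] = sum(fold_scores) / n_folds
    best_k = min(mean_rmse, key=mean_rmse.get)
    return best_k, mean_rmse


if __name__ == "__main__":
    xs = [float(i) for i in range(50)]
    ys = [2.0 * x for x in xs]  # synthetic target, for illustration only
    print(grid_search_cv(xs, ys, k_grid=[1, 3, 25]))
```

      Each candidate value is scored only on data it was not fitted on, so the selected hyperparameter reflects held-out error rather than training error; the external cohorts stay untouched until the final comparison.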
      Fig. 3 Residual diagnostics for mechanical ventilation duration prediction among the machine learning models. A, C: Absolute residual distribution across prediction models in eICU-CRD and AmsterdamUMCdb external testing; B, D: Box plots of the absolute residual distribution across prediction models in eICU-CRD and AmsterdamUMCdb external testing. The red dot represents the mean absolute residual. The residual is defined as the difference between the actual value and the prediction.
      Advances in algorithms increase model complexity, as in ensemble or deep learning models, which further complicates model interpretation. Therefore, opening the ‘black box’ of ML is crucial, since it allows clinicians to easily understand the internal logic of each prediction.
      The Lancet Respiratory Medicine
      Opening the black box of machine learning.
      In response to this problem, Ribeiro et al
      • Ribeiro MT
      • Singh S
      • Guestrin C.
      “Why should I trust you?” Explaining the predictions of any classifier.
      developed the Local Interpretable Model-agnostic Explanations (LIME) method for local variable importance, and Lundberg and Lee

      Lundberg S, Lee SI. A Unified Approach to Interpreting Model Predictions. 2017 May 22; Available from: http://arxiv.org/abs/1705.07874

      proposed SHapley Additive exPlanations (SHAP) for local variable attribution based on game theory. Similar to SHAP, the moDel Agnostic Language for Exploration and eXplanation (DALEX) is a comprehensive package of algorithms based on the Breakdown principle that helps calculate local and global feature importance.
      • Biecek P.
      DALEX: Explainers for Complex Predictive Models in R.
      LIME, SHAP, and Breakdown are currently popular Explainable Artificial Intelligence (XAI) methods and were used for model interpretation (Fig. 1), helping clinicians without an algorithmic background to better understand the models.
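      To make the game-theoretic idea behind SHAP concrete, the sketch below computes exact Shapley values for a tiny hand-written model by averaging each feature's marginal contribution over all feature orderings. It is an illustration under stated assumptions, not the ‘DALEX’ implementation: `toy_predict` and its coefficients are invented, and a single baseline patient replaces the average over a reference dataset that SHAP normally uses. The contributions along any one ordering correspond to the idea behind a Breakdown-style decomposition.

```python
from itertools import permutations


def toy_predict(features):
    """Invented model of MV duration (days) with one interaction term."""
    v, s = features["vasopressor"], features["sofa"]
    return 2.0 + 3.0 * v + 0.5 * s + 0.2 * v * s


def shapley_values(predict, instance, baseline):
    """Average marginal contribution of each feature over all orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # switch this feature to the patient's value
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


if __name__ == "__main__":
    patient = {"vasopressor": 1, "sofa": 9}
    reference = {"vasopressor": 0, "sofa": 6}
    print(shapley_values(toy_predict, patient, reference))
```

      The attributions always sum to the difference between the patient's prediction and the baseline prediction, which is what lets a bar chart of contributions be read as a decomposition of one prediction. Enumerating all orderings is only feasible for a handful of features; practical tools sample orderings instead.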
      Lasso regression for feature selection was performed with the ‘glmnet’ package, the machine learning models and hyperparameter tuning were built with the ‘caret’ package, and the XAI methods were applied with the ‘DALEX’ and ‘LIME’ packages. Features and characteristics are presented as median (interquartile range) for continuous variables and as count (percentage) for categorical variables. Continuous variables were compared with the Kruskal-Wallis test, and categorical variables were compared with the Chi-square test. A two-sided p-value of <0.05 was considered statistically significant. All statistical analyses were performed using the R Project for Statistical Computing (version 4.0.1) environment.

      Results

      The clinical characteristics of the three datasets are listed in Table 1. The univariable tests showed that all features except PaO2 differed significantly between the three datasets, which indicates significant heterogeneity in patient baselines across the three databases. Fig. 2 shows the distribution of MV duration of ARDS patients in the three datasets. The distribution of MV duration in eICU-CRD patients was extremely concentrated between 1 and 10 days, while the distribution of MV duration in AmsterdamUMCdb was close to ten days. The MIMIC-IV dataset had the most evenly spread MV duration distribution. AmsterdamUMCdb had the longest median MV duration, and eICU-CRD had the shortest median MV duration.
      The final hyperparameters of the models resulting from 5-fold cross-validation are shown in eTable 2. The cross-validated prediction performance shown in Table 2 demonstrated that RF, SVM-R, and XGB had the top three predictive power among the ML algorithms, with RMSE (SD) equal to 7.22 (0.90), 7.23 (0.90), and 7.34 (0.91), respectively.
      Table 2 Cross-validated prediction performance for mechanical ventilation duration among machine learning algorithms.

      | Algorithm | RMSE ± SD |
      |---|---|
      | Support Vector Machine (Linear Kernel) | 7.36 ± 0.92 |
      | Support Vector Machine (Radial Basis Function Kernel) | 7.23 ± 0.90 |
      | Decision Tree | 7.45 ± 0.95 |
      | Random Forest | 7.22 ± 0.90 |
      | XGBoost | 7.34 ± 0.91 |
      | Neural Network | 9.62 ± 0.93 |
      | k-Nearest Neighbors | 7.41 ± 0.99 |

      RMSE: Root mean square error; SD: Standard deviation.
      RMSE and SD were calculated from the results of 5-fold cross-validation.
      The performance of the hyperparameter-optimized models was verified using the testing cohorts (Table 3), and the distributions of predicted MV duration among the seven algorithms in the training and testing datasets are presented in eFig. 4. The NNET model showed the lowest RMSE (1.59) in the eICU-CRD dataset but also the highest RMSE (9.92) in AmsterdamUMCdb, which may be explained by its predictions being concentrated between 0 and 5 days across all datasets, suggesting poor predictive power. The SVM-L model showed the second-best performance (RMSE = 4.39) in the eICU-CRD dataset but also the second-highest RMSE (6.46) in the AmsterdamUMCdb dataset. Compared with the other models, the XGB model had the most balanced prediction performance (RMSE = 5.57 and 5.46 for the eICU-CRD and AmsterdamUMCdb datasets, respectively). The results of residual diagnostics for the eICU-CRD and AmsterdamUMCdb predictions are shown in Fig. 3. In the eICU-CRD testing cohort, the NNET model had the lowest absolute residual distribution, whereas it had the highest in the AmsterdamUMCdb cohort. In contrast, the RF model had the lowest absolute residual distribution in AmsterdamUMCdb and the highest in eICU-CRD. SVM-L, SVM-R, and XGB had similar absolute residual distributions in eICU-CRD, but XGB had the second-lowest mean and median absolute residuals in the AmsterdamUMCdb dataset. Therefore, the XGB model was selected as the optimal prediction model.
      Table 3 Prediction performance for mechanical ventilation duration among machine learning algorithms in external testing.

      | Algorithm | eICU-CRD testing cohort (RMSE) | AmsterdamUMCdb testing cohort (RMSE) |
      |---|---|---|
      | Support Vector Machine (Linear Kernel) | 4.39 | 6.46 |
      | Support Vector Machine (Radial Basis Function Kernel) | 5.22 | 6.14 |
      | Decision Tree | 5.65 | 5.94 |
      | Neural Network | 1.59 | 9.92 |
      | Random Forest | 6.48 | 5.43 |
      | k-Nearest Neighbors | 5.57 | 6.03 |
      | XGBoost | 5.57 | 5.46 |

      RMSE: Root mean square error.
      Fig. 4 Feature importance and model explanation for the optimal model. A: Feature importance ranking; the loss function was RMSE. B: SHapley Additive exPlanations (SHAP) result for a single ARDS patient from the eICU-CRD dataset; green bars represent positive contributions to the prediction and red bars represent negative contributions. C: Local Interpretable Model-agnostic Explanations (LIME) result for a single ARDS patient from the eICU-CRD dataset; blue bars represent positive contributions and red bars represent negative contributions. D: Breakdown result for a single ARDS patient from the eICU-CRD dataset; green bars represent positive contributions and red bars represent negative contributions. Numbers near each bar show how much each feature increases or decreases the final prediction value.
      To investigate how each variable in the MV duration model affects the outcome prediction, we applied XAI methods to the optimal model (XGB) (Fig. 4). The ten most important features, calculated with RMSE as the loss function, are listed in Fig. 4A. In addition, the interpretations of the XGB model predictions based on the LIME, SHAP, and Breakdown methods for a single patient in the eICU-CRD testing dataset are shown in Fig. 4B-D. The SHAP interpretation indicated that, for this patient, receiving a vasopressor, SOFA score = 9, and PaO2 = 233 were the three most important features for MV duration prediction, which was consistent with the Breakdown method, in which receiving a vasopressor and SOFA score = 9 contributed positively to the predicted MV duration. In contrast, according to the LIME method, pH < 7.28 and not receiving RRT were the most important features that decreased the predicted MV duration, consistent with the SHAP and Breakdown explanations.
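      A feature importance ranking that uses RMSE as its loss function can be obtained by permutation: shuffle one feature at a time and measure how much the RMSE grows. The sketch below assumes this is how the ranking in Fig. 4A was produced (permutation importance is the approach ‘DALEX’ offers for global importance), but the source does not spell this out; the function names, the toy model, and the data are hypothetical.

```python
import math
import random


def rmse(predicted, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def permutation_importance(predict, rows, targets, seed=0):
    """Increase in RMSE after shuffling each feature's column across patients."""
    rng = random.Random(seed)
    baseline_loss = rmse([predict(row) for row in rows], targets)
    importance = {}
    for name in rows[0]:
        column = [row[name] for row in rows]
        rng.shuffle(column)  # break the link between this feature and the outcome
        permuted = [dict(row, **{name: value}) for row, value in zip(rows, column)]
        permuted_loss = rmse([predict(row) for row in permuted], targets)
        importance[name] = permuted_loss - baseline_loss
    return importance


if __name__ == "__main__":
    # Toy model that only uses SOFA score; 'weight' is ignored on purpose.
    def toy_model(row):
        return 0.5 * row["sofa"]

    rows = [{"sofa": float(i), "weight": 70.0 + i} for i in range(20)]
    targets = [toy_model(row) for row in rows]
    print(permutation_importance(toy_model, rows, targets))
```

      A feature the model ignores gets an importance of zero, while a feature the model relies on produces a clear increase in loss. Repeating the shuffle several times and averaging gives a more stable ranking.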

      Discussion

      Our results indicate that the optimal model, based on XGB, was the most effective of the seven algorithms in predicting MV duration, with stable and accurate prediction performance in two external datasets (RMSE = 5.57 and 5.46 days in eICU-CRD and AmsterdamUMCdb, respectively). Some readily available clinical features collected at MV initiation can accurately predict MV duration, which is very convenient for clinicians formulating treatment plans. Three model interpretation methods clearly explained how features affected the prediction of MV duration for ARDS patients; vasopressor use, pH, and SOFA score had the largest effects.
      Previous studies have suggested that prolonged MV is significantly associated with ICU mortality risk,
      • Chelluri L
      • Im KA
      • Belle SH
      • Schulz R
      • Rotondi AJ
      • Donahoe MP
      • Sirio CA
      • Mendelsohn AB
      • Pinsky MR.
      Long-term mortality and quality of life after prolonged mechanical ventilation.
      • Cox CE
      • Carson SS
      • Lindquist JH
      • Olsen MK
      • Govert JA
      • Chelluri L
      • et al.
      Differences in one-year health outcomes and resource utilization by definition of prolonged mechanical ventilation: a prospective cohort study.
      • Pranikoff T
      • Hirschl RB
      • Steimle CN
      • Anderson 3rd, HL
      • Bartlett RH.
      Mortality is directly related to the duration of mechanical ventilation before the initiation of extracorporeal life support for severe respiratory failure.
      ICU readmission risk,
      • Zilberberg MD
      • Luippold RS
      • Sulsky S
      • Shorr AF.
      Prolonged acute mechanical ventilation, hospital resource utilization, and mortality in the United States.
      high ICU hospitalization costs,
      • Bice T
      • Cox CE
      • Carson SS.
      Cost and health care utilization in ARDS–different from other critical illness?.
      ,

      Lundberg S, Lee SI. A Unified Approach to Interpreting Model Predictions. 2017 May 22; Available from: http://arxiv.org/abs/1705.07874

      ,
      • Dasta JF
      • McLaughlin TP
      • Mody SH
      • Piech CT.
      Daily cost of an intensive care unit day: the contribution of mechanical ventilation.
      and decreased long-term quality of life.
      • Chelluri L
      • Im KA
      • Belle SH
      • Schulz R
      • Rotondi AJ
      • Donahoe MP
      • Sirio CA
      • Mendelsohn AB
      • Pinsky MR.
      Long-term mortality and quality of life after prolonged mechanical ventilation.
      Accurate MV duration predictions can therefore allow better risk stratification of patients, assist clinical decision-making, and optimize ICU resource allocation, which is of great significance for improving both cost-effectiveness and patient outcomes. Although there have been many studies and prediction models concerning prolonged MV,
      • Parreco J
      • Hidalgo A
      • Parks JJ
      • Kozol R
      • Rattan R.
      Using artificial intelligence to predict prolonged mechanical ventilation and tracheostomy placement.
      • Hessels L
      • Coulson TG
      • Seevanayagam S
      • Young P
      • Pilcher D
      • Marhoon N
      • et al.
      Development and validation of a score to identify cardiac surgery patients at high risk of prolonged mechanical ventilation.
      • Magoon R.
      RAISE"ing a Score to Predict Prolonged Mechanical Ventilation Following Subarachnoid Hemorrhage.
      • Clark PA
      • Inocencio RC
      • Lettieri CJ.
      I-TRACH: validating a tool for predicting prolonged mechanical ventilation.
      • Dallazen-Sartori F.
      • Albuquerque L.C.
      • Guaragna J.C.V.C.
      • Magedanz E.H.
      • Petracco J.B.
      • Bodanese R.
      • Wagner M.B.
      • Bodanese L.C.
      Risk Score for Prolonged Mechanical Ventilation in Coronary Artery Bypass Grafting.
      • Figueroa-Casas JB
      • Dwivedi AK
      • Connery SM
      • Quansah R
      • Ellerbrook L
      • Galvis J.
      Predictive models of prolonged mechanical ventilation yield moderate accuracy.
      the definition of prolonged MV has not been consistent, so the performance evaluations of these prediction models do not apply to all situations.
      • Rose L
      • McGinlay M
      • Amin R
      • Burns KE
      • Connolly B
      • Hart N
      • et al.
      Variation in definition of prolonged mechanical ventilation.
      Rose et al. showed that, across hundreds of past studies, there were more than 30 definitions of prolonged MV alone, with durations ranging from 72 hours to 3 months, which greatly weakens the generalizability of such models. In addition, few previous studies have investigated predictions of specific MV duration, and such predictions based on the clinical experience of intensivists are unsatisfactory.
      • Carpenè N
      • Vagheggini G
      • Panait E
      • Gabbrielli L
      • Ambrosino N.
      A proposal of a new model for long-term weaning: respiratory intensive care unit and weaning center.
      New prediction tools must therefore be developed. Recently, Sayed et al. developed a prediction model of MV duration based on the MIMIC-III database.
      • Sayed M
      • Riaño D
      • Villar J.
      Predicting Duration of Mechanical Ventilation in Acute Respiratory Distress Syndrome Using Supervised Machine Learning.
      However, MIMIC-III only includes patients admitted to ICUs from 2001 to 2011, so its data may be outdated and may not reflect current patient situations. In addition, previous MV-based models offered no model explanation, which may weaken their potential for adoption in clinical practice.
      We believe that our study has several strengths. First, we used the MIMIC-IV database to train models. In addition to fixing errors in MIMIC-III, MIMIC-IV includes patients from 2008 to 2019, which can better represent the current situation of patients with ARDS. Second, to our knowledge, our model showed the best prediction performance compared with previous studies. Third, we used two external datasets, from the United States and the Netherlands, to test our model. Finally, we applied three methods for model interpretation and obtained consistent results, which makes the model easier for intensivists to understand. Our study also has limitations. We confirmed only 29 ARDS patients in AmsterdamUMCdb, which weakens the power of the testing result. In addition, due to database limitations, we did not include comorbidity status, which may decrease prediction performance. Finally, we only extracted features observed at the initiation of MV and did not compare the prediction performance of features extracted after the start of MV. Therefore, more research is necessary.

      Conclusion

      ML models with features at MV initiation can accurately predict MV duration in patients with ARDS in ICUs. Among seven algorithms, the XGB model showed the best performance (RMSE = 5.57 and 5.46 in the two external datasets). The LIME, SHAP, and Breakdown methods performed well as XAI methods.
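
      For readers less familiar with the metric reported above, the following is a minimal sketch of how a root mean squared error (RMSE) is computed from observed and predicted durations. It is not the authors' code: the function name `rmse` and the example values are hypothetical and are not taken from the study data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each prediction error, average the squares, then take the root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical MV durations (illustrative values only, not study data).
observed = [3.0, 7.0, 12.0, 5.0]
predicted = [4.0, 6.0, 9.0, 5.0]
example_rmse = rmse(observed, predicted)
print(f"RMSE = {example_rmse:.2f}")
```

      Because errors are squared before averaging, a few large misses raise the RMSE more than many small ones, and the result is expressed in the same unit as the predicted duration.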

      Ethics approval and consent to participate

      All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Data extracted from the MIMIC-IV, eICU-CRD, and AmsterdamUMCdb databases do not require individual informed consent because the related research data are publicly available and all patient data are de-identified according to the Health Insurance Portability and Accountability Act and the European General Data Protection Regulation.

      Consent for publication

      Not applicable.

      Funding

      Guangdong Provincial Key Laboratory of Traditional Chinese Medicine Informatization (2021B1212040007).

      Availability of data and material

      The data are available on the MIMIC-IV website at https://www.physionet.org/content/mimiciv/2.0/, the eICU-CRD website at https://eicu-crd.mit.edu/, and the AmsterdamUMCdb GitHub website at https://github.com/AmsterdamUMC/AmsterdamUMCdb. The data in this article are available from the corresponding author on reasonable request.

      Author contributions

      ZW contributed to conceptualization, methodology, and writing of the original draft. LZ contributed to conceptualization and supervision. TH performed data curation. RY and HC performed formal analysis and software processing. HW assisted with methodology and visualization. JL and HY assisted with resources, conceptualization, and supervision. All authors read and approved the original draft and took part in reviewing and editing the manuscript.

      Financial Disclosure statement

      The study was supported by Guangdong Provincial Key Laboratory of Traditional Chinese Medicine Informatization (2021B1212040007).

      Conflicts of interest

      The authors report no conflicts of interest in this work.

      Acknowledgments

      Not applicable.

      Appendix. Supplementary materials

      References

      1. Rubenfeld GD, Caldwell E, Peabody E, Weaver J, Martin DP, Neff M, et al. Incidence and outcomes of acute lung injury [Internet]. N Engl J Med. 2005. Available from: www.nejm.org

        • Nieman GF
        • Andrews P
        • Satalin J
        • Wilcox K
        • Kollisch-Singule M
        • Madden M
        • et al.
        Acute lung injury: how to stabilize a broken lung.
        Crit Care. 2018; 22 (May 24): 136
        • Bellani G
        • Laffey JG
        • Pham T
        • Fan E
        • Brochard L
        • Esteban A
        • et al.
        Epidemiology, patterns of care, and mortality for patients with acute respiratory distress syndrome in intensive care units in 50 countries.
        JAMA. 2016; 315 (Feb 23): 788-800
        • Bein T
        • Grasso S
        • Moerer O
        • Quintel M
        • Guerin C
        • Deja M
        • et al.
        The standard of care of patients with ARDS: ventilatory settings and rescue therapies for refractory hypoxemia.
        Intensive Care Med. 2016; 42 (May): 699-711
      2. Papazian L, Aubron C, Brochard L, Chiche JD, Combes A, Dreyfuss D, Forel JM, Guérin C, Jaber S, Mekontso-Dessap A, Mercat A, Richard JC, Roux D, Vieillard-Baron A, Faure H. Formal guidelines: management of acute respiratory distress syndrome. Ann Intensive Care. 2019 Jun 13;9(1):69. https://doi.org/10.1186/s13613-019-0540-9. PMID: 31197492; PMCID: PMC6565761.

        • Ayzac L
        • Girard R
        • Baboi L
        • Beuret P
        • Rabilloud M
        • Richard JC
        • et al.
        Ventilator-associated pneumonia in ARDS patients: the impact of prone positioning. A secondary analysis of the PROSEVA trial.
        Intensive Care Med. 2016; 42 (May 1): 871-878
        • Bice T
        • Cox CE
        • Carson SS.
        Cost and health care utilization in ARDS–different from other critical illness?
        Semin Respir Crit Care Med. 2013; 34 (Aug): 529-536
        • Curley GF
        • Laffey JG
        • Zhang H
        • Slutsky AS.
        Biotrauma and ventilator-induced lung injury: clinical implications.
        Chest. 2016; 150 (Nov 1): 1109-1117
      3. Terragni PP, Antonelli M, Fumagalli R, Faggiano C, Berardino M, Pallavicini FB, et al. Early vs late tracheotomy for prevention of pneumonia in mechanically ventilated adult ICU patients: a randomized controlled trial [Internet]. Available from: https://jamanetwork.com/

        • Kreymann KG
        • Berger MM
        • Deutz NEP
        • Hiesmayr M
        • Jolliet P
        • Kazandjiev G
        • et al.
        ESPEN guidelines on enteral nutrition: intensive care.
        Clin Nutr. 2006; 25 (Apr): 210-223
        • van den Berghe G
        • Wilmer A
        • Milants I
        • Wouters PJ
        • Bouckaert B
        • Bruyninckx F
        • et al.
        Intensive insulin therapy in mixed medical/surgical intensive care units: benefit versus harm.
        Diabetes. 2006; 55 (Nov): 3151-3159
        • Carpenè N
        • Vagheggini G
        • Panait E
        • Gabbrielli L
        • Ambrosino N.
        A proposal of a new model for long-term weaning: respiratory intensive care unit and weaning center.
        Respir Med. 2010; 104 (Oct): 1505-1511
        • Figueroa-Casas JB
        • Connery SM
        • Montoya R
        • Dwivedi AK
        • Lee S.
        Accuracy of early prediction of duration of mechanical ventilation by intensivists.
        Ann Am Thorac Soc. 2014; 11: 182-185
        • Gutierrez G.
        Artificial intelligence in the intensive care unit.
        Crit Care. 2020; 24 (Mar 24): 101
        • Wu WT
        • Li YJ
        • Feng AZ
        • Li L
        • Huang T
        • Xu AD
        • Lyu J.
        Data mining in clinical big data: the frequently used databases, steps, and methodological models.
        Mil Med Res. 2021 Aug 11; 8 (PMID: 34380547; PMCID: PMC8356424): 44. https://doi.org/10.1186/s40779-021-00338-z
        • Yang J
        • Li Y
        • Liu Q
        • Li L
        • Feng A
        • Wang T
        • et al.
        Brief introduction of medical database and data mining technology in big data era.
        J Evid Based Med. 2020; 13: 57-69
      4. Johnson AEW, Pollard TJ, Shen L, Lehman LWH, Feng M, Ghassemi M, et al. MIMIC-III, a freely accessible critical care database. Scientific Data.

        • Pollard TJ
        • Johnson AEW
        • Raffa JD
        • Celi LA
        • Mark RG
        • Badawi O.
        The eICU collaborative research database, a freely available multi-center database for critical care research.
        Sci Data. 2018; 5: 180178. https://doi.org/10.1038/sdata.2018.178
        • Thoral PJ
        • Peppink JM
        • Driessen RH
        • Sijbrands EJG
        • Kompanje EJO
        • Kaplan L
        • et al.
        Sharing ICU patient data responsibly under the Society of Critical Care Medicine/European Society of Intensive Care Medicine Joint Data Science Collaboration: the Amsterdam university medical centers database (AmsterdamUMCdb) example.
        Crit Care Med. 2021; 49 (Jun 1): E563-E577
        • The ARDS Definition Task Force
        Acute respiratory distress syndrome: the Berlin definition.
        JAMA. 2012; 307 (Jun 20): 2526-2533. https://doi.org/10.1001/jama.2012.5669
        • Leisman DE
        • Harhay MO
        • Lederer DJ
        • Abramson M
        • Adjei AA
        • Bakker J
        • Ballas ZK
        • Barreiro E
        • Bell SC
        • Bellomo R
        • Bernstein JA
        • Branson RD
        • Brusasco V
        • Chalmers JD
        • Chokroverty S
        • Citerio G
        • Collop NA
        • Cooke CR
        • Crapo JD
        • Donaldson G
        • Fitzgerald DA
        • Grainger E
        • Hale L
        • Herth FJ
        • Kochanek PM
        • Marks G
        • Moorman JR
        • Ost DE
        • Schatz M
        • Sheikh A
        • Smyth AR
        • Stewart I
        • Stewart PW
        • Swenson ER
        • Szymusiak R
        • Teboul JL
        • Vincent JL
        • Wedzicha JA
        • Maslove DM.
        Development and Reporting of Prediction Models: Guidance for Authors From Editors of Respiratory, Sleep, and Critical Care Journals.
        Crit Care Med. 2020 May; 48 (PMID: 32141923; PMCID: PMC7161722): 623-633. https://doi.org/10.1097/CCM.0000000000004246
        • The Lancet Respiratory Medicine
        Opening the black box of machine learning.
        Lancet Respir Med. 2018 Nov; 6 (Epub 2018 Oct 18. PMID: 30343029): 801. https://doi.org/10.1016/S2213-2600(18)30425-9
        • Ribeiro MT
        • Singh S
        • Guestrin C.
        “Why should I trust you?” Explaining the predictions of any classifier.
        in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, 2016: 1135-1144
      5. Lundberg S, Lee SI. A Unified Approach to Interpreting Model Predictions. 2017 May 22; Available from: http://arxiv.org/abs/1705.07874

        • Biecek P.
        DALEX: Explainers for Complex Predictive Models in R.
        Journal of Machine Learning Research. 2018; 19: 1-5
        • Chelluri L
        • Im KA
        • Belle SH
        • Schulz R
        • Rotondi AJ
        • Donahoe MP
        • Sirio CA
        • Mendelsohn AB
        • Pinsky MR.
        Long-term mortality and quality of life after prolonged mechanical ventilation.
        Crit Care Med. 2004 Jan; 32 (PMID: 14707560): 61-69. https://doi.org/10.1097/01.CCM.0000098029.65347.F9
        • Cox CE
        • Carson SS
        • Lindquist JH
        • Olsen MK
        • Govert JA
        • Chelluri L
        • et al.
        Differences in one-year health outcomes and resource utilization by definition of prolonged mechanical ventilation: a prospective cohort study.
        Crit Care. 2007; 11: R9
        • Pranikoff T
        • Hirschl RB
        • Steimle CN
        • Anderson HL 3rd
        • Bartlett RH.
        Mortality is directly related to the duration of mechanical ventilation before the initiation of extracorporeal life support for severe respiratory failure.
        Crit Care Med. 1997 Jan; 25 (PMID: 8989172): 28-32. https://doi.org/10.1097/00003246-199701000-00008
        • Zilberberg MD
        • Luippold RS
        • Sulsky S
        • Shorr AF.
        Prolonged acute mechanical ventilation, hospital resource utilization, and mortality in the United States.
        Crit Care Med. 2008 Mar; 36 (PMID: 18209667): 724-730. https://doi.org/10.1097/CCM.0B013E31816536F7
        • Dasta JF
        • McLaughlin TP
        • Mody SH
        • Piech CT.
        Daily cost of an intensive care unit day: the contribution of mechanical ventilation.
        Crit Care Med. 2005 Jun; 33 (PMID: 15942342): 1266-1271. https://doi.org/10.1097/01.ccm.0000164543.14619.00
        • Parreco J
        • Hidalgo A
        • Parks JJ
        • Kozol R
        • Rattan R.
        Using artificial intelligence to predict prolonged mechanical ventilation and tracheostomy placement.
        J Surg Res. 2018; 228: 179-187
        • Hessels L
        • Coulson TG
        • Seevanayagam S
        • Young P
        • Pilcher D
        • Marhoon N
        • et al.
        Development and validation of a score to identify cardiac surgery patients at high risk of prolonged mechanical ventilation.
        J Cardiothorac Vasc Anesth. 2019; 33: 2709-2716
        • Magoon R.
        "RAISE"ing a Score to Predict Prolonged Mechanical Ventilation Following Subarachnoid Hemorrhage.
        Crit Care Med. 2022 Jul 1; 50 (Epub 2022 Jun 13. PMID: 35726992): e655-e656. https://doi.org/10.1097/CCM.0000000000005507
        • Clark PA
        • Inocencio RC
        • Lettieri CJ.
        I-TRACH: validating a tool for predicting prolonged mechanical ventilation.
        J Intensive Care Med. 2016; 33 (Nov 30): 567-573. https://doi.org/10.1177/0885066616679974
        • Dallazen-Sartori F.
        • Albuquerque L.C.
        • Guaragna J.C.V.C.
        • Magedanz E.H.
        • Petracco J.B.
        • Bodanese R.
        • Wagner M.B.
        • Bodanese L.C.
        Risk Score for Prolonged Mechanical Ventilation in Coronary Artery Bypass Grafting.
        Int J Cardiovasc Sci. 2020; 34: 264-271
        • Figueroa-Casas JB
        • Dwivedi AK
        • Connery SM
        • Quansah R
        • Ellerbrook L
        • Galvis J.
        Predictive models of prolonged mechanical ventilation yield moderate accuracy.
        J Crit Care. 2015; 30: 502-505
        • Rose L
        • McGinlay M
        • Amin R
        • Burns KE
        • Connolly B
        • Hart N
        • et al.
        Variation in definition of prolonged mechanical ventilation.
        Respir Care. 2017; 62 (Oct 1): 1324-1332
        • Sayed M
        • Riaño D
        • Villar J.
        Predicting Duration of Mechanical Ventilation in Acute Respiratory Distress Syndrome Using Supervised Machine Learning.
        J Clin Med. 2021; 10 (Published 2021 Aug 26): 3824. https://doi.org/10.3390/jcm10173824