TY - JOUR
T1 - Machine learning-based monosaccharide profiling for tissue-specific classification of Wolfiporia extensa samples
AU - Hsiung, Shih-Yi
AU - Deng, Shun-Xin
AU - Li, Jing
AU - Huang, Sheng-Yao
AU - Liaw, Chen-Kun
AU - Huang, Su-Yun
AU - Wang, Ching-Chiung
AU - Hsieh, Yves S.Y.
N1 - Publisher Copyright: © 2023 The Authors
PY - 2023/11/15
Y1 - 2023/11/15
N2 - Machine learning (ML) has been used in many clinical decision-making processes and diagnostic procedures in bioinformatics applications. We examined eight algorithms, namely linear discriminant analysis (LDA), logistic regression (LR), k-nearest neighbor (KNN), random forest (RF), gradient boosting machine (GBM), support vector machine (SVM), Naïve Bayes classifier (NB), and artificial neural network (ANN), to evaluate how well they classify and predict four tissue types of Wolfiporia extensa from their monosaccharide composition profiles. All eight ML-based models performed well, with AUC exceeding 0.8. Five models (LDA, KNN, RF, GBM, and ANN) performed excellently in the four-tissue-type classification (AUC > 0.9). All eight models were also good predictive models in the three-tissue-type classification (AUC > 0.8). Notably, all eight ML-based methods outperformed the single LDA plotting method. For large sample sizes, the ML-based methods perform better than traditional regression techniques and could potentially increase the accuracy of identifying W. extensa tissue samples.
KW - Linear discriminant analysis
KW - Machine learning
KW - Predictive model
KW - Tissue-specific classification
KW - Wolfiporia extensa
UR - http://www.scopus.com/inward/record.url?scp=85170431353&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85170431353&partnerID=8YFLogxK
U2 - 10.1016/j.carbpol.2023.121338
DO - 10.1016/j.carbpol.2023.121338
M3 - Article
SN - 0144-8617
VL - 322
JO - Carbohydrate Polymers
JF - Carbohydrate Polymers
M1 - 121338
ER -