TY - JOUR
T1 - Inequality relations for NMR-based polymer homoblock analysis and extended application: Reanalysis of historical data on alginates, chitosans, homogalacturonans, and galactomannans
AU - Xing, Xiaohui
AU - Xing, Kanglin
AU - Hsieh, Yves S.Y.
AU - Abbott, D. Wade
N1 - Publisher Copyright: © 2024
PY - 2024/8
Y1 - 2024/8
AB - There has been a long-standing bottleneck in the quantitative analysis of the frequencies of homoblock polyads beyond triads using 1H and 13C NMR for linear polysaccharides, primarily because monosaccharides within a long homoblock share similar chemical environments due to identical neighboring units, resulting in indistinct NMR peaks. In this study, through rigorous mathematical induction, inequality relations were established that enabled the calculation of frequency ranges of homoblock polyads from historically reported NMR-derived frequency values of diads and/or triads of alginates, chitosans, homogalacturonans, and galactomannans. The calculated homoblock frequency ranges were then applied to evaluate three statistical chain-growth models, namely the Bernoulli chain, the first-order Markov chain, and the second-order Markov chain, for predicting homoblock frequencies in these polysaccharides. Furthermore, based on the mathematically derived inequality relations, a novel 2D array was constructed, enabling the graphical visualization of homoblock features in polysaccharides. It was demonstrated, as a proof of concept, that the novel 2D array, along with a 1D code generated from it, could serve as an effective feature engineering tool for polymer classification using machine learning algorithms.
KW - Feature engineering
KW - Homoblock
KW - Machine learning
KW - Markov chain
KW - NMR
KW - Polysaccharide
UR - http://www.scopus.com/inward/record.url?scp=85197492517&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85197492517&partnerID=8YFLogxK
U2 - 10.1016/j.carres.2024.109189
DO - 10.1016/j.carres.2024.109189
M3 - Article
SN - 0008-6215
VL - 542
JO - Carbohydrate Research
JF - Carbohydrate Research
M1 - 109189
ER -