Comprehensive Mining and Analyzing the Effects of Microbes on Disease Development through Unsupervised Microbial Genome Clustering Tools (3/3)

Research project: A - Government sector, b - National Science and Technology Council

Project Details

Description

It is well established that microbes living inside our bodies can affect our physical and psychological well-being. Identifying the key microbes, however, remains a complex process. In this report I introduce our efforts to design computational workflows for distinguishing microbes that are significantly associated with one disease, colorectal cancer (CRC), which remains among the three most common cancers worldwide and in Taiwan. In the first attempt I designed a machine learning feature selection workflow to find species important for CRC. I used a metagenomic binning approach to obtain high-quality genomes from two CRC cohorts, one eastern and the other composed of western populations, and applied a specifically designed feature selection workflow to the extracted genomes, which identified crucial species in both cohorts. Analysis of the extracted genomes reveals that the crucial genomes in the eastern and western cohorts are very different, indicating that distinct dietary and environmental factors shape the composition of the gut microbiota. The discovery of several novel species in the two cohorts may also increase our understanding of CRC-associated microbes.
Conventional machine learning feature selection may extract so many features that the most crucial ones cannot be identified among them. To address this, I also designed a novel feature selection algorithm, Cross-Validated Feature Selection (CVFS). By cross-checking features across different sample subsets, I selected a handful of important features from the CRC cohorts. The improved workflow not only allows the most indispensable microbes to be found but also opens the door to further downstream analysis, showcasing the importance of good feature selection algorithms in finding microbial markers in microbial populations.
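The abstract describes CVFS only at a high level, so the following is a minimal sketch of one possible reading of "scrutinizing features among different sample subsets", not the project's implementation. It assumes that samples are split into disjoint, class-balanced subsets, that a selector runs on each subset independently, and that only features chosen in every subset, across most repeated random splits, are kept. The function names, the parameters (`n_splits`, `k`, `n_repeats`, `min_fraction`), the mean-difference scoring rule, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def top_k_features(X, y, k):
    """Stand-in selector (an assumption): rank features by the absolute
    difference between the two class means and keep the k largest."""
    diff = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
    return set(np.argsort(diff)[::-1][:k].tolist())


def stratified_subsets(y, n_splits, rng):
    """Split sample indices into disjoint subsets, keeping both classes in each."""
    parts = [[] for _ in range(n_splits)]
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        for i, chunk in enumerate(np.array_split(idx, n_splits)):
            parts[i].append(chunk)
    return [np.concatenate(p) for p in parts]


def cross_subset_selection(X, y, n_splits=3, k=5, n_repeats=10,
                           min_fraction=0.6, seed=0):
    """Keep features that every subset agrees on in most repeated splits."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1], dtype=int)
    for _ in range(n_repeats):
        shared = None
        for idx in stratified_subsets(y, n_splits, rng):
            chosen = top_k_features(X[idx], y[idx], k)
            # Intersect so that only features chosen in all subsets survive.
            shared = chosen if shared is None else shared & chosen
        for f in shared:
            counts[f] += 1
    keep = np.flatnonzero(counts >= min_fraction * n_repeats)
    return [int(f) for f in keep]


# Synthetic stand-in for an abundance table: 120 samples, 30 features,
# with features 0 and 1 shifted upward in the "case" group.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(120, 30))
y = np.repeat([0, 1], 60)
X[y == 1, 0] += 3.0
X[y == 1, 1] += 3.0

selected = cross_subset_selection(X, y)
print(selected)
```

Requiring agreement across disjoint subsets is what keeps the selected set small: a feature that ranks highly in only one subset by chance is dropped at the intersection step.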
Status: Finished
Effective start/end date: 8/1/23 to 7/31/24

Keywords

  • Microbiota
  • Microbiome
  • Disease analysis
  • Unsupervised clustering
  • Genome analysis
