Project Details
Description
It is well established that the microbes living inside our bodies can affect our physical and psychological well-being. Identifying the key microbes, however, remains a complex process. In this report I describe our efforts to design computational workflows that distinguish microbes significantly associated with one such disease, colorectal cancer (CRC), which remains among the top three cancers occurring worldwide and in Taiwan. In the first attempt I designed a machine learning feature selection workflow to find species important for CRC. I used a metagenomic binning approach to obtain high-quality genomes from two CRC cohorts, one eastern and the other composed of western populations, and applied a specifically designed feature selection workflow to the extracted genomes, which identified crucial species in both cohorts. Analysis of the extracted genomes reveals that the crucial genomes in the eastern and western cohorts are very different, suggesting that distinct dietary and environmental factors shape the composition of the gut microbiome. The discovery of several novel species in the two cohorts may also improve our understanding of CRC-associated microbes.
Conventional machine learning feature selection may extract so many features that the most crucial ones cannot be identified among them. To address this, I also designed a novel feature selection algorithm, Cross-Validated Feature Selection (CVFS). By scrutinizing features across different sample subsets, I selected a handful of important features from the CRC cohorts. The improved workflow not only allows the most indispensable microbes to be found but also opens the door to further downstream analysis, showcasing the importance of good feature selection algorithms in finding microbial markers in microbial populations.
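The idea of checking features across different sample subsets can be illustrated with a minimal sketch. This is not the project's CVFS implementation: the function names (`make_toy_data`, `top_k_features`, `cvfs_sketch`), the mean-difference scoring rule, the stratified split, and all parameter values are assumptions chosen for illustration. The sketch splits the samples into disjoint subsets, selects features within each subset independently, keeps only the features selected in every subset, and repeats this over several random splits.

```python
import random
from collections import Counter


def make_toy_data(n_samples=60, n_features=20, seed=0):
    """Hypothetical toy data: only features 0 and 1 differ between classes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        if label == 1:
            row[0] += 5.0
            row[1] -= 5.0
        X.append(row)
        y.append(label)
    return X, y


def top_k_features(X, y, k):
    """Assumed scoring rule: rank features by absolute difference in class means."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        mean_pos = sum(r[j] for r in pos) / len(pos)
        mean_neg = sum(r[j] for r in neg) / len(neg)
        scores.append((abs(mean_pos - mean_neg), j))
    scores.sort(reverse=True)
    return {j for _, j in scores[:k]}


def cvfs_sketch(X, y, n_splits=3, n_repeats=10, k=3, min_fraction=0.8, seed=0):
    """Keep features selected in every disjoint subset, in most random repeats."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    if len(pos) < n_splits or len(neg) < n_splits:
        raise ValueError("each class needs at least n_splits samples")
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_repeats):
        rng.shuffle(pos)
        rng.shuffle(neg)
        # Disjoint subsets, each containing samples from both classes.
        folds = [pos[f::n_splits] + neg[f::n_splits] for f in range(n_splits)]
        shared = None
        for fold in folds:
            selected = top_k_features([X[i] for i in fold], [y[i] for i in fold], k)
            shared = selected if shared is None else shared & selected
        counts.update(shared)
    # Retain features that survived the intersection in enough repeats.
    return sorted(j for j, c in counts.items() if c / n_repeats >= min_fraction)


if __name__ == "__main__":
    X, y = make_toy_data()
    print(cvfs_sketch(X, y))
```

Intersecting across disjoint subsets is what keeps the final list short in this sketch: a feature that ranks highly in one subset by chance is unlikely to do so in all of them, so only consistently informative features remain.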
Status | Finished |
---|---|
Effective start/end date | 8/1/23 → 7/31/24 |
Keywords
- Microbiome
- Metagenomics
- Disease analysis
- Unsupervised clustering
- Genome analysis