Stitching gene fragments with a network matching algorithm improves gene assembly for metagenomics

Yu Wei Wu, Mina Rho, Thomas G. Doak, Yuzhen Ye

Research output: Contribution to journal › Article › peer-review

11 Citations (Scopus)

Abstract

Motivation: One of the difficulties in metagenomic assembly is that homologous genes from evolutionarily closely related species may behave like repeats and confuse assemblers. As a result, an assembler may report small contigs, each representing a short gene fragment, instead of complete genes. This further complicates annotation of metagenomic datasets, as annotation tools (such as gene predictors or similarity search tools) typically perform poorly on contigs encoding short gene fragments.

Results: We present a novel way of using the de Bruijn graph assembly of metagenomes to improve the assembly of genes. A network matching algorithm is proposed for matching the de Bruijn graph of contigs against reference genes, to derive 'gene paths' in the graph (sequences of contigs containing gene fragments) that have the highest similarities to known genes, allowing gene fragments contained in multiple contigs to be connected to form more complete (or intact) genes. Tests on simulated and real datasets show that our approach (called GeneStitch) is able to significantly improve the assembly of genes from metagenomic sequences, by connecting contigs with the guidance of homologous genes, information that is orthogonal to the sequencing reads. We note that the improvement of gene assembly can be observed even when only distantly related genes are available as the reference. We further propose to use 'gene graphs' to represent the assembly of reads from homologous genes and discuss potential applications of gene graphs to improving functional annotation for metagenomics.
Original language: English
Article number: bts388
Journal: Bioinformatics
Volume: 28
Issue number: 18
Publication status: Published - Sep 2012
Externally published

ASJC Scopus subject areas

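  • Biochemistry
  • Molecular Biology
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Computational Mathematics
  • Statistics and Probability
  • Medicine (all)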
