Towards a reference genome that captures global genetic diversity

Karen H Y Wong, Walfred Ma, Chun-Yu Wei, Erh-Chan Yeh, Wan-Jia Lin, Elin H F Wang, Jen-Ping Su, Feng-Jen Hsieh, Hsiao-Jung Kao, Hsiao-Huei Chen, Stephen K Chow, Eleanor Young, Catherine Chu, Annie Poon, Chi-Fan Yang, Dar-Shong Lin, Yu-Feng Hu, Jer-Yuarn Wu, Ni-Chung Lee, Wuh-Liang HwuDario Boffelli, David Martin, Ming Xiao, Pui-Yan Kwok

研究成果: 雜誌貢獻文章同行評審

23 引文 斯高帕斯(Scopus)

摘要

The current human reference genome is predominantly derived from a single individual and it does not adequately reflect human genetic diversity. Here, we analyze 338 high-quality human assemblies of genetically divergent human populations to identify missing sequences in the human reference genome with breakpoint resolution. We identify 127,727 recurrent non-reference unique insertions spanning 18,048,877 bp, some of which disrupt exons and known regulatory elements. To improve genome annotations, we linearly integrate these sequences into the chromosomal assemblies and construct a Human Diversity Reference. Leveraging this reference, an average of 402,573 previously unmapped reads can be recovered for a given genome sequenced to ~40X coverage. Transcriptomic diversity among these non-reference sequences can also be directly assessed. We successfully map tens of thousands of previously discarded RNA-Seq reads to this reference and identify transcription evidence in 4781 gene loci, underlining the importance of these non-reference sequences in functional genomics. Our extensive datasets are important advances toward a comprehensive reference representation of global human genetic diversity.
原文英語
頁(從 - 到)5482
期刊Nature Communications
11
發行號1
DOIs
出版狀態已發佈 - 10月 30 2020
對外發佈

指紋

深入研究「Towards a reference genome that captures global genetic diversity」主題。共同形成了獨特的指紋。

引用此