TY - JOUR
T1 - Robust Mutation Profiling of SARS‐CoV‐2 Variants from Multiple Raw Illumina Sequencing Data with Cloud Workflow
AU - Lim, Hendrick Gao Min
AU - Hsiao, Shih Hsin
AU - Fann, Yang C.
AU - Lee, Yuan Chii Gladys
N1 - Funding Information:
This work was supported in part by the Ministry of Science and Technology (MOST) (grant MOST 109‐2221‐E‐038‐016) of the Taiwanese government to Y.‐C.G.L., the Taipei Medical University Hospital (W0303, 109TMUH‐SP‐02) to S.‐H.H., and the Fund from the Division of Intramural Research Program, National Institute of Neurological Disorders and Stroke, National Institutes of Health, USA to Y.C.F. The Seven Bridges Cancer Research Data Commons Cloud Resource has been funded in whole or in part with federal funds from the National Cancer Institute, National Institutes of Health (Contract No. HHSN261201400008C and ID/IQ Agreement No. 17X146 under Contract No. HHSN261201500003I and 75N91019D00024).
Publisher Copyright:
© 2022 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2022/4
Y1 - 2022/4
AB - Several variants of the novel severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) are emerging all over the world. Variant surveillance from genome sequencing has become crucial to determine whether mutations in these variants are rendering the virus more infectious, potent, or resistant to existing vaccines and therapeutics. However, repeatedly analyzing large numbers of raw sequencing datasets with currently available code‐based bioinformatics tools is very difficult during this unprecedented pandemic because experts and computational resources are limited. Therefore, to hasten variant surveillance efforts, we developed an installation‐free cloud workflow for robust mutation profiling of SARS‐CoV‐2 variants from multiple Illumina sequencing datasets. Herein, 55 raw sequencing datasets representing four early SARS‐CoV‐2 variants of concern (Alpha, Beta, Gamma, and Delta) from an open‐access database were used to test the performance of our workflow. Our workflow automatically identified the mutated sites of the variants, along with reliable annotation of the protein‐coding genes, in a cost‐effective and timely manner for all datasets by harnessing parallel cloud computing in a single execution under resource‐limited settings. In addition, our workflow can generate a consensus genome sequence that can be shared through public data repositories to support global variant surveillance efforts.
KW - cloud workflow
KW - Common Workflow Language
KW - COVID‐19
KW - genomics surveillance
KW - Illumina sequencing
KW - lineage
KW - mutation
KW - parallel computing
KW - SARS‐CoV‐2
KW - variant
UR - http://www.scopus.com/inward/record.url?scp=85128769262&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85128769262&partnerID=8YFLogxK
U2 - 10.3390/genes13040686
DO - 10.3390/genes13040686
M3 - Article
C2 - 35456492
AN - SCOPUS:85128769262
SN - 2073-4425
VL - 13
JO - Genes
JF - Genes
IS - 4
M1 - 686
ER -