TY - JOUR
T1 - Orchestrating an optimized next-generation sequencing-based cloud workflow for robust viral identification during pandemics
AU - Lim, Hendrick Gao Min
AU - Hsiao, Shih Hsin
AU - Lee, Yuan Chii Gladys
N1 - Funding Information:
This work was supported in part by the Ministry of Science and Technology (MOST) grant (MOST 109-2221-E-038-016) of the Taiwanese government to Y.-C.G.L. and by Taipei Medical University Hospital (W0303, 109TMUH-SP-02) to S.-H.H. The Seven Bridges Cancer Research Data Commons Cloud Resource was funded, in whole or in part, with Federal funds from the National Cancer Institute, National Institutes of Health, contract no. HHSN261201400008C and ID/IQ agreement no. 17X146 under contract no. HHSN261201500003I and 75N91019D00024. The authors wish to thank the Seven Bridges Genomics bioinformatics support team, especially Vesna Pajic, Ruzica Gagic, Milica Aleksic, Aleksandar Danicic, and Devin Leung, for assistance during the workflow execution and deployment. This manuscript was edited by Wallace Academic Editing.
Publisher Copyright:
© 2021 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2021/10
Y1 - 2021/10
N2 - Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has recently become a novel pandemic event following the swine flu that occurred in 2009, which was caused by the influenza A virus (H1N1 subtype). The accurate identification of the huge number of samples during a pandemic still remains a challenge. In this study, we integrate two technologies, next-generation sequencing and cloud computing, into an optimized workflow version that uses a specific identification algorithm on the designated cloud platform. We use 182 samples (92 for COVID-19 and 90 for swine flu) with short-read sequencing data from two open-access datasets to represent each pandemic and evaluate our workflow performance based on an index specifically created for SARS-CoV-2 or H1N1. Results show that our workflow could differentiate cases between the two pandemics with a higher accuracy depending on the index used, especially when the index that exclusively represented each dataset was used. Our workflow substantially outperforms the original complete identification workflow available on the same platform in terms of time and cost by preserving essential tools internally. Our workflow can serve as a powerful tool for the robust identification of cases and, thus, aid in controlling the current and future pandemics.
KW - Cloud computing
KW - Cloud workflow
KW - COVID-19
KW - H1N1
KW - Next-generation sequencing
KW - Pandemics
KW - SARS-CoV-2
KW - Swine flu
UR - http://www.scopus.com/inward/record.url?scp=85117498282&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85117498282&partnerID=8YFLogxK
U2 - 10.3390/biology10101023
DO - 10.3390/biology10101023
M3 - Article
AN - SCOPUS:85117498282
SN - 2079-7737
VL - 10
JO - Biology
JF - Biology
IS - 10
M1 - 1023
ER -