TY - JOUR
T1 - Semiparametric prognosis models in genomic studies
AU - Ma, Shuangge
AU - Huang, Jian
AU - Shi, Mingyu
AU - Li, Yang
AU - Shia, Ben Chang
PY - 2010/2/1
Y1 - 2010/2/1
N2 - Development of high-throughput technologies makes it possible to survey the whole genome. Genomic studies have been extensively conducted, searching for markers with predictive power for prognosis of complex diseases such as cancer, diabetes and obesity. Most existing statistical analyses are focused on developing marker selection techniques, while little attention is paid to the underlying prognosis models. In this article, we review three commonly used prognosis models, namely the Cox, additive risk and accelerated failure time models. We conduct simulation and show that gene identification can be unsatisfactory under model misspecification. We analyze three cancer prognosis studies under the three models, and show that the gene identification results, prediction performance of all identified genes combined, and reproducibility of each identified gene are model-dependent. We suggest that in practical data analysis, more attention should be paid to the model assumption, and multiple models may need to be considered.
AB - Development of high-throughput technologies makes it possible to survey the whole genome. Genomic studies have been extensively conducted, searching for markers with predictive power for prognosis of complex diseases such as cancer, diabetes and obesity. Most existing statistical analyses are focused on developing marker selection techniques, while little attention is paid to the underlying prognosis models. In this article, we review three commonly used prognosis models, namely the Cox, additive risk and accelerated failure time models. We conduct simulation and show that gene identification can be unsatisfactory under model misspecification. We analyze three cancer prognosis studies under the three models, and show that the gene identification results, prediction performance of all identified genes combined, and reproducibility of each identified gene are model-dependent. We suggest that in practical data analysis, more attention should be paid to the model assumption, and multiple models may need to be considered.
KW - Genomic studies
KW - Model comparison
KW - Semiparametric prognosis models
UR - http://www.scopus.com/inward/record.url?scp=77955026459&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77955026459&partnerID=8YFLogxK
U2 - 10.1093/bib/bbp070
DO - 10.1093/bib/bbp070
M3 - Article
C2 - 20123942
AN - SCOPUS:77955026459
SN - 1467-5463
VL - 11
SP - 385
EP - 393
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
IS - 4
M1 - bbp070
ER -