TY - JOUR
T1 - Protein subcellular localization prediction based on compartment-specific biological features.
AU - Su, Chia Yu
AU - Lo, Allan
AU - Chiu, Hua Sheng
AU - Sung, Ting Yi
AU - Hsu, Wen Lian
PY - 2006
Y1 - 2006
N2 - Predicting the subcellular localization of proteins is important for genome annotation, protein function prediction, and drug discovery. We present a prediction method for Gram-negative bacteria that uses ten one-versus-one support vector machine (SVM) classifiers, where compartment-specific biological features are selected as input to each SVM classifier. The final prediction of localization sites is determined by integrating the outputs of the ten binary classifiers through a combination of majority voting and a probabilistic method. In a ten-fold cross-validation evaluation on a benchmark data set, the overall accuracy reaches 91.4%, which is 1.6% higher than that of the state-of-the-art system. We demonstrate that selecting the input features of each one-versus-one SVM classifier on the basis of biological knowledge and insights can significantly improve prediction performance. Applied to proteins with dual localizations, our model also achieves a high overall accuracy of 92.8%.
AB - Predicting the subcellular localization of proteins is important for genome annotation, protein function prediction, and drug discovery. We present a prediction method for Gram-negative bacteria that uses ten one-versus-one support vector machine (SVM) classifiers, where compartment-specific biological features are selected as input to each SVM classifier. The final prediction of localization sites is determined by integrating the outputs of the ten binary classifiers through a combination of majority voting and a probabilistic method. In a ten-fold cross-validation evaluation on a benchmark data set, the overall accuracy reaches 91.4%, which is 1.6% higher than that of the state-of-the-art system. We demonstrate that selecting the input features of each one-versus-one SVM classifier on the basis of biological knowledge and insights can significantly improve prediction performance. Applied to proteins with dual localizations, our model also achieves a high overall accuracy of 92.8%.
UR - http://www.scopus.com/inward/record.url?scp=34250876252&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34250876252&partnerID=8YFLogxK
U2 - 10.1142/1860947573_0041
DO - 10.1142/1860947573_0041
M3 - Article
C2 - 17369650
AN - SCOPUS:34250876252
SN - 1752-7791
SP - 325
EP - 330
JO - Computational systems bioinformatics / Life Sciences Society. Computational Systems Bioinformatics Conference
JF - Computational systems bioinformatics / Life Sciences Society. Computational Systems Bioinformatics Conference
ER -