Identify Breast Cancer Subtypes by Gene Expression Profiles

Grace S. Shieh, Chyi-Huey Bai, Chih Lee

研究成果: 雜誌貢獻文章同行評審


Support vector machines (SVMs), with linear, polynomial and
radial kernels, were applied to classify subtypes of breast cancer by gene
expression profiles of tissues samples. Using the top 500 genes ranked by
between-group to within-group sum of squares, SVMs with linear kernel had
an average accuracy rate about 97% when applied to a balanced dataset; this
accuracy rate was significantly higher than that of the original data. After
imputation, the smallest subsample of the balanced dataset was comparable
to the other subsamples’ (containing more than 10 samples). In biomedical
sciences, it is of interest to identify genes that can be used to classify subtypes
of breast cancer well. Using SVMs, we identified 500 genes and looked up
the functions of 297 genes from databases. Furthermore, about 65% of these
297 genes were known to be related to breast cancer, and this confirms the
consistency of our results with existing biomedical knowledge. Those 203
genes may also be investigated further to see if they are involved in breast
cancer; any novel findings will be important.
頁(從 - 到)165-175
期刊journal of data science
出版狀態已發佈 - 2004


  • Classification
  • microarray gene expression data
  • support vector machines
  • tumor