Abstract
It is well known that statistical classifiers trained on imbalanced data yield low true positive rates and select significant variables inconsistently. In this article, we propose an improved method that raises classification accuracy for the minority class by assigning a different misclassification cost to each group. The overall error rate is replaced by an alternative composite criterion. Furthermore, we propose an approach that estimates the tuning parameter, the composite criterion, and the cut-point simultaneously. Simulations show that the proposed method achieves a high true positive rate in prediction and performs well in variable selection for both continuous and categorical predictors, even with highly imbalanced data. An illustrative analysis of suboptimal health state data from traditional Chinese medicine demonstrates the application of the proposed method.
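
To make the idea of replacing the overall error rate with a composite criterion concrete, here is a minimal Python sketch of choosing a cut-point on predicted probabilities. The abstract does not define the composite criterion, so the geometric mean of the true positive and true negative rates is used purely as an assumed stand-in. The function names (`rates`, `composite`, `best_cut_point`) and the toy data are hypothetical, and the sketch covers neither the group lasso fit nor the simultaneous estimation of the tuning parameter.

```python
from math import sqrt

def rates(probs, labels, cut):
    """Return (true positive rate, true negative rate) at a given cut-point."""
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= cut)
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < cut)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, tn / neg

def composite(tpr, tnr):
    # ASSUMPTION: geometric mean of TPR and TNR, used only for illustration;
    # the criterion in the article may differ.
    return sqrt(tpr * tnr)

def best_cut_point(probs, labels, grid):
    """Pick the grid value that maximizes the assumed composite criterion."""
    return max(grid, key=lambda c: composite(*rates(probs, labels, c)))

# Hypothetical toy data: 2 positives among 10 observations.
probs = [0.05, 0.10, 0.12, 0.15, 0.20, 0.22, 0.26, 0.45, 0.31, 0.40]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
grid = [i / 20 for i in range(1, 20)]  # candidate cut-points 0.05, ..., 0.95

cut = best_cut_point(probs, labels, grid)
default_tpr, default_tnr = rates(probs, labels, 0.5)
tuned_tpr, tuned_tnr = rates(probs, labels, cut)
```

On this toy data the conventional 0.5 cut-point classifies every observation as negative, so its true positive rate is zero even though its overall error rate is low. The tuned cut-point recovers both positives at the cost of one false positive.
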
Original language | English |
---|---|
Pages (from-to) | 2582-2595 |
Number of pages | 14 |
Journal | Journal of Statistical Computation and Simulation |
Volume | 85 |
Issue number | 13 |
Publication status | Published - Sept 2 2015 |
Externally published | Yes |
Keywords
- composite criterion
- group lasso
- imbalanced data
- true positive rate
ASJC Scopus subject areas
- Statistics and Probability
- Modelling and Simulation
- Statistics, Probability and Uncertainty
- Applied Mathematics