Abstract
We propose a method for selecting a subset of features in multiple linear regression models that accounts for collinearity among features. The proposed method first detects collinear groups of features and then estimates the regression coefficients under collinear groupwise feature selection constraints. These constraints simultaneously control the number of selected features and the predefined collinear feature groups, and multicollinearity in the regression model is managed by tuning the parameters of the fusion group constraint. Because the resulting optimization problem is NP-hard, we propose a modified discrete first-order algorithm to solve it. Using simulation and real-world data, we demonstrate the usefulness of the proposed method by comparing it with existing regularization-based and discrete optimization-based methods in terms of predictive accuracy, bias, and variance. The comparison confirms that the proposed method outperforms the alternatives.
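The abstract mentions a modified discrete first-order algorithm for the best subset (cardinality-constrained) regression problem. As a rough illustration only, the sketch below implements a generic discrete first-order scheme (a projected gradient step followed by hard thresholding) for plain cardinality-constrained least squares; the function name `discrete_first_order_subset`, its parameters, and the omission of the paper's collinear-group constraints are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def discrete_first_order_subset(X, y, k, n_iter=500, tol=1e-6):
    """Minimal sketch of a discrete first-order method for
    min_beta 0.5 * ||y - X beta||^2  subject to  ||beta||_0 <= k.
    NOTE: illustrative only; the paper's algorithm additionally
    handles collinear-group constraints, which are not modeled here.
    """
    n, p = X.shape
    # Lipschitz constant of the least-squares gradient (largest singular value squared)
    L = np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)      # gradient of 0.5 * ||y - X beta||^2
        z = beta - grad / L              # plain gradient step
        # Project onto the sparsity constraint: keep the k largest |z_j|
        keep = np.argsort(np.abs(z))[-k:]
        beta_new = np.zeros(p)
        beta_new[keep] = z[keep]
        if np.linalg.norm(beta_new - beta) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta

# Hypothetical usage on synthetic data
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
beta_true = np.zeros(20)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(100)
beta_hat = discrete_first_order_subset(X, y, k=3)
```

In this kind of scheme, the gradient step decreases the least-squares loss while the hard-thresholding projection keeps the iterate feasible for the cardinality constraint, which is why such methods scale to problems where exact mixed-integer quadratic programming becomes expensive.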
| Original language | English |
|---|---|
| Pages (from-to) | 1-13 |
| Number of pages | 13 |
| Journal | Pattern Recognition |
| Volume | 83 |
| DOIs | |
| Publication status | Published - Nov 2018 |
Keywords
- Best subset selection
- Feature selection
- Machine learning
- Mixed-integer quadratic programming
- Multicollinearity
- Multiple linear regression
ASJC Scopus subject areas
- Software
- Signal Processing
- Computer Vision and Pattern Recognition
- Artificial Intelligence