TY - JOUR
T1 - Ensemble variable selection using genetic algorithm
AU - Lee, Seogyoung
AU - Yang, Martin Seunghwan
AU - Kang, Jongkyeong
AU - Shin, Seung Jun
N1 - Funding Information:
This work is funded by the National Research Foundation of Korea (NRF) grants (2018R1D1A1B07043034, 2019R1A4A1028134) and Korea University (K2000461).
Publisher Copyright:
© 2022 The Korean Statistical Society, and Korean International Statistical Society. All rights reserved.
PY - 2022
Y1 - 2022
AB - Variable selection is one of the most crucial tasks in supervised learning, such as regression and classification. The best subset selection is straightforward and optimal but not practically applicable unless the number of predictors is small. In this article, we propose directly solving the best subset selection via the genetic algorithm (GA), a popular stochastic optimization algorithm based on the principle of Darwinian evolution. To further improve the variable selection performance, we propose to run multiple GAs to solve the best subset selection and then synthesize the results, which we call ensemble GA (EGA). The EGA significantly improves variable selection performance. In addition, the proposed method is essentially the best subset selection and hence applicable to a variety of models with different selection criteria. We compare the proposed EGA to existing variable selection methods under various models, including linear regression, Poisson regression, and Cox regression for survival data.
KW - Cox regression
KW - Ensemble learning
KW - Generalized linear model
KW - Genetic algorithm
UR - http://www.scopus.com/inward/record.url?scp=85143851699&partnerID=8YFLogxK
U2 - 10.29220/CSAM.2022.29.6.629
DO - 10.29220/CSAM.2022.29.6.629
M3 - Article
AN - SCOPUS:85143851699
SN - 2287-7843
VL - 29
SP - 629
EP - 640
JO - Communications for Statistical Applications and Methods
JF - Communications for Statistical Applications and Methods
IS - 6
ER -