Ensemble variable selection using genetic algorithm

Seogyoung Lee, Martin Seunghwan Yang, Jongkyeong Kang, Seung Jun Shin

Research output: Contribution to journal › Article › peer-review

Abstract

Variable selection is one of the most crucial tasks in supervised learning, such as regression and classification. Best subset selection is straightforward and optimal, but it is not practical unless the number of predictors is small. In this article, we propose directly solving the best subset selection problem via the genetic algorithm (GA), a popular stochastic optimization algorithm based on the principle of Darwinian evolution. To further improve the variable selection performance, we propose running multiple GAs to solve the best subset selection problem and then synthesizing their results, which we call ensemble GA (EGA). The EGA significantly improves variable selection performance. In addition, the proposed method is essentially best subset selection and is hence applicable to a variety of models with different selection criteria. We compare the proposed EGA to existing variable selection methods under various models, including linear regression, Poisson regression, and Cox regression for survival data.
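
The idea in the abstract (several independent GA runs on the best subset problem, followed by a synthesis step) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the selection criterion, the genetic operators, or how the runs are combined. The sketch assumes a linear model scored by BIC, truncation selection, uniform crossover, bit-flip mutation, and a majority vote over runs. The names bic_linear, run_ga, and ensemble_ga, and all tuning parameters, are hypothetical.

```python
import numpy as np

def bic_linear(X, y, mask):
    """BIC of a least-squares fit (with intercept) on the columns flagged in mask."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X[:, mask]])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = float(np.sum((y - Z @ beta) ** 2))
    return n * np.log(rss / n) + Z.shape[1] * np.log(n)

def run_ga(X, y, rng, pop_size=40, n_gen=40, p_mut=0.05):
    """One GA run (assumed operators); returns the best inclusion mask found."""
    p = X.shape[1]
    pop = rng.random((pop_size, p)) < 0.5          # random initial subsets
    for _ in range(n_gen):
        score = np.array([bic_linear(X, y, m) for m in pop])
        order = np.argsort(score)                   # lower BIC is better
        elite = pop[order[: pop_size // 2]]         # truncation selection
        n_child = pop_size - len(elite)
        pa = elite[rng.integers(len(elite), size=n_child)]
        pb = elite[rng.integers(len(elite), size=n_child)]
        # Uniform crossover: each gene comes from either parent with prob. 1/2
        child = np.where(rng.random(pa.shape) < 0.5, pa, pb)
        child ^= rng.random(child.shape) < p_mut    # bit-flip mutation
        pop = np.vstack([elite, child])
    score = np.array([bic_linear(X, y, m) for m in pop])
    return pop[np.argmin(score)]

def ensemble_ga(X, y, n_runs=10, threshold=0.5, seed=0):
    """Run several GAs and combine them by selection frequency (assumed rule)."""
    rng = np.random.default_rng(seed)
    masks = np.array([run_ga(X, y, rng) for _ in range(n_runs)])
    freq = masks.mean(axis=0)       # share of runs that selected each variable
    return freq, freq > threshold   # majority vote at the given threshold
```

Calling ensemble_ga(X, y) on an n-by-p design matrix X and response y returns the per-variable selection frequency and a boolean vector of selected variables. Replacing bic_linear with a criterion for another model (for example a Poisson or Cox partial likelihood) would leave the search itself unchanged, which is the flexibility the abstract points to.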

Original language: English
Pages (from-to): 629-640
Number of pages: 12
Journal: Communications for Statistical Applications and Methods
Volume: 29
Issue number: 6
Publication status: Published - 2022

Bibliographical note

Funding Information:
This work is funded by the National Research Foundation of Korea (NRF) grants (2018R1D1A1B07043034, 2019R1A4A1028134) and Korea University (K2000461).

Publisher Copyright:
© 2022 The Korean Statistical Society, and Korean International Statistical Society. All rights reserved.

Keywords

  • Cox regression
  • Ensemble learning
  • Generalized linear model
  • Genetic algorithm

ASJC Scopus subject areas

  • Statistics and Probability
  • Modelling and Simulation
  • Finance
  • Statistics, Probability and Uncertainty
  • Applied Mathematics
