TY - GEN
T1 - BCGAN-based over-sampling scheme for imbalanced data
AU - Son, Minjae
AU - Jung, Seungwon
AU - Moon, Jihoon
AU - Hwang, Eenjun
N1 - Funding Information:
This research was supported in part by the Energy Cloud R&D Program (grant number: 2019M3F2A1073184) through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT and in part by the Korea Electric Power Corporation (grant number: R18XA05).
PY - 2020/2/1
Y1 - 2020/2/1
N2 - Classification is the process of identifying the class to which input data belong. One of the most popular ways to do this is to construct a classification model by training a machine learning algorithm on a given set of data. For better classification performance, the dataset should have a balanced distribution of data across classes. If the dataset is imbalanced, that is, if one class (the minority class) has far fewer data than the other class (the majority class), a model has little chance to learn about the minority class, and training is biased toward the majority class. As a result, the model tends to classify any input into the majority class and does not handle data of the minority class properly. To overcome this data imbalance problem, we propose a novel over-sampling scheme based on Borderline-Conditional Generative Adversarial Networks (BCGAN). Our BCGAN generates data for the minority class, particularly along the borderline between the majority class and the minority class. Through various experiments on actual imbalanced datasets, we show the performance of our scheme.
AB - Classification is the process of identifying the class to which input data belong. One of the most popular ways to do this is to construct a classification model by training a machine learning algorithm on a given set of data. For better classification performance, the dataset should have a balanced distribution of data across classes. If the dataset is imbalanced, that is, if one class (the minority class) has far fewer data than the other class (the majority class), a model has little chance to learn about the minority class, and training is biased toward the majority class. As a result, the model tends to classify any input into the majority class and does not handle data of the minority class properly. To overcome this data imbalance problem, we propose a novel over-sampling scheme based on Borderline-Conditional Generative Adversarial Networks (BCGAN). Our BCGAN generates data for the minority class, particularly along the borderline between the majority class and the minority class. Through various experiments on actual imbalanced datasets, we show the performance of our scheme.
KW - BCGAN
KW - CGAN
KW - Imbalanced data
KW - Over-sampling
UR - http://www.scopus.com/inward/record.url?scp=85084366725&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084366725&partnerID=8YFLogxK
U2 - 10.1109/BigComp48618.2020.00-83
DO - 10.1109/BigComp48618.2020.00-83
M3 - Conference contribution
AN - SCOPUS:85084366725
T3 - Proceedings - 2020 IEEE International Conference on Big Data and Smart Computing, BigComp 2020
SP - 155
EP - 160
BT - Proceedings - 2020 IEEE International Conference on Big Data and Smart Computing, BigComp 2020
A2 - Lee, Wookey
A2 - Chen, Luonan
A2 - Moon, Yang-Sae
A2 - Bourgeois, Julien
A2 - Bennis, Mehdi
A2 - Li, Yu-Feng
A2 - Ha, Young-Guk
A2 - Kwon, Hyuk-Yoon
A2 - Cuzzocrea, Alfredo
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE International Conference on Big Data and Smart Computing, BigComp 2020
Y2 - 19 February 2020 through 22 February 2020
ER -