TY - JOUR
T1 - Oversampling method using outlier detectable generative adversarial network
AU - Oh, Joo Hyuk
AU - Hong, Jae Yeol
AU - Baek, Jun Geol
N1 - Funding Information:
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2019R1A2C2005949). This work was also supported by the BK21 Plus program (Big Data in Manufacturing and Logistics Systems, Korea University) and by Samsung Electronics Co., Ltd.
Publisher Copyright:
© 2019 Elsevier Ltd
PY - 2019/11/1
Y1 - 2019/11/1
N2 - A class imbalance problem occurs when one class of data is significantly larger or smaller than another class. This problem is difficult to solve; however, oversampling methods based on the synthetic minority oversampling technique (SMOTE) or the conditional generative adversarial network (cGAN) have recently been suggested. SMOTE and its variations can generate biased artificial data because they do not consider the entire data in the minority class. To overcome this problem, an oversampling method using cGAN has been proposed. However, such a method does not consider the majority class, which affects the classification boundary. In particular, if there is an outlier in the majority class, the classification boundary may be biased. This paper presents an oversampling method using an outlier detectable generative adversarial network (OD-GAN) to solve this problem. We use the discriminator, which is used only for training purposes in cGAN, as an outlier detector to quantify the difference between the distributions of the majority and minority classes. The discriminator can detect and remove outliers, which prevents the distortion of the classification boundary caused by outliers. The generator imitates the distribution of the minority class and generates artificial data to balance the dataset. We experiment with various datasets, oversampling techniques, and classifiers. The empirical results show that the performance of OD-GAN is better than that of other oversampling methods for imbalanced datasets with outliers.
KW - Class imbalance problem
KW - Generative adversarial network
KW - Outlier detection
KW - Oversampling
UR - http://www.scopus.com/inward/record.url?scp=85065580340&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2019.05.006
DO - 10.1016/j.eswa.2019.05.006
M3 - Article
AN - SCOPUS:85065580340
SN - 0957-4174
VL - 133
SP - 1
EP - 8
JO - Expert Systems with Applications
JF - Expert Systems with Applications
ER -