TY - JOUR
T1 - Semi-supervised support vector regression based on self-training with label uncertainty
T2 - An application to virtual metrology in semiconductor manufacturing
AU - Kang, Pilsung
AU - Kim, Dongil
AU - Cho, Sungzoon
N1 - Funding Information:
This work was supported by Basic Science Research Program through the National Research Foundation of Korea, South Korea (NRF) funded by the Ministry of Science, ICT, & Future Planning (NRF-2014R1A1A1004648).
Publisher Copyright:
© 2015 Elsevier Ltd. All rights reserved.
PY - 2016/6/1
Y1 - 2016/6/1
AB - Dataset sizes continue to increase, and data are being collected from numerous applications. Because collecting labeled data is expensive and time-consuming, the amount of unlabeled data is increasing. Semi-supervised learning (SSL) has been proposed to improve conventional supervised learning methods by training from both unlabeled and labeled data. In contrast to classification problems, the estimation of labels for unlabeled data presents added uncertainty for regression problems. In this paper, a semi-supervised support vector regression (SS-SVR) method based on self-training is proposed. The proposed method addresses the uncertainty of the estimated labels for unlabeled data. To measure labeling uncertainty, the label distribution of the unlabeled data is estimated with two probabilistic local reconstruction (PLR) models. Then, the training data are generated by oversampling from the unlabeled data and their estimated label distribution. The sampling rate varies with the estimated uncertainty. Finally, expected margin-based pattern selection (EMPS) is employed to reduce training complexity. We verify the proposed method with 30 regression datasets and a real-world problem: virtual metrology (VM) in semiconductor manufacturing. The experimental results show that the proposed method improves the accuracy by 8% compared with conventional supervised SVR, and the training time for the proposed method is 20% shorter than that of the benchmark methods.
KW - Data generation
KW - Probabilistic local reconstruction
KW - Semi-supervised learning
KW - Semiconductor manufacturing
KW - Support vector regression
KW - Virtual metrology
UR - http://www.scopus.com/inward/record.url?scp=84955137019&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2015.12.027
DO - 10.1016/j.eswa.2015.12.027
M3 - Article
AN - SCOPUS:84955137019
SN - 0957-4174
VL - 51
SP - 85
EP - 106
JO - Expert Systems with Applications
JF - Expert Systems with Applications
ER -