TY - GEN
T1 - Model Selection for Data Analysis in Encrypted Domain
T2 - 20th World Conference on Information Security Applications, WISA 2019
AU - Hong, Mi Yeon
AU - Yoon, Ji Won
N1 - Funding Information:
This research is supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the IITP support program (2017-0-00545). We thank Joonsoo Yoo and Jeonghwan Hwang for their assistance in this research.
Publisher Copyright:
© 2020, Springer Nature Switzerland AG.
PY - 2020
Y1 - 2020
N2 - In the big data era, data scientists explore machine learning methods to predict or classify from observed data. For machine learning to be effective, it requires access to raw data, which is often privacy-sensitive. In addition, whatever data and fitting procedures are employed, a crucial step is to select the most appropriate model for the given dataset. Model selection is a key ingredient in data analysis for reliable and reproducible statistical inference or prediction. To address this issue, we develop new techniques for running model selection over encrypted data. Our approach provides the best approximation of the relationship between the dependent and independent variables through cross-validation. After performing 4-fold cross-validation, four different estimates of the model’s error are calculated. We then use the bias and variance extracted from these errors to find the best model. We perform an experiment on a dataset obtained from Kaggle and show that our approach can homomorphically perform regression on encrypted data without decrypting it.
AB - In the big data era, data scientists explore machine learning methods to predict or classify from observed data. For machine learning to be effective, it requires access to raw data, which is often privacy-sensitive. In addition, whatever data and fitting procedures are employed, a crucial step is to select the most appropriate model for the given dataset. Model selection is a key ingredient in data analysis for reliable and reproducible statistical inference or prediction. To address this issue, we develop new techniques for running model selection over encrypted data. Our approach provides the best approximation of the relationship between the dependent and independent variables through cross-validation. After performing 4-fold cross-validation, four different estimates of the model’s error are calculated. We then use the bias and variance extracted from these errors to find the best model. We perform an experiment on a dataset obtained from Kaggle and show that our approach can homomorphically perform regression on encrypted data without decrypting it.
KW - Fully Homomorphic Encryption
KW - Model selection
KW - TFHE
UR - http://www.scopus.com/inward/record.url?scp=85079094424&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-39303-8_12
DO - 10.1007/978-3-030-39303-8_12
M3 - Conference contribution
AN - SCOPUS:85079094424
SN - 9783030393021
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 155
EP - 166
BT - Information Security Applications - 20th International Conference, WISA 2019, Revised Selected Papers
A2 - You, Ilsun
PB - Springer
Y2 - 21 August 2019 through 24 August 2019
ER -