TY - JOUR
T1 - Regression-Based Network Estimation for High-Dimensional Genetic Data
AU - Lee, Kyu Min
AU - Lee, Minhyeok
AU - Seok, Junhee
AU - Han, Sung Won
N1 - Funding Information:
This research was supported by grants from the National Research Foundation of Korea (NRF-2017R1E1A1A03070507 and NRF-2017R1C1B2002850) and Korea University (K1719881 and K1822881). This article contains part of the MS thesis written by Kyu Min Lee, which followed the policy and guidelines of Korea University. Copyright is held by the Journal of Computational Biology.
Publisher Copyright:
© Copyright 2019, Mary Ann Liebert, Inc., publishers 2019.
PY - 2019/4
Y1 - 2019/4
N2 - Given the continuous advancement in genome sequencing technology, large volumes of gene expression data can be easily obtained. However, the corresponding increase in genetic information necessitates the adoption of new approaches to network estimation. As genome sequencing technology progresses, data dimensionality increases, which causes multicollinearity and makes gene networks difficult to estimate. The same problem also arises when hub nodes are present, and gene networks are known to contain regulator genes that can be interpreted as hub nodes. This study aims to develop methods that perform well on high-dimensional data with hub nodes. In this article, we propose regression-based approaches as feasible solutions. Elastic-net and adaptive elastic-net penalized regressions were applied to compensate for the disadvantages of existing regression-based approaches that employ LASSO or adaptive LASSO. Experiments were performed to compare the proposed regression-based approaches with other conventional methods. We confirmed the superior performance of the regression-based approaches and applied them to real genetic data to verify their suitability for estimating gene networks. As a result, the proposed methods were shown to be robust on high-dimensional gene expression data.
AB - Given the continuous advancement in genome sequencing technology, large volumes of gene expression data can be easily obtained. However, the corresponding increase in genetic information necessitates the adoption of new approaches to network estimation. As genome sequencing technology progresses, data dimensionality increases, which causes multicollinearity and makes gene networks difficult to estimate. The same problem also arises when hub nodes are present, and gene networks are known to contain regulator genes that can be interpreted as hub nodes. This study aims to develop methods that perform well on high-dimensional data with hub nodes. In this article, we propose regression-based approaches as feasible solutions. Elastic-net and adaptive elastic-net penalized regressions were applied to compensate for the disadvantages of existing regression-based approaches that employ LASSO or adaptive LASSO. Experiments were performed to compare the proposed regression-based approaches with other conventional methods. We confirmed the superior performance of the regression-based approaches and applied them to real genetic data to verify their suitability for estimating gene networks. As a result, the proposed methods were shown to be robust on high-dimensional gene expression data.
KW - adaptive elastic-net
KW - gene network estimation
KW - graphical model
KW - regression-based approach
UR - http://www.scopus.com/inward/record.url?scp=85064082830&partnerID=8YFLogxK
U2 - 10.1089/cmb.2018.0225
DO - 10.1089/cmb.2018.0225
M3 - Article
C2 - 30653343
AN - SCOPUS:85064082830
SN - 1066-5277
VL - 26
SP - 336
EP - 349
JO - Journal of Computational Biology
JF - Journal of Computational Biology
IS - 4
ER -