TY - JOUR
T1 - Biomarker Detection in Association Studies
T2 - Modeling SNPs Simultaneously via Logistic ANOVA
AU - Jung, Yoonsuh
AU - Huang, Jianhua Z.
AU - Hu, Jianhua
N1 - Funding Information:
Yoonsuh Jung is Lecturer, Department of Statistics, University of Waikato, Private Bag 3105, Hamilton 3240, New Zealand (E-mail: yoonsuh@waikato.ac.nz). Jianhua Z. Huang is Professor, Department of Statistics, Texas A&M University, College Station, TX, and Special Term Professor at ISEM, Capital University of Economics and Business, Beijing, China (E-mail: jianhua@stat.tamu.edu). Jianhua Hu is Corresponding Author, and Associate Professor, Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX (E-mail: jhu@mdanderson.org). Hu’s work was partially supported by the National Institutes of Health Grants R21CA129671, R01GM080503, R01CA158113, and CGSG P30 CA016672. Huang’s work was partially supported by grants from NSF (DMS-0907170, DMS-1007618, DMS-1208952), and Award Number KUS-CI-016-04 and GRP-CF-2011-19-P-Gao-Huang, made by King Abdullah University of Science and Technology (KAUST). The authors thank the editor, the associate editor, and reviewers for many constructive comments.
Publisher Copyright:
© 2014 American Statistical Association.
PY - 2014/10/2
Y1 - 2014/10/2
N2 - In genome-wide association studies, the primary task is to detect biomarkers in the form of single nucleotide polymorphisms (SNPs) that have nontrivial associations with a disease phenotype and some other important clinical/environmental factors. However, the extremely large number of SNPs compared to the sample size inhibits application of classical methods such as the multiple logistic regression. Currently, the most commonly used approach is still to analyze one SNP at a time. In this article, we propose to consider the genotypes of the SNPs simultaneously via a logistic analysis of variance (ANOVA) model, which expresses the logit transformed mean of SNP genotypes as the summation of the SNP effects, effects of the disease phenotype and/or other clinical variables, and the interaction effects. We use a reduced-rank representation of the interaction-effect matrix for dimensionality reduction, and employ the L1-penalty in a penalized likelihood framework to filter out the SNPs that have no associations. We develop a majorization–minimization algorithm for computational implementation. In addition, we propose a modified BIC criterion to select the penalty parameters and determine the rank number. The proposed method is applied to a multiple sclerosis dataset and simulated datasets and shows promise in biomarker detection.
KW - BIC
KW - GWAS
KW - L1-penalty
KW - MM algorithm
KW - Penalized Bernoulli likelihood
KW - Simultaneous modeling of SNPs
UR - http://www.scopus.com/inward/record.url?scp=84919796818&partnerID=8YFLogxK
U2 - 10.1080/01621459.2014.928217
DO - 10.1080/01621459.2014.928217
M3 - Article
AN - SCOPUS:84919796818
SN - 0162-1459
VL - 109
SP - 1355
EP - 1367
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 508
ER -