TY - GEN
T1 - A scalable method for detecting multiple loci associated with traits using TF-IDF weighting and association rule mining
AU - Lee, Sunwon
AU - Kang, Jaewoo
AU - Oh, Junho
PY - 2010
Y1 - 2010
N2 - Recent advances in SNP genotyping have significantly reduced the cost of large-scale genotyping and have dramatically increased the size of SNP genotype data. This growth in data volume, however, poses a major obstacle to conventional analysis techniques, which are typically vulnerable to the high-dimensionality problem. To address the issue, we propose a method that exploits two well-tested models: the document-term model and the transaction analysis model. The proposed method consists of two phases. In the first phase, we reduce the dimensionality of the SNP genotype data by transforming the data in terms of the document-term model and extracting the significant SNPs. In the second phase, we apply transaction analysis to the reduced-dimension genotype data to discover association rules that capture the relations between SNPs and traits. We validated the discovered rules through a literature survey. Experiments were also carried out using the HGDP panel data provided by the Foundation Jean Dausset-CEPH; they demonstrate the validity of our method for reducing dimensionality and for identifying associations between multiple SNPs and traits.
AB - Recent advances in SNP genotyping have significantly reduced the cost of large-scale genotyping and have dramatically increased the size of SNP genotype data. This growth in data volume, however, poses a major obstacle to conventional analysis techniques, which are typically vulnerable to the high-dimensionality problem. To address the issue, we propose a method that exploits two well-tested models: the document-term model and the transaction analysis model. The proposed method consists of two phases. In the first phase, we reduce the dimensionality of the SNP genotype data by transforming the data in terms of the document-term model and extracting the significant SNPs. In the second phase, we apply transaction analysis to the reduced-dimension genotype data to discover association rules that capture the relations between SNPs and traits. We validated the discovered rules through a literature survey. Experiments were also carried out using the HGDP panel data provided by the Foundation Jean Dausset-CEPH; they demonstrate the validity of our method for reducing dimensionality and for identifying associations between multiple SNPs and traits.
UR - http://www.scopus.com/inward/record.url?scp=79952018830&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79952018830&partnerID=8YFLogxK
U2 - 10.1109/BIBMW.2010.5703821
DO - 10.1109/BIBMW.2010.5703821
M3 - Conference contribution
AN - SCOPUS:79952018830
SN - 9781424483044
T3 - 2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2010
SP - 318
EP - 323
BT - 2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2010
T2 - 2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2010
Y2 - 18 December 2010 through 21 December 2010
ER -