Abstract
Data mining is an effective method for discovering useful information, such as association rules and previously unknown patterns, in large databases. We have developed a distributed frequent-pattern mining algorithm that uses distributed FP (frequent pattern) trees on a networked computing cluster. The algorithm parallelizes the FP-growth algorithm: it generates local FP trees independently and partitions the conditional database across all processors so that each carries an equal share of the mining computation. Performance is improved by avoiding the construction and broadcast of a global FP tree, and by using the available computational power efficiently through an even workload distribution. Experiments on a Linux cluster show an improvement over the count distribution algorithm, one of the best-known parallel algorithms for association mining.
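The abstract describes the scheme only at a high level. The sketch below simulates it in a single Python process under stated assumptions: each list in `partitions` stands in for one processor's share of the transactions, global item supports are obtained by summing local counts, every processor builds a local FP tree using the same global item order, each frequent item's conditional pattern base is gathered from all local trees, and items are assigned to processors with a greedy largest-first heuristic. All names (`build_tree`, `distributed_mine`, and so on), the work estimate (total number of items in a conditional pattern base), and the greedy assignment are hypothetical choices for illustration, not the authors' published method. A real cluster implementation would exchange counts and pattern bases by message passing (for example, MPI) rather than through shared Python objects.

```python
import heapq
from collections import Counter, defaultdict


class Node:
    """One FP-tree node: an item, its count, and links to parent/children."""
    __slots__ = ("item", "count", "parent", "children")

    def __init__(self, item, parent):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}


def build_tree(weighted_txs, order):
    """Build an FP tree from (items, count) pairs and return its header table.

    Items missing from `order` are dropped; the rest are inserted in rank order.
    """
    root, header = Node(None, None), defaultdict(list)
    for items, cnt in weighted_txs:
        node = root
        for it in sorted((i for i in set(items) if i in order), key=order.get):
            child = node.children.get(it)
            if child is None:
                child = node.children[it] = Node(it, node)
                header[it].append(child)  # node-link for this item
            child.count += cnt
            node = child
    return header


def prefix_paths(header, item):
    """Conditional pattern base of `item`: a list of (prefix path, count)."""
    base = []
    for node in header.get(item, []):
        path, p = [], node.parent
        while p.item is not None:  # stop at the root
            path.append(p.item)
            p = p.parent
        if path:
            base.append((path, node.count))
    return base


def rank(counts, min_sup):
    """Frequent items plus a rank map (descending support, ties by item)."""
    freq = {i: c for i, c in counts.items() if c >= min_sup}
    ranked = sorted(freq, key=lambda i: (-freq[i], i))
    return freq, ranked, {i: r for r, i in enumerate(ranked)}


def fp_growth(cond_db, suffix, min_sup, out):
    """Sequential FP-growth on one conditional database (recursive)."""
    counts = Counter()
    for items, cnt in cond_db:
        for it in set(items):
            counts[it] += cnt
    freq, ranked, order = rank(counts, min_sup)
    header = build_tree(cond_db, order)
    for it in reversed(ranked):
        pattern = suffix | {it}
        out[frozenset(pattern)] = freq[it]
        fp_growth(prefix_paths(header, it), pattern, min_sup, out)


def distributed_mine(partitions, min_sup, n_procs):
    """Simulate the scheme in one process; each partition is one processor."""
    # 1. Sum local item counts (one reduction); no global FP tree is built.
    counts = Counter()
    for part in partitions:
        counts.update(it for t in part for it in set(t))
    freq, ranked, order = rank(counts, min_sup)
    # 2. Each processor builds a local FP tree over its own transactions.
    headers = [build_tree([(t, 1) for t in part], order) for part in partitions]
    # 3. Gather each item's conditional pattern base from all local trees.
    cond_dbs = {it: [pp for h in headers for pp in prefix_paths(h, it)]
                for it in ranked}
    # 4. Greedy balancing (hypothetical heuristic): heaviest conditional
    #    database goes to the least-loaded processor.
    work = {it: sum(len(p) for p, _ in cond_dbs[it]) for it in ranked}
    heap, assignment = [(0, p) for p in range(n_procs)], defaultdict(list)
    for it in sorted(ranked, key=lambda i: (-work[i], order[i])):
        load, p = heapq.heappop(heap)
        assignment[p].append(it)
        heapq.heappush(heap, (load + work[it], p))
    # 5. Each processor mines its assigned conditional databases independently.
    results = {}
    for p, items in assignment.items():
        for it in items:
            results[frozenset({it})] = freq[it]
            fp_growth(cond_dbs[it], {it}, min_sup, results)
    return results, dict(assignment)


if __name__ == "__main__":
    txs = [list("facdgimp"), list("abcflmo"), list("bfhjo"),
           list("bcksp"), list("afcelpmn")]
    patterns, assignment = distributed_mine([txs[:2], txs[2:4], txs[4:]], 3, 2)
    print(len(patterns), "frequent itemsets; items per processor:", assignment)
```

Because every local tree uses the same global item order, concatenating an item's prefix paths from all local trees gives an equivalent conditional pattern base (the same paths, with counts possibly split across trees), so a global tree never has to be built or broadcast. Each itemset is generated only under its lowest-ranked item, so the per-processor results do not overlap.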
Original language | English |
---|---|
Article number | 439-040 |
Pages (from-to) | 614-619 |
Number of pages | 6 |
Journal | Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Systems |
Volume | 16 |
Publication status | Published - 2004 |
Event | Proceedings of the 16th IASTED International Conference on Parallel and Distributed Computing and Systems - Cambridge, MA, United States |
Duration | 2004 Nov 9 → 2004 Nov 11 |
Keywords
- Frequent pattern tree
- Load balancing
- Parallel data mining
ASJC Scopus subject areas
- Software
- Hardware and Architecture
- Computer Networks and Communications