Load balanced frequent pattern mining

Doosan Cho, Dongseung Kim

Research output: Contribution to journal › Conference article › peer-review

Abstract

Data mining is an effective method for discovering useful information, such as association rules and previously unknown patterns, in large databases. We have developed a distributed frequent-pattern mining algorithm that uses distributed FP (frequent pattern) trees on a networked computing cluster. The algorithm parallelizes the FP-growth algorithm, generates local FP trees independently, and partitions the conditional database among all processors so that each has an equal load for the mining computation. Performance is enhanced by avoiding the construction and broadcast of the global FP tree and by using the available computational power efficiently through even workload distribution. Experiments on a Linux cluster show an improvement over the count distribution algorithm, one of the best-known parallel algorithms for association mining.
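The load-balancing idea in the abstract can be illustrated with a small sketch. The abstract does not give the paper's partitioning procedure, so everything below is an assumption made for illustration: the function names `conditional_base_sizes` and `balance` are hypothetical, the load of an item is estimated as the total number of prefix items in its conditional pattern base, and items are assigned to processors with a greedy largest-first heuristic rather than the authors' actual scheme.

```python
import heapq
from collections import defaultdict


def conditional_base_sizes(transactions, min_support):
    """Estimate the mining load per frequent item (illustrative assumption).

    The load of an item is taken to be the total number of prefix items
    that precede it when each transaction is sorted by descending support,
    i.e. the size of its conditional pattern base.
    """
    counts = defaultdict(int)
    for t in transactions:
        for item in set(t):
            counts[item] += 1
    frequent = {i for i, c in counts.items() if c >= min_support}
    # FP-growth orders items by descending support; ties are broken by name.
    ordered_items = sorted(frequent, key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(ordered_items)}

    load = defaultdict(int)
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=rank.__getitem__)
        for pos, item in enumerate(ordered):
            load[item] += pos  # pos = number of prefix items before `item`
    return dict(load)


def balance(loads, n_procs):
    """Greedy largest-first assignment of items to the least-loaded processor."""
    heap = [(0, p) for p in range(n_procs)]  # (current load, processor id)
    heapq.heapify(heap)
    assignment = {p: [] for p in range(n_procs)}
    for item, weight in sorted(loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, p = heapq.heappop(heap)
        assignment[p].append(item)
        heapq.heappush(heap, (total + weight, p))
    return assignment


if __name__ == "__main__":
    sample = [["a", "b", "c"], ["a", "b"], ["a", "c"],
              ["b", "c"], ["a", "b", "c", "d"]]
    loads = conditional_base_sizes(sample, min_support=2)
    print(loads)
    print(balance(loads, n_procs=2))
```

In a real distributed setting, each processor would then mine only the conditional databases of the items assigned to it; the sketch covers the assignment step alone and does not build FP trees or exchange data between nodes.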

Original language: English
Article number: 439-040
Pages (from-to): 614-619
Number of pages: 6
Journal: Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Systems
Volume: 16
Publication status: Published - 2004
Event: Proceedings of the 16th IASTED International Conference on Parallel and Distributed Computing and Systems - Cambridge, MA, United States
Duration: 2004 Nov 9 → 2004 Nov 11

Keywords

  • Frequent pattern tree
  • Load balancing
  • Parallel data mining

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications
