TsPhraseRank for document clustering: Reweighting the weight of phrase

Yoon Ho Cho, Sang Hyun Park, Sang-Geun Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Given a document collection, a hierarchical clustering algorithm groups the documents into a hierarchy of clusters. Recent work has identified sets of overlap phrases as useful features for hierarchical document clustering. However, that work considered neither the relationships among overlap phrases that co-occur in a document nor the degrees of opposite relationships between overlap phrases. In this paper, we propose new algorithms that compute a more effective similarity measure before the hierarchical clustering algorithm is run. The proposed methods have two key features: a ranked list of the top-k phrases for each overlap phrase, and the opposite significances of two overlap phrases with respect to each other. Experimental results show that the proposed method improves clustering results.

Original language: English
Title of host publication: Proceedings of 2nd International Conference on Interaction Sciences
Subtitle of host publication: Information Technology, Culture and Human
Pages: 168-174
Number of pages: 7
Publication status: Published - 2009
Event: 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human, ICIS 2009 - Seoul, Korea, Republic of
Duration: 2009 Nov 24 to 2009 Nov 26

Publication series

Name: ACM International Conference Proceeding Series
Volume: 403

Other

Other: 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human, ICIS 2009
Country/Territory: Korea, Republic of
City: Seoul
Period: 09/11/24 to 09/11/26

Keywords

  • Document model
  • Overlap phrases
  • Reweighting

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
