TsPhraseRank for document clustering: Reweighting the weight of phrase

Yoon Ho Cho, Sang Hyun Park, Sang-Geun Lee

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Given a document collection, a hierarchical clustering algorithm groups the documents into several clusters. Recent work has identified the set of overlap phrases as useful features for hierarchical document clustering. However, it considered neither the relationship between overlap phrases that co-occur in a document nor the degree of the opposite relationship between overlap phrases. In this paper, we propose new algorithms that compute an effective similarity measure before the hierarchical clustering algorithm is run. The proposed methods have two important features: a ranked list of the top-k phrases for each overlap phrase, and the opposite significance of two overlap phrases with respect to each other. Experimental results show that the proposed method improves clustering results.

    Original language: English
    Title of host publication: Proceedings of 2nd International Conference on Interaction Sciences
    Subtitle of host publication: Information Technology, Culture and Human
    Pages: 168-174
    Number of pages: 7
    Publication status: Published - 2009
    Event: 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human, ICIS 2009 - Seoul, Korea, Republic of
    Duration: 2009 Nov 24 to 2009 Nov 26

    Publication series

    Name: ACM International Conference Proceeding Series
    Volume: 403

    Other

    Other: 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human, ICIS 2009
    Country/Territory: Korea, Republic of
    City: Seoul
    Period: 09/11/24 to 09/11/26

    Keywords

    • Document model
    • Overlap phrases
    • Reweighting

    ASJC Scopus subject areas

    • Software
    • Human-Computer Interaction
    • Computer Vision and Pattern Recognition
    • Computer Networks and Communications
