Alleviating syntactic term mismatches in Korean text retrieval

  • Bo Hyun Yun*
  • Yong Jae Kwak
  • Hae Chang Rim
  • *Corresponding author for this work

    Research output: Contribution to journal › Comment/debate › peer-review

    Abstract

    In Korean information retrieval, syntactic term mismatches between index terms and query terms have been a serious obstacle to improving retrieval performance. Conventional approaches try to alleviate these mismatches either by segmenting compound nouns or by normalizing different representations of noun phrases. However, segmentation alone may inflate similarity scores unnecessarily, because the segmented unit nouns cannot discriminate between different formations of compound nouns. Normalization alone, on the other hand, is limited in alleviating syntactic term mismatches because normalized phrases are highly specific. In this paper, we propose a Korean information retrieval system that alleviates syntactic term mismatches both by segmenting compound nouns and by normalizing noun phrases, and that provides appropriate similarity measurements. In the indexing module, we segment compound nouns using statistical information and normalize noun phrases using dependency relations. We then extract terms annotated with boundary information. Finally, the terms are weighted by a newly devised weighting scheme suited to Korean noun phrases. In the retrieval module, we compute similarity with partial matching based on the boundary information. The experimental results show that the proposed method can alleviate syntactic term mismatches and improve precision without decreasing recall.
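    The abstract does not specify the segmentation statistics, the boundary encoding, or the weighting scheme, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical unit-noun frequency table (UNIT_NOUN_FREQ), segments a compound by unigram dynamic programming (segment_compound), keeps adjacent unit-noun pairs that stay inside one compound's boundary as extra index terms (extract_terms), and scores partial matches with hypothetical weights of 1 per unit noun and 2 per pair (partial_match_score). All of these names are illustrative. Noun-phrase normalization by dependency relations is not covered.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical unit-noun frequencies; a real system would estimate them from a corpus.
UNIT_NOUN_FREQ: Dict[str, int] = {"정보": 50, "검색": 40, "시스템": 30, "정보검색": 5}


def segment_compound(noun: str, freq: Dict[str, int]) -> List[str]:
    """Split a compound noun into unit nouns by maximizing the summed log relative frequency."""
    total = sum(freq.values())
    n = len(noun)
    best = [-math.inf] * (n + 1)  # best[i]: best score for the prefix noun[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last piece in that prefix
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            piece = noun[start:end]
            if piece in freq and best[start] > -math.inf:
                score = best[start] + math.log(freq[piece] / total)
                if score > best[end]:
                    best[end], back[end] = score, start
    if best[n] == -math.inf:
        return [noun]  # no full segmentation found: keep the compound whole
    pieces, end = [], n
    while end > 0:
        pieces.append(noun[back[end]:end])
        end = back[end]
    return pieces[::-1]


def extract_terms(units: List[str]) -> Set[Tuple[str, ...]]:
    """Unit nouns plus adjacent pairs; pairs never cross this compound's boundary."""
    terms = {(u,) for u in units}
    terms |= {(units[i], units[i + 1]) for i in range(len(units) - 1)}
    return terms


def partial_match_score(query: Set[Tuple[str, ...]], doc: Set[Tuple[str, ...]]) -> float:
    """Fraction of query term weight matched in the document (weight = number of unit nouns)."""
    total = sum(len(t) for t in query)
    shared = sum(len(t) for t in query & doc)
    return shared / total if total else 0.0


query = extract_terms(segment_compound("정보검색", UNIT_NOUN_FREQ))
doc_a = extract_terms(segment_compound("정보검색시스템", UNIT_NOUN_FREQ))
doc_b = extract_terms(segment_compound("검색정보", UNIT_NOUN_FREQ))
print(partial_match_score(query, doc_a), partial_match_score(query, doc_b))  # 1.0 0.5
```

    Both documents contain every unit noun of the query, so matching on unit nouns alone would score them equally. Because the pair ("정보", "검색") only matches when the two nouns appear in that order inside one compound boundary, the sketch scores 정보검색시스템 at 1.0 and 검색정보 at 0.5, which illustrates how boundary information can keep segmentation from inflating similarity.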

    Original language: English
    Pages (from-to): 481-500
    Number of pages: 20
    Journal: Information Processing and Management
    Volume: 35
    Issue number: 4
    Publication status: Published, 1999

    ASJC Scopus subject areas

    • Information Systems
    • Media Technology
    • Computer Science Applications
    • Management Science and Operations Research
    • Library and Information Sciences
