Alleviating syntactic term mismatches in Korean text retrieval

Bo Hyun Yun, Yong Jae Kwak, Hae Chang Rim

Research output: Contribution to journal, Comment/debate, peer-review


In Korean information retrieval, syntactic term mismatches between index terms and query terms have been a serious obstacle to improving retrieval performance. Conventional approaches try to alleviate syntactic term mismatches either by segmenting compound nouns or by normalizing different representations of noun phrases. However, using segmentation alone may inflate similarity measurements unnecessarily, since the segmented unit nouns cannot discriminate between different formations of compound nouns. On the other hand, using normalization alone is limited in alleviating syntactic term mismatches because of the specificity of normalized phrases. In this paper, we propose a Korean information retrieval system which alleviates syntactic term mismatches by segmenting compound nouns as well as by normalizing noun phrases, and which provides appropriate similarity measurements. In the indexing module, we segment compound nouns using statistical information and normalize noun phrases using dependency relations. Then, we extract terms with boundary information attached. Finally, terms are weighted by a newly devised weighting scheme appropriate for Korean noun phrases. In the retrieval module, we compute the similarity, taking partial matching into account by using the boundary information. The experimental results show that the proposed method can alleviate syntactic term mismatches and improve precision without decreasing recall.
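The role of boundary information in partial matching can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the segmenter, the normalization rules, or the weighting scheme, so everything here is an assumption made for illustration. Noun phrases are assumed to be already segmented into unit nouns, order-insensitive comparison stands in for dependency-based normalization, and the names `phrase_similarity`, `score`, and `partial_weight` are hypothetical.

```python
def phrase_similarity(query_phrase, doc_phrase, partial_weight=0.5):
    """Compare two noun phrases given as tuples of unit nouns.

    The tuple itself plays the role of boundary information: unit nouns
    only count together if they come from the same phrase.
    partial_weight is a hypothetical discount for partial matches.
    """
    q, d = set(query_phrase), set(doc_phrase)
    if q == d:
        # Same unit nouns inside one boundary: treated as a full match.
        # Ignoring order is a crude stand-in for normalization.
        return 1.0
    overlap = len(q & d)
    # Partial match: discounted by the share of overlapping unit nouns.
    return partial_weight * overlap / max(len(q), len(d))


def score(query_phrases, doc_phrases):
    """Sum, over query phrases, of the best match among document phrases."""
    return sum(
        max((phrase_similarity(q, d) for d in doc_phrases), default=0.0)
        for q in query_phrases
    )


query = [("information", "retrieval")]
doc_a = [("retrieval", "information")]   # same phrase, different surface order
doc_b = [("information", "system"), ("retrieval", "model")]  # unit nouns scattered

print(score(query, doc_a))
print(score(query, doc_b))
```

With unit nouns alone, both toy documents would match every query term equally. Keeping the phrase boundaries lets the document containing the whole phrase score higher while the other still receives partial credit.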

Original language: English
Pages (from-to): 481-500
Number of pages: 20
Journal: Information Processing and Management
Issue number: 4
Publication status: Published - 1999

ASJC Scopus subject areas

  • Information Systems
  • Media Technology
  • Computer Science Applications
  • Management Science and Operations Research
  • Library and Information Sciences

