Abstract
In this paper, we propose two weighting techniques to improve performances of query expansion in biomedical document retrieval, especially when a short biomedical term in a query is expanded with its synonymous multi-word terms. When a query contains synonymous terms of different lengths, a traditional IR model highly ranks a document containing a longer terminology because a longer terminology has more chance to be matched with a query. However, such preference is clearly inappropriate and it often yields an unsatisfactory result. To alleviate the bias weighting problem, we devise a method of normalizing the weights of query terms in a long multi-word biomedical term, and a method of discriminating terms by using inverse terminology frequency which is a novel statistics estimated in a query domain. The experiment results on MEDLINE corpus show that our two simple techniques improve the retrieval performance by adjusting the inadequate preference for long multi-word terminologies in an expanded query.
Original language | English |
---|---|
Pages (from-to) | 1873-1876 |
Number of pages | 4 |
Journal | IEICE Transactions on Information and Systems |
Volume | E90-D |
Issue number | 11 |
DOIs | |
Publication status | Published - 2007 Nov |
Keywords
- Biomedical document retrieval
- Biomedical terminology
- Biomedical terminology weighting
- Query expansion
ASJC Scopus subject areas
- Software
- Hardware and Architecture
- Computer Vision and Pattern Recognition
- Electrical and Electronic Engineering
- Artificial Intelligence