Abstract
In this paper, we consider two important problems that commonly occur in bibliographic digital libraries and seriously degrade their data quality: the Mixed Citation (MC) problem (i.e., citations of different scholars whose names are homonyms are mixed together) and the Split Citation (SC) problem (i.e., citations of the same author appear under different name variants). In particular, we investigate a solution that is both effective and scalable, since the citation collections in such digital libraries tend to be large. After formally defining the problems and the accompanying challenges, we present an effective solution based on the state-of-the-art sampling-based approximate join algorithm. Our claim is verified through preliminary experimental results.
Original language | English |
---|---|
Title of host publication | Proceedings of the 2nd International Workshop on Information Quality in Information Systems, IQIS 2005 |
Publisher | Association for Computing Machinery, Inc |
Pages | 69-76 |
Number of pages | 8 |
ISBN (Electronic) | 1595931600, 9781595931603 |
Publication status | Published - 2005 Jun 17 |
Externally published | Yes |
Event | 2nd International Workshop on Information Quality in Information Systems, IQIS 2005 - Baltimore, United States. Duration: 2005 Jun 17 → … |
Publication series
Name | Proceedings of the 2nd International Workshop on Information Quality in Information Systems, IQIS 2005 |
---|---|
Other
Other | 2nd International Workshop on Information Quality in Information Systems, IQIS 2005 |
---|---|
Country/Territory | United States |
City | Baltimore |
Period | 2005 Jun 17 → … |
Bibliographical note
Publisher Copyright: © 2005 ACM.
ASJC Scopus subject areas
- Information Systems