TY - GEN
T1 - Effective and scalable solutions for mixed and split citation problems in digital libraries
AU - Lee, Dongwon
AU - On, Byung Won
AU - Kang, Jaewoo
AU - Park, Sanghyun
N1 - Publisher Copyright: © 2005 ACM.
PY - 2005/6/17
Y1 - 2005/6/17
AB - In this paper, we consider two important problems that commonly occur in bibliographic digital libraries, which seriously degrade their data qualities: Mixed Citation (MC) problem (i.e., citations of different scholars with their names being homonyms are mixed together) and Split Citation (SC) problem (i.e., citations of the same author appear under different name variants). In particular, we investigate an effective yet scalable solution since citations in such digital libraries tend to be large-scale. After formally defining the problems and accompanying challenges, we present an effective solution that is based on the state-of-the-art sampling-based approximate join algorithm. Our claim is verified through preliminary experimental results.
UR - http://www.scopus.com/inward/record.url?scp=85019781759&partnerID=8YFLogxK
U2 - 10.1145/1077501.1077514
DO - 10.1145/1077501.1077514
M3 - Conference contribution
AN - SCOPUS:85019781759
T3 - Proceedings of the 2nd International Workshop on Information Quality in Information Systems, IQIS 2005
SP - 69
EP - 76
BT - Proceedings of the 2nd International Workshop on Information Quality in Information Systems, IQIS 2005
PB - Association for Computing Machinery, Inc
T2 - 2nd International Workshop on Information Quality in Information Systems, IQIS 2005
Y2 - 17 June 2005
ER -