TY - JOUR
T1 - Schema matching using interattribute dependencies
AU - Kang, Jaewoo
AU - Naughton, Jeffrey F.
N1 - Funding Information:
Correspondence should be addressed to Jaewoo Kang. The authors would like to thank Jin-Yi Cai for helpful discussion. This paper is a substantially extended version of the authors’ previous conference paper in [36]. This research was supported by US National Science Foundation Grants CSA-9623632, ITR 0086002, and in part by Korea University Grant, Microsoft Research Internet Services Grant, and the Second Brain Korea 21 Project Grant.
PY - 2008/10
Y1 - 2008/10
N2 - Schema matching is one of the key challenges in information integration. It is a labor-intensive and time-consuming process. To alleviate the problem, many automated solutions have been proposed. Most of the existing solutions mainly rely upon textual similarity of the data to be matched. However, there exist instances of the schema-matching problem for which they do not even apply. Such problem instances typically arise when the column names in the schémas and the data in the columns are opaque or very difficult to interpret. In our previous work, we proposed a two-step technique to address this problem. In the first step, we measure the dependencies between attributes within tables using an information-theoretic measure and construct a dependency graph for each table capturing the dependencies among attributes. In the second step, we find matching node pairs across the dependency graphs by running a graph-matching algorithm. In our previous work, we experimentally validated the accuracy of the approach. One remaining challenge is the computational complexity of the graph-matching problem in the second step. The problem instance we are facing is the weighted graph-matching problem to which no efficient solution has yet been found. In this paper, we extend the previous work by improving the second phase of the algorithm incorporating efficient approximation algorithms into the framework.
AB - Schema matching is one of the key challenges in information integration. It is a labor-intensive and time-consuming process. To alleviate the problem, many automated solutions have been proposed. Most of the existing solutions mainly rely upon textual similarity of the data to be matched. However, there exist instances of the schema-matching problem for which they do not even apply. Such problem instances typically arise when the column names in the schémas and the data in the columns are opaque or very difficult to interpret. In our previous work, we proposed a two-step technique to address this problem. In the first step, we measure the dependencies between attributes within tables using an information-theoretic measure and construct a dependency graph for each table capturing the dependencies among attributes. In the second step, we find matching node pairs across the dependency graphs by running a graph-matching algorithm. In our previous work, we experimentally validated the accuracy of the approach. One remaining challenge is the computational complexity of the graph-matching problem in the second step. The problem instance we are facing is the weighted graph-matching problem to which no efficient solution has yet been found. In this paper, we extend the previous work by improving the second phase of the algorithm incorporating efficient approximation algorithms into the framework.
KW - Attribute dependency
KW - Graph matching
KW - Schema matching
UR - http://www.scopus.com/inward/record.url?scp=50649089608&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2008.100
DO - 10.1109/TKDE.2008.100
M3 - Article
AN - SCOPUS:50649089608
SN - 1041-4347
VL - 20
SP - 1393
EP - 1407
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 10
M1 - 4527243
ER -