Building a large-scale commonsense knowledge base by converting an existing one in a different language

Yuchul Jung, Joo Young Lee, Youngho Kim, Jaehyun Park, Sung Hyon Myaeng, Hae Chang Rim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

This paper describes our effort to build a large-scale commonsense knowledge base in Korean by converting a pre-existing English one, ConceptNet. The English commonsense knowledge base is essentially a huge network of concepts and relations. Its triplets, in the form Concept-Relation-Concept, were extracted from English sentences contributed through a Web site by volunteers interested in entering commonsense knowledge. Our effort is an attempt to obtain a Korean version by utilizing a variety of language resources and tools. We not only employed a morphological analyzer and existing commercial machine translation software but also developed our own special-purpose translation and out-of-vocabulary handling methods. To handle ambiguity, we also devised noisy concept filtering and concept generalization methods. Out of the 2.4 million assertions, i.e., concept-relation-concept triplets, in the English ConceptNet, we have generated about 200,000 Korean assertions so far. Based on our manual judgments of a 5% sample, the accuracy was 84.4%.
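The figures in the abstract can be made concrete with a minimal sketch. It is not the authors' implementation: the `Assertion` class, the helper names, the example triplets, and the toy judgment list are all hypothetical, and the assumption that relation labels stay unchanged across languages is ours. It only shows how a concept-relation-concept triplet might be represented, how the reported conversion coverage follows from the two assertion counts, and how accuracy would be computed from manual correct/incorrect judgments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """One concept-relation-concept triplet (hypothetical representation)."""
    concept1: str
    relation: str
    concept2: str


def coverage(converted: int, total: int) -> float:
    """Fraction of source assertions that yielded a converted assertion."""
    return converted / total


def sample_accuracy(judgments: list[bool]) -> float:
    """Share of manually judged assertions marked correct."""
    if not judgments:
        raise ValueError("at least one judgment is required")
    return sum(judgments) / len(judgments)


# Illustrative triplets only; not taken from the paper's data.
english = Assertion("dog", "IsA", "animal")
korean = Assertion("개", "IsA", "동물")  # assumes the relation label is kept as-is

# Counts reported in the abstract: about 200,000 of 2.4 million assertions.
print(f"coverage: {coverage(200_000, 2_400_000):.1%}")  # coverage: 8.3%

# Toy judgments, not the paper's sample.
print(sample_accuracy([True, True, False, True]))  # 0.75
```

On the counts given in the abstract, roughly one English assertion in twelve had a Korean counterpart at the time of writing.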

Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing - 8th International Conference, CICLing 2007, Proceedings
Publisher: Springer Verlag
Pages: 23-34
Number of pages: 12
ISBN (Print): 354070938X, 9783540709381
Publication status: Published - 2007
Event: 8th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2007 - Mexico City, Mexico
Duration: 2007 Feb 18 to 2007 Feb 24

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4394 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 8th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2007
Country/Territory: Mexico
City: Mexico City
Period: 07/2/18 to 07/2/24

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
