TY - GEN
T1 - Building a large-scale commonsense knowledge base by converting an existing one in a different language
AU - Jung, Yuchul
AU - Lee, Joo Young
AU - Kim, Youngho
AU - Park, Jaehyun
AU - Myaeng, Sung Hyon
AU - Rim, Hae Chang
PY - 2007
Y1 - 2007
N2 - This paper describes our effort to build a large-scale commonsense knowledge base in Korean by converting a pre-existing English one, ConceptNet. The English commonsense knowledge base is essentially a huge network of concepts and relations. Its triplets, in the form Concept-Relation-Concept, were extracted from English sentences collected through a Web site from volunteers interested in entering commonsense knowledge. Our effort is an attempt to obtain a Korean version by utilizing a variety of language resources and tools. We not only employed a morphological analyzer and existing commercial machine translation software but also developed our own special-purpose translation and out-of-vocabulary handling methods. To handle ambiguity, we also devised noisy concept filtering and concept generalization methods. Out of the 2.4 million assertions, i.e., concept-relation-concept triplets, in the English ConceptNet, we have generated about 200,000 Korean assertions so far. Based on our manual judgments of a 5% sample, the accuracy was 84.4%.
AB - This paper describes our effort to build a large-scale commonsense knowledge base in Korean by converting a pre-existing English one, ConceptNet. The English commonsense knowledge base is essentially a huge network of concepts and relations. Its triplets, in the form Concept-Relation-Concept, were extracted from English sentences collected through a Web site from volunteers interested in entering commonsense knowledge. Our effort is an attempt to obtain a Korean version by utilizing a variety of language resources and tools. We not only employed a morphological analyzer and existing commercial machine translation software but also developed our own special-purpose translation and out-of-vocabulary handling methods. To handle ambiguity, we also devised noisy concept filtering and concept generalization methods. Out of the 2.4 million assertions, i.e., concept-relation-concept triplets, in the English ConceptNet, we have generated about 200,000 Korean assertions so far. Based on our manual judgments of a 5% sample, the accuracy was 84.4%.
UR - http://www.scopus.com/inward/record.url?scp=37149021053&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-70939-8_3
DO - 10.1007/978-3-540-70939-8_3
M3 - Conference contribution
AN - SCOPUS:37149021053
SN - 354070938X
SN - 9783540709381
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 23
EP - 34
BT - Computational Linguistics and Intelligent Text Processing - 8th International Conference, CICLing 2007, Proceedings
PB - Springer Verlag
T2 - 8th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2007
Y2 - 18 February 2007 through 24 February 2007
ER -