Toward robust classification using the Open Directory Project

Jongwoo Ha, Jung Hyun Lee, Won Jun Jang, Yong Ku Lee, Sang-Geun Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Citations (Scopus)

Abstract

The Open Directory Project (ODP) is a large-scale, high-quality, publicly available web directory used in many studies and real-world applications. In this paper, we explore training data expansion techniques for text classification as one possible way to deal with the sparsity of the ODP dataset. We propose a dozen classification methods, which differ in (1) the categories from which training data is expanded and (2) how the expanded training data is merged to generate centroid vectors. Evaluation results show that training data expansion significantly improves classification performance over representative classifiers. We also find that (1) child and descendant categories are more valuable sources for expanding training data than parent and ancestor categories, and (2) distance-based weighting is superior to simple averaging for merging the expanded training data.
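The two design axes named in the abstract (where the extra training data comes from, and how it is merged into a centroid) can be illustrated with a small sketch. This is not the authors' implementation: the toy taxonomy, the sample texts, the function names, and the weight `1 / (1 + distance)` are all assumptions made for illustration, and the abstract does not specify the actual weighting formula.

```python
import math
from collections import Counter

# Hypothetical toy taxonomy: category -> parent (None marks a root).
PARENT = {
    "Sports": None,
    "Sports/Soccer": "Sports",
    "Sports/Soccer/Clubs": "Sports/Soccer",
    "Arts": None,
    "Arts/Music": "Arts",
}

# Hypothetical, deliberately sparse training text per category.
DOCS = {
    "Sports": ["sports news scores"],
    "Sports/Soccer": ["soccer league goal"],
    "Sports/Soccer/Clubs": ["club squad transfer goal"],
    "Arts": ["arts gallery exhibition"],
    "Arts/Music": ["music album concert"],
}


def term_vector(texts):
    """Normalized term-frequency vector for a list of texts."""
    counts = Counter(word for text in texts for word in text.split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def descendants(category):
    """Yield (descendant, distance) pairs for a category."""
    for candidate in PARENT:
        distance, node = 0, candidate
        while node is not None and node != category:
            node = PARENT[node]
            distance += 1
        if node == category and distance > 0:
            yield candidate, distance


def centroid(category, weighting="distance"):
    """Merge a category's own vector with those of its descendants.

    weighting="simple"   -> every source counts equally (simple averaging)
    weighting="distance" -> weight 1 / (1 + distance), an assumed formula
    """
    sources = [(category, 0)] + list(descendants(category))
    merged, weight_sum = Counter(), 0.0
    for source, distance in sources:
        weight = 1.0 / (1 + distance) if weighting == "distance" else 1.0
        for term, value in term_vector(DOCS[source]).items():
            merged[term] += weight * value
        weight_sum += weight
    return {term: value / weight_sum for term, value in merged.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, weighting="distance"):
    """Return the category whose expanded centroid is closest to the text."""
    query = term_vector([text])
    return max(DOCS, key=lambda c: cosine(query, centroid(c, weighting)))
```

In this sketch the "Sports" centroid picks up terms such as "transfer" from its descendants, while distance-based weighting keeps the category's own terms more prominent than simple averaging does. Expanding from parent and ancestor categories instead would only require swapping `descendants` for a walk up the `PARENT` map.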

Original language: English
Title of host publication: DSAA 2014 - Proceedings of the 2014 IEEE International Conference on Data Science and Advanced Analytics
Editors: George Karypis, Longbing Cao, Wei Wang, Irwin King
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 607-612
Number of pages: 6
ISBN (Electronic): 9781479969913
DOIs
Publication status: Published - 2014 Mar 10
Event: 2014 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2014 - Shanghai, China
Duration: 2014 Oct 30 - 2014 Nov 1

Publication series

Name: DSAA 2014 - Proceedings of the 2014 IEEE International Conference on Data Science and Advanced Analytics

Other

Other: 2014 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2014
Country/Territory: China
City: Shanghai
Period: 14/10/30 - 14/11/1

ASJC Scopus subject areas

  • Artificial Intelligence
  • Information Systems
  • Information Systems and Management
