Automated classification of industry and occupation codes using document classification method

    Research output: Chapter in Book/Report/Conference proceeding › Chapter

    Abstract

    This paper describes the development of an automated industry and occupation coding system for Korean Census records. The purpose of the system is to convert natural-language responses on survey questionnaires into the corresponding numeric codes according to the standard code book from the Census Bureau. We employ a kNN (k-nearest-neighbors) based document classification method and information retrieval techniques to index and to weight index terms. To address the inconsistency among respondents' descriptions, we use nouns and phrases acquired from past census data. Using these data, we can estimate the nouns or phrases frequently used to describe a given code. The experimental results show that the past census data play an important role in increasing code classification accuracy.
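
The general idea of kNN-based code assignment with weighted index terms can be illustrated with a minimal sketch. This is not the authors' system: the whitespace tokenizer, the smoothed TF-IDF weighting, the cosine similarity, the similarity-weighted vote, and the sample responses and codes (`A01`, `P85`, `H49`) are all assumptions made for illustration. The abstract does not specify the weighting scheme, and the described system indexes nouns and phrases taken from past census data rather than raw tokens.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain whitespace tokens stand in for extracted nouns/phrases.
    return text.lower().split()


def build_idf(token_lists):
    # Smoothed inverse document frequency over the labeled responses.
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(tokens, idf):
    # L2-normalized TF-IDF vector; terms unseen in training are dropped.
    tf = Counter(tokens)
    vec = {t: c * idf[t] for t, c in tf.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    # Both vectors are already unit length, so the dot product is the cosine.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_classify(query, training, k=3):
    """Return the code with the highest similarity-weighted vote among the
    k nearest labeled responses, or None if nothing overlaps the query."""
    token_lists = [tokenize(text) for text, _ in training]
    idf = build_idf(token_lists)
    vectors = [tfidf(tokens, idf) for tokens in token_lists]
    q = tfidf(tokenize(query), idf)
    scored = sorted(
        ((cosine(q, v), code) for v, (_, code) in zip(vectors, training)),
        reverse=True,
    )
    votes = defaultdict(float)
    for sim, code in scored[:k]:
        if sim > 0.0:
            votes[code] += sim
    return max(votes, key=votes.get) if votes else None


# Hypothetical labeled responses; the codes are invented, not real census codes.
TRAINING = [
    ("rice farming on family farm", "A01"),
    ("grows vegetables and rice", "A01"),
    ("teaches mathematics at high school", "P85"),
    ("elementary school teacher", "P85"),
    ("drives a city bus", "H49"),
    ("taxi driver", "H49"),
]

if __name__ == "__main__":
    print(knn_classify("high school teacher", TRAINING))  # P85
    print(knn_classify("rice farmer", TRAINING))          # A01
```

In this sketch, a response that shares no term with any labeled example returns `None`, which a real pipeline would presumably route to manual coding.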

    Original language: English
    Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Editors: Nikhil R. Pal, Srimanta Pal, Nikola Kasabov, Rajani K. Mudi, Swapan K. Parui
    Publisher: Springer Verlag
    Pages: 827-833
    Number of pages: 7
    ISBN (Print): 3540239316, 9783540239314
    Publication status: Published - 2004

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 3316
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • General Computer Science
