Utilizing the Web for automatic word spacing

Gumwon Hong, Jeong Hoon Lee, Young In Song, Do Gil Lee, Hae Chang Rim*

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    Abstract

    This paper presents a new approach to the word spacing problem: mining reliable words from the Web and using them as additional resources. Conventional approaches to automatic word spacing use noise-free data to train the parameters of word spacing models. However, the insufficiency and irrelevance of training examples is always the main bottleneck in automatic word spacing. To mitigate this data-sparseness problem, the paper proposes an algorithm that discovers reliable words on the Web to expand the vocabulary, and a model that uses those words as additional resources. The proposed approach is simple and practical to adapt to new domains. Experimental results show that it outperforms conventional word spacing approaches.

    Original language: English
    Pages (from-to): 2553-2556
    Number of pages: 4
    Journal: IEICE Transactions on Information and Systems
    Volume: E92-D
    Issue number: 12
    Publication status: Published - 2009

    Keywords

    • Word segmentation
    • Word spacing

    ASJC Scopus subject areas

    • Software
    • Hardware and Architecture
    • Computer Vision and Pattern Recognition
    • Electrical and Electronic Engineering
    • Artificial Intelligence
