Automatic word spacing using probabilistic models based on character n-grams

Do Gil Lee, Hae Chang Rim, Dongsuk Yook

Research output: Contribution to journal › Article › peer-review

19 Citations (Scopus)

Abstract

Probabilistic models based on hidden Markov models (HMMs) for automatic word spacing are discussed. The models use character n-grams, which are subsequences of n characters in a given character sequence. Automatic word spacing is a preprocessing technique used for correcting boundaries between words in a sentence containing spacing errors. These models can be effectively applied to a natural language with a small character set, such as English, using character n-grams that are larger than trigrams. These models, which are language independent and can be effectively used for languages having word spacing, can also be used for word segmentation in languages without explicit word spacing. By generalizing the HMMs, these models can consider a broad context and estimate accurate probabilities.
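To make the idea concrete, the following is a minimal sketch of word spacing as character tagging. It is not the authors' model: it assumes a first-order context only, in which each character carries a tag (1 if a space follows it, 0 otherwise) and the probability of a (character, tag) pair is conditioned on the previous pair, with add-one smoothing and Viterbi decoding. The function names `to_tagged`, `train`, `log_prob`, and `space`, the start symbol `<s>`, and the smoothing scheme are all illustrative assumptions; the models described in the abstract generalize the HMM to broader n-gram contexts.

```python
import math
from collections import defaultdict

START = ("<s>", 1)  # hypothetical sentence-start symbol, treated as followed by a boundary


def to_tagged(sentence):
    """Turn a correctly spaced sentence into (char, tag) pairs; tag 1 = a space follows."""
    pairs = []
    for word in sentence.split():
        for i, ch in enumerate(word):
            pairs.append((ch, 1 if i == len(word) - 1 else 0))
    return pairs


def train(sentences):
    """Count transitions between consecutive (char, tag) pairs."""
    bigram = defaultdict(int)   # ((prev_char, prev_tag), (char, tag)) -> count
    context = defaultdict(int)  # (prev_char, prev_tag) -> count
    chars = set()
    for sentence in sentences:
        prev = START
        for pair in to_tagged(sentence):
            bigram[(prev, pair)] += 1
            context[prev] += 1
            chars.add(pair[0])
            prev = pair
    # Assumed outcome-space size for add-one smoothing:
    # every seen char with either tag, plus one slot for unseen events.
    return dict(bigram), dict(context), 2 * len(chars) + 1


def log_prob(model, prev, cur):
    """Add-one smoothed log P(cur | prev) over (char, tag) pairs."""
    bigram, context, size = model
    return math.log((bigram.get((prev, cur), 0) + 1) / (context.get(prev, 0) + size))


def space(model, text):
    """Strip existing spaces, then re-insert them along the best Viterbi tag path."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return ""
    # best[t] = (log score, tag path) of the best path whose last tag is t
    best = {t: (log_prob(model, START, (chars[0], t)), [t]) for t in (0, 1)}
    for i in range(1, len(chars)):
        step = {}
        for t in (0, 1):
            candidates = [
                (best[p][0] + log_prob(model, (chars[i - 1], p), (chars[i], t)), best[p][1])
                for p in (0, 1)
            ]
            score, path = max(candidates, key=lambda c: c[0])
            step[t] = (score, path + [t])
        best = step
    _, tags = max(best.values(), key=lambda c: c[0])
    return "".join(ch + (" " if t else "") for ch, t in zip(chars, tags)).strip()
```

As a toy illustration, training with `train(["ab cd"] * 3)` and calling `space(model, "abcd")` returns `"ab cd"`. A realistic system would need a large correctly spaced corpus and, following the abstract, longer character n-gram contexts than this sketch uses.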

Original language: English
Pages (from-to): 28-35
Number of pages: 8
Journal: IEEE Intelligent Systems
Volume: 22
Issue number: 1
Publication status: Published - Jan 2007

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Artificial Intelligence
