Contextual postprocessing of a Korean OCR system by linguistic constraints

Hyuk Chul Kwon, Ho Jeong Hwang, Min Jung Kim, Seong Whan Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

This paper presents a contextual postprocessing approach that selects the most feasible word from the multiple output strings of an OCR system. Correction is applied only when selection fails. The selected word is then confirmed by its collocation with the adjacent words. The system applies five functions: (1) selecting a word from the candidate words, (2) correcting candidate words using a confusion matrix of syllables, (3) combining two substrings into a word that spans two lines, (4) guessing unknown nouns, and (5) confirming a selected word using the contextual information of adjacent words. To improve speed, we use syllable di-grams and viable-prefixes of Korean words. The experimental results show that these two heuristics speed up the system by more than 1,000 times in the worst case. Our system improves the word recognition rate of the OCR system from 90.50% to 94.72%.
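As a rough illustration of how functions (1) and (2) might interact with the two speed-up heuristics, the following sketch expands an OCR output string through a syllable confusion table and prunes any partial string that contains an unseen syllable di-gram or is not a viable prefix of a lexicon word. It is not the authors' implementation: the function names, the toy lexicon, and the confusion table are hypothetical, and the text is assumed to be in precomposed (NFC) Hangul so that one Python character equals one syllable.

```python
def viable_prefixes(lexicon):
    """Collect every prefix of every lexicon word (hypothetical helper)."""
    return {word[:i] for word in lexicon for i in range(1, len(word) + 1)}


def syllable_digrams(lexicon):
    """Collect every pair of adjacent syllables seen in the lexicon."""
    return {word[i:i + 2] for word in lexicon for i in range(len(word) - 1)}


def correct(ocr_word, confusions, lexicon):
    """Return lexicon words reachable from ocr_word via the confusion table.

    confusions maps a recognized syllable to syllables it is often
    mistaken for. Both pruning checks are assumptions about how the
    heuristics named in the abstract could be applied.
    """
    prefixes = viable_prefixes(lexicon)
    digrams = syllable_digrams(lexicon)
    results = []

    def extend(prefix, pos):
        if pos == len(ocr_word):
            if prefix in lexicon:
                results.append(prefix)
            return
        syllable = ocr_word[pos]
        # Try the recognized syllable first, then its confusable alternatives.
        for alt in [syllable] + confusions.get(syllable, []):
            # Di-gram check: skip syllable pairs never seen in the lexicon.
            if prefix and prefix[-1] + alt not in digrams:
                continue
            candidate = prefix + alt
            # Viable-prefix check: skip strings no lexicon word starts with.
            if candidate not in prefixes:
                continue
            extend(candidate, pos + 1)

    extend("", 0)
    return results


# Toy data, invented for this example only.
lexicon = {"학교", "학생", "한국"}
confusions = {"힉": ["학", "한"], "고": ["교", "국"]}
print(correct("힉고", confusions, lexicon))  # → ['학교', '한국']
```

Without the two checks, every combination of alternatives would be generated and looked up in the lexicon; with them, a dead-end branch is abandoned as soon as its first invalid syllable is appended. Choosing among the surviving candidates would then fall to the collocation check described in function (5).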

Original language: English
Title of host publication: Proceedings of the 3rd International Conference on Document Analysis and Recognition, ICDAR 1995
Publisher: IEEE Computer Society
Pages: 557-562
Number of pages: 6
ISBN (Electronic): 0818671289
Publication status: Published - 1995
Event: 3rd International Conference on Document Analysis and Recognition, ICDAR 1995 - Montreal, Canada
Duration: 1995 Aug 14 → 1995 Aug 16

Publication series

Name: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Volume: 2
ISSN (Print): 1520-5363

Conference

Conference: 3rd International Conference on Document Analysis and Recognition, ICDAR 1995
Country/Territory: Canada
City: Montreal
Period: 95/8/14 → 95/8/16

Keywords

  • confusion matrix
  • distance evaluation function
  • heuristics
  • postprocessing
  • syllable di-grams
  • viable-prefixes

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
