K-SPAN: A lexical database of Korean surface phonetic forms and phonological neighborhood density statistics

Jeffrey J. Holliday, Rory Turnbull, Julien Eychenne

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)


This article presents K-SPAN (Korean Surface Phonetics and Neighborhoods), a database of surface phonetic forms and several measures of phonological neighborhood density for 63,836 Korean words. Currently publicly available Korean corpora are limited by the fact that they only provide orthographic representations in Hangeul, which is problematic since phonetic forms in Korean cannot be reliably predicted from orthographic forms. We describe the method used to derive the surface phonetic forms from a publicly available orthographic corpus of Korean, and report on several statistics calculated using this database; namely, segment unigram frequencies, which are compared to previously reported results, along with segment-based and syllable-based neighborhood density statistics for three types of representation: an “orthographic” form, which is a quasi-phonological representation, a “conservative” form, which maintains all known contrasts, and a “modern” form, which represents the pronunciation of contemporary Seoul Korean. These representations are rendered in an ASCII-encoded scheme, which allows users to query the corpus without having to read Korean orthography, and permits the calculation of a wide range of phonological measures.
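As a rough illustration of the kind of measures described above, the following Python sketch counts segment unigram frequencies and segment-based neighborhood density over a toy lexicon. It assumes a common definition of a neighbor (a word that differs by exactly one segment substitution, insertion, or deletion); the abstract does not specify the definition K-SPAN uses. The words, the segment symbols, and the function names are hypothetical and do not reproduce K-SPAN's actual ASCII scheme or data.

```python
from collections import Counter

# Hypothetical toy lexicon: word label -> tuple of segments.
# These symbols are illustrative only, not K-SPAN's encoding.
TOY_LEXICON = {
    "pal": ("p", "a", "l"),
    "mal": ("m", "a", "l"),
    "pul": ("p", "u", "l"),
    "al": ("a", "l"),
    "mul": ("m", "u", "l"),
    "san": ("s", "a", "n"),
}


def segment_unigram_counts(lexicon):
    """Count how often each segment occurs across all word types."""
    counts = Counter()
    for segments in lexicon.values():
        counts.update(segments)
    return counts


def is_neighbor(a, b):
    """True if a and b differ by exactly one substitution, insertion, or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # Lengths differ by one: removing one segment from the longer form
    # must yield the shorter form (insertion or deletion).
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def neighborhood_density(lexicon):
    """Map each word to its number of one-edit neighbors in the lexicon."""
    return {
        word: sum(
            is_neighbor(form, other_form)
            for other, other_form in lexicon.items()
            if other != word
        )
        for word, form in lexicon.items()
    }


density = neighborhood_density(TOY_LEXICON)
unigrams = segment_unigram_counts(TOY_LEXICON)
```

In this toy lexicon, "pal" has three neighbors ("mal" and "pul" by substitution, "al" by deletion), while "san" has none. A syllable-based variant could reuse the same functions by storing tuples of syllables instead of tuples of segments.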

Original language: English
Pages (from-to): 1939-1950
Number of pages: 12
Journal: Behavior Research Methods
Issue number: 5
Publication status: Published - 2017 Oct 1


Keywords

  • Korean
  • Lexical database
  • Lexicon
  • Phonological neighborhood density

ASJC Scopus subject areas

  • Experimental and Cognitive Psychology
  • Developmental and Educational Psychology
  • Arts and Humanities (miscellaneous)
  • Psychology (miscellaneous)
  • Psychology (all)

