Unsupervised lexical entry acquisition model based on representation of human mental lexicon

Wonhee Yu, Doo Soon Park, Taeweon Suh, Heuiseok Lim

    Research output: Contribution to journal › Article › peer-review

    1 Citation (Scopus)

    Abstract

    This paper proposes a computational lexical entry acquisition model based on a representation model of the mental lexicon. The proposed model acquires lexical entries from a raw corpus by unsupervised learning, as humans do. The model is composed of a full-form acquisition module and a morpheme acquisition module. In the full-form acquisition module, core full-forms are automatically acquired according to frequency and recency thresholds. In the morpheme acquisition module, a substring that occurs repeatedly in different full-forms is chosen as a candidate morpheme. The candidate is then corroborated as a morpheme by using the entropy measure of syllables in the string. We tested the model on a raw Korean corpus of about 16 million full-forms. The test results show that the model successfully acquires major Korean full-forms and morphemes, with average precisions of 100% and 99.04%, respectively. In addition, we observed a vocabulary spurt during learning, a phenomenon characteristic of children's language learning.
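The two modules described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the threshold definitions or the exact entropy formula, so the function names (`acquire_fullforms`, `candidate_morphemes`, `successor_entropy`), the parameters (`min_freq`, `max_gap`, `min_forms`, `min_len`), and the use of successor (branching) entropy are all assumptions made for illustration. Strings are processed character by character; for precomposed Hangul text, one Python character corresponds to one syllable.

```python
import math
from collections import Counter, defaultdict


def acquire_fullforms(tokens, min_freq=2, max_gap=5):
    """Keep tokens seen at least min_freq times whose two most recent
    occurrences lie at most max_gap positions apart.
    (Hypothetical reading of the frequency and recency thresholds.)"""
    count = Counter()
    last_seen = {}
    acquired = set()
    for pos, tok in enumerate(tokens):
        count[tok] += 1
        recent = tok in last_seen and pos - last_seen[tok] <= max_gap
        if count[tok] >= min_freq and recent:
            acquired.add(tok)
        last_seen[tok] = pos
    return acquired


def candidate_morphemes(fullforms, min_forms=2, min_len=2):
    """Return substrings that occur in at least min_forms different full-forms."""
    forms_containing = defaultdict(set)
    for form in fullforms:
        for i in range(len(form)):
            for j in range(i + min_len, len(form) + 1):
                forms_containing[form[i:j]].add(form)
    return {s for s, forms in forms_containing.items() if len(forms) >= min_forms}


def successor_entropy(candidate, fullforms):
    """Shannon entropy (in bits) of the symbol that follows candidate.
    '#' stands for the end of a full-form. Assumed stand-in for the
    paper's entropy measure: a high value suggests a morpheme boundary."""
    following = Counter()
    for form in fullforms:
        start = form.find(candidate)
        while start != -1:
            end = start + len(candidate)
            following[form[end] if end < len(form) else "#"] += 1
            start = form.find(candidate, start + 1)
    total = sum(following.values())
    if total == 0:
        return 0.0
    return -sum(n / total * math.log2(n / total) for n in following.values())
```

With the toy full-forms `walked`, `walks`, and `walking`, the substring `walk` is followed by three different symbols and scores log2(3) bits, while `wal` is always followed by `k` and scores zero. Under this sketch, a candidate would be accepted as a morpheme only when its entropy exceeds a chosen threshold.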

    Original language: English
    Pages (from-to): 2229-2241
    Number of pages: 13
    Journal: Information
    Volume: 14
    Issue number: 7
    Publication status: Published - 2011 Jul

    Keywords

    • Language learning
    • Lexical acquisition
    • Machine readable dictionary
    • Mental lexicon

    ASJC Scopus subject areas

    • Information Systems
