Abstract
This paper proposes a computational model of lexical entry acquisition based on a representation model of the mental lexicon. The proposed model acquires lexical entries from a raw corpus by unsupervised learning, as human learners do. The model is composed of a full-form acquisition module and a morpheme acquisition module. In the full-form acquisition module, core full-forms are automatically acquired according to frequency and recency thresholds. In the morpheme acquisition module, a substring that occurs repeatedly in different full-forms is chosen as a candidate morpheme. The candidate is then corroborated as a morpheme by using the entropy measure of syllables in the string. We tested the model on a raw Korean corpus of about 16 million full-forms. The test results show that the model successfully acquires major Korean full-forms and morphemes, with an average precision of 100% and 99.04%, respectively. In addition, we observed a vocabulary spurt during learning, a phenomenon characteristic of children's language learning.
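The two modules described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the threshold values, the exact recency rule, or the entropy formula, so the function names, the parameters `min_freq`, `max_gap`, `min_forms`, and `min_entropy`, the reset-on-gap recency rule, and the use of next-syllable (successor) entropy are all assumptions made for illustration. Each Hangul character is treated as one syllable.

```python
import math
from collections import Counter, defaultdict


def acquire_full_forms(tokens, min_freq=3, max_gap=1000):
    """Keep full-forms seen at least min_freq times, where every repeat falls
    within max_gap tokens of the previous one (assumed recency rule)."""
    count = Counter()
    last_seen = {}
    lexicon = set()
    for pos, tok in enumerate(tokens):
        if tok in last_seen and pos - last_seen[tok] > max_gap:
            count[tok] = 0  # too long since the last occurrence: restart counting
        count[tok] += 1
        last_seen[tok] = pos
        if count[tok] >= min_freq:
            lexicon.add(tok)
    return lexicon


def candidate_morphemes(full_forms, min_forms=2):
    """Substrings that occur in at least min_forms different full-forms."""
    containing = defaultdict(set)
    for form in full_forms:
        for i in range(len(form)):
            for j in range(i + 1, len(form) + 1):
                containing[form[i:j]].add(form)
    return {s for s, forms in containing.items() if len(forms) >= min_forms}


def successor_entropy(candidate, full_forms):
    """Shannon entropy (bits) of the syllable that follows the candidate."""
    nxt = Counter()
    for form in full_forms:
        start = form.find(candidate)
        while start != -1:
            end = start + len(candidate)
            nxt[form[end] if end < len(form) else "<END>"] += 1
            start = form.find(candidate, start + 1)
    total = sum(nxt.values())
    return -sum((c / total) * math.log2(c / total) for c in nxt.values())


def acquire_morphemes(full_forms, min_forms=2, min_entropy=1.0):
    """Accept candidates whose successor entropy reaches min_entropy (assumed test)."""
    return {
        s
        for s in candidate_morphemes(full_forms, min_forms)
        if successor_entropy(s, full_forms) >= min_entropy
    }
```

On the toy set `{"학교에", "학교가", "학교는", "집에"}`, the substring "학교" is followed by three different syllables and scores log2(3) bits, while "학" is always followed by "교" and scores 0. A one-sided entropy test like this is deliberately crude; it would need at least a matching check on the preceding syllable before its output resembled a real morpheme inventory.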
Original language | English |
---|---|
Pages (from-to) | 2229-2241 |
Number of pages | 13 |
Journal | Information |
Volume | 14 |
Issue number | 7 |
Publication status | Published - 2011 Jul |
Keywords
- Language learning
- Lexical acquisition
- Machine readable dictionary
- Mental lexicon
ASJC Scopus subject areas
- Information Systems