TY - GEN
T1 - Smoothing algorithm for n-gram model using agglutinative characteristic of Korean
AU - Park, Jae Hyun
AU - Song, Young In
AU - Rim, Hae Chang
PY - 2007
Y1 - 2007
N2 - Smoothing for an n-gram language model is an algorithm that assigns a non-zero probability to unseen n-grams. Smoothing is an essential technique for an n-gram language model due to the data sparseness problem. However, in some circumstances it assigns an improper amount of probability to unseen n-grams. In this paper, we present a novel method that adjusts the improperly assigned probabilities of unseen n-grams by taking advantage of the agglutinative characteristics of the Korean language. In Korean, the grammatically proper class of a morpheme can be predicted from the previous morpheme. Using this characteristic, we try to prevent grammatically improper n-grams from receiving relatively high probabilities and to assign more probability mass to proper n-grams. Experimental results show that the proposed method achieves 8.6%-12.5% perplexity reductions for the Katz backoff algorithm and 4.9%-7.0% perplexity reductions for Kneser-Ney smoothing.
AB - Smoothing for an n-gram language model is an algorithm that assigns a non-zero probability to unseen n-grams. Smoothing is an essential technique for an n-gram language model due to the data sparseness problem. However, in some circumstances it assigns an improper amount of probability to unseen n-grams. In this paper, we present a novel method that adjusts the improperly assigned probabilities of unseen n-grams by taking advantage of the agglutinative characteristics of the Korean language. In Korean, the grammatically proper class of a morpheme can be predicted from the previous morpheme. Using this characteristic, we try to prevent grammatically improper n-grams from receiving relatively high probabilities and to assign more probability mass to proper n-grams. Experimental results show that the proposed method achieves 8.6%-12.5% perplexity reductions for the Katz backoff algorithm and 4.9%-7.0% perplexity reductions for Kneser-Ney smoothing.
UR - http://www.scopus.com/inward/record.url?scp=47749151091&partnerID=8YFLogxK
U2 - 10.1109/ICSC.2007.66
DO - 10.1109/ICSC.2007.66
M3 - Conference contribution
AN - SCOPUS:47749151091
SN - 0769529976
SN - 9780769529974
T3 - ICSC 2007 International Conference on Semantic Computing
SP - 397
EP - 404
BT - ICSC 2007 International Conference on Semantic Computing
T2 - ICSC 2007 International Conference on Semantic Computing
Y2 - 17 September 2007 through 19 September 2007
ER -