TY - GEN
T1 - Automatic spelling correction rule extraction and application for spoken-style Korean text
AU - Byun, Jeung Hyun
AU - Rim, Hae Chang
AU - Park, So Young
PY - 2007
Y1 - 2007
N2 - Spoken-style text is becoming prevalent because much information is now written in a spoken style, as in Short Message Service (SMS) messages. However, spoken-style text contains more spelling errors than traditional written-style text. In this paper, we propose a rule-based spelling correction system that automatically extracts spelling correction rules from a correction corpus and applies the extracted rules to spelling errors in input sentences. To preserve both high precision and high recall, we devise a candidate-elimination algorithm that determines the appropriate context size of each spelling correction rule based on rule accuracy. Experimental results show that the proposed system extracts 42,537 spelling correction rules and applies them to correct spelling errors in the test corpus, increasing precision from 31.08% to 79.04% measured per message unit.
AB - Spoken-style text is becoming prevalent because much information is now written in a spoken style, as in Short Message Service (SMS) messages. However, spoken-style text contains more spelling errors than traditional written-style text. In this paper, we propose a rule-based spelling correction system that automatically extracts spelling correction rules from a correction corpus and applies the extracted rules to spelling errors in input sentences. To preserve both high precision and high recall, we devise a candidate-elimination algorithm that determines the appropriate context size of each spelling correction rule based on rule accuracy. Experimental results show that the proposed system extracts 42,537 spelling correction rules and applies them to correct spelling errors in the test corpus, increasing precision from 31.08% to 79.04% measured per message unit.
UR - http://www.scopus.com/inward/record.url?scp=50049114166&partnerID=8YFLogxK
U2 - 10.1109/ALPIT.2007.102
DO - 10.1109/ALPIT.2007.102
M3 - Conference contribution
AN - SCOPUS:50049114166
SN - 0769529305
SN - 9780769529301
T3 - Proceedings - ALPIT 2007 6th International Conference on Advanced Language Processing and Web Information Technology
SP - 195
EP - 199
BT - Proceedings - ALPIT 2007 6th International Conference on Advanced Language Processing and Web Information Technology
T2 - 6th International Conference on Advanced Language Processing and Web Information Technology, ALPIT 2007
Y2 - 22 August 2007 through 24 August 2007
ER -