Abusive behavior has become a common problem on many online social media platforms, and profanity is one of its most common forms. Platforms typically filter profanity using lists of well-known profane words, but this approach has two drawbacks: it can be bypassed by altering the form of a word, and it can flag normal sentences as profane. The problem is especially pronounced in Korean, where each syllable is composed of graphemes and words are composed of multiple syllables. A word can be decomposed into graphemes without impairing its meaning, and a character sequence that forms a profane word can carry a different meaning depending on the sentence. This work focuses on the problem of filtering systems misdetecting normal phrases as profane phrases. To address it, we propose a deep learning-based framework that combines word embedding based on grapheme and syllable separation with a suitable CNN structure. The proposed model was evaluated on chat content from one of the most popular online games in South Korea and achieved 90.4% accuracy.
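The grapheme separation mentioned above can be illustrated with the standard Unicode arithmetic for precomposed Hangul syllables. This is a minimal sketch, not the paper's actual preprocessing: the function names `decompose_syllable` and `to_graphemes` are hypothetical, and the output uses Unicode conjoining jamo code points, which may differ from the grapheme inventory the authors used.

```python
# Constants from the Unicode Hangul syllable decomposition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables in total


def decompose_syllable(ch):
    """Split one precomposed Hangul syllable into its jamo (hypothetical helper).

    Characters outside the Hangul Syllables block are returned unchanged.
    """
    index = ord(ch) - S_BASE
    if not 0 <= index < S_COUNT:
        return [ch]
    lead = chr(L_BASE + index // N_COUNT)
    vowel = chr(V_BASE + (index % N_COUNT) // T_COUNT)
    graphemes = [lead, vowel]
    tail_index = index % T_COUNT
    if tail_index:  # index 0 means the syllable has no final consonant
        graphemes.append(chr(T_BASE + tail_index))
    return graphemes


def to_graphemes(text):
    """Flatten a string into a grapheme-level sequence (hypothetical helper)."""
    return [g for ch in text for g in decompose_syllable(ch)]


if __name__ == "__main__":
    # Two syllables, each with lead, vowel, and final consonant.
    print(to_graphemes("한글"))
```

A sequence like this could feed a grapheme-level embedding, while the original characters give the syllable-level view; how the two are combined in the proposed framework is not specified in this abstract.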
KSII Transactions on Internet and Information Systems
Published - 2022 Jan 31
Bibliographical note
Funding Information:
This study is supported by Korea University Grant and in part by the Soonchunhyang University Research Fund 20180051.
Copyright © 2022 KSII.
Keywords
- Convolutional neural network
- Deep learning
- Natural language processing
- Text mining
ASJC Scopus subject areas
- Information Systems
- Computer Networks and Communications