“Why do I feel offended?” Korean Dataset for Offensive Language Identification

San Hee Park, Kang Min Kim, O. Joun Lee, Youjin Kang, Jaewon Lee, Su Min Lee, Sang Keun Lee

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    3 Citations (Scopus)

    Abstract

    Offensive content is an unavoidable issue on social media. Most existing offensive language identification methods rely on compiling labeled datasets. However, existing methods rarely consider low-resource languages, such as Korean, that have relatively little data available for training. To address this issue, we construct a novel KOrean Dataset for Offensive Language Identification (KODOLI). KODOLI comprises more fine-grained offensiveness categories (i.e., not offensive, likely offensive, and offensive) than existing datasets. Likely offensive language refers to texts with implicit offensiveness or abusive language without offensive intent. In addition, we propose two auxiliary tasks to help identify offensive language: abusive language detection and sentiment analysis. We provide experimental results for baselines on KODOLI and observe that pre-trained language models struggle to identify "LIKELY" offensive statements. Quantitative results and qualitative analysis demonstrate that jointly learning offensive language, abusive language, and sentiment information improves the performance of offensive language identification.
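    The abstract describes a three-way offensiveness scheme learned jointly with two auxiliary tasks. One common way to realize such joint learning is a shared encoder with one classification head per task, trained on a weighted sum of per-task cross-entropy losses. The sketch below shows only that loss computation, assuming the heads already produce logits. It is not the authors' method: the label strings, the task names `offensive`, `abusive`, and `sentiment`, the auxiliary label counts, and the loss weights are all hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical label strings for the three offensiveness categories in the abstract.
OFFENSIVE_LABELS = ["NOT_OFFENSIVE", "LIKELY_OFFENSIVE", "OFFENSIVE"]


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def cross_entropy(logits: List[float], target: int) -> float:
    """Negative log-likelihood of the target class index."""
    return -log_softmax(logits)[target]


def joint_loss(logits: Dict[str, List[float]],
               targets: Dict[str, int],
               weights: Dict[str, float]) -> float:
    """Weighted sum of per-task cross-entropy losses (weighting is an assumption)."""
    return sum(weights[task] * cross_entropy(logits[task], targets[task])
               for task in logits)


def predict_offensiveness(logits: List[float]) -> str:
    """Map the offensiveness head's logits to its highest-scoring label."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return OFFENSIVE_LABELS[best]


# Illustrative usage: logits per task head for a single example.
example_logits = {
    "offensive": [0.2, 1.5, -0.3],  # scores over OFFENSIVE_LABELS
    "abusive": [0.1, 0.9],          # hypothetical binary abusive-language head
    "sentiment": [0.4, 0.0, -0.2],  # hypothetical 3-way sentiment head
}
example_targets = {"offensive": 1, "abusive": 1, "sentiment": 0}
example_weights = {"offensive": 1.0, "abusive": 0.5, "sentiment": 0.5}
loss = joint_loss(example_logits, example_targets, example_weights)
label = predict_offensiveness(example_logits["offensive"])
```

    Down-weighting the auxiliary losses keeps offensiveness identification the primary objective while still letting abusive-language and sentiment signals shape the shared representation. The actual weights, if any, used in the paper are not given here.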

    Original language: English
    Title of host publication: EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 1112-1123
    Number of pages: 12
    ISBN (Electronic): 9781959429470
    Publication status: Published - 2023
    Event: 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023 - Findings of EACL 2023 - Dubrovnik, Croatia
    Duration: 2023 May 2 → 2023 May 6

    Publication series

    Name: EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023

    Conference

    Conference: 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023 - Findings of EACL 2023
    Country/Territory: Croatia
    City: Dubrovnik
    Period: 23/5/2 → 23/5/6

    Bibliographical note

    Funding Information:
    This work was supported by the Basic Research Program through the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1A2C3010430), the NRF grant funded by the Korea government (MSIT) (No. 2022R1C1C1010317), and the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00079, Artificial Intelligence Graduate School Program (Korea University)).

    Publisher Copyright:
    © 2023 Association for Computational Linguistics.

    ASJC Scopus subject areas

    • Computational Theory and Mathematics
    • Software
    • Linguistics and Language

