Abstract
Text classification in education, usually called auto-tagging, is the automated process of assigning relevant tags to educational content, such as questions and textbooks. However, auto-tagging suffers from a data scarcity problem, which stems from two major challenges: 1) the tag space is large, and 2) the task is multi-label. Although retrieval approaches reportedly perform well in low-resource scenarios, few efforts have directly addressed the data scarcity problem. To mitigate these issues, we propose CEAA, a novel retrieval approach that provides effective learning in educational text classification. Our main contributions are as follows: 1) we leverage transfer learning from question-answering datasets, and 2) we propose a simple but effective data augmentation method that introduces cross-encoder style texts to a bi-encoder architecture for more efficient inference. An extensive set of experiments shows that our proposed method is effective in multi-label scenarios and on low-resource tags compared to state-of-the-art models.
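As a rough illustration of the retrieval framing described in the abstract, the sketch below ranks tags for a question in a bi-encoder style: the question and each tag description are encoded independently, so tag vectors could be computed once and reused at inference time. The bag-of-words `embed` function, the example tag descriptions, and the `[SEP]` separator are assumptions made for illustration and are not taken from the paper, which relies on learned encoders. The `cross_encoder_style_text` helper is likewise hypothetical: it only shows what a "cross-encoder style" input (question and tag text joined into one string) might look like, not how CEAA uses such texts for augmentation.

```python
import math
from collections import Counter


def embed(text):
    """Toy encoder (assumption): lowercase bag-of-words counts.

    Stands in for a learned neural encoder.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_tags(question, tag_descriptions, top_k=2):
    """Bi-encoder style retrieval: encode question and tags separately,
    then rank tags by similarity. Returning the top_k tags (rather than
    one) reflects the multi-label setting."""
    q_vec = embed(question)
    # Tag vectors do not depend on the question, so they could be cached.
    tag_vecs = {tag: embed(desc) for tag, desc in tag_descriptions.items()}
    ranked = sorted(tag_vecs, key=lambda t: cosine(q_vec, tag_vecs[t]), reverse=True)
    return ranked[:top_k]


def cross_encoder_style_text(question, tag_description, sep=" [SEP] "):
    """Hypothetical helper: join question and tag text into a single input,
    as a cross-encoder would see it. The separator is an assumption."""
    return question + sep + tag_description


# Hypothetical tag space for illustration only.
TAGS = {
    "fractions": "add subtract fractions common denominator",
    "geometry": "area perimeter triangle circle",
    "algebra": "solve linear equation variable",
}

top = retrieve_tags("Find the area of a triangle with base 4", TAGS, top_k=1)
```

In this toy setup, only the geometry description shares words with the question, so it is ranked first. A real system would replace `embed` with a trained encoder and choose how many tags to return based on a score threshold or validation data.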
Original language | English |
---|---|
Title of host publication | Findings of the Association for Computational Linguistics, ACL 2023 |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 2184-2195 |
Number of pages | 12 |
ISBN (Electronic) | 9781959429623 |
Publication status | Published - 2023 |
Event | Findings of the Association for Computational Linguistics, ACL 2023 - Toronto, Canada; Duration: 2023 Jul 9 → 2023 Jul 14 |
Publication series
Name | Proceedings of the Annual Meeting of the Association for Computational Linguistics |
---|---|
ISSN (Print) | 0736-587X |
Conference
Conference | Findings of the Association for Computational Linguistics, ACL 2023 |
---|---|
Country/Territory | Canada |
City | Toronto |
Period | 2023 Jul 9 → 2023 Jul 14 |
Bibliographical note
Publisher Copyright: © 2023 Association for Computational Linguistics.
ASJC Scopus subject areas
- Computer Science Applications
- Linguistics and Language
- Language and Linguistics