Abstract
Recent named entity recognition (NER) models often rely on human-annotated datasets, which require significant professional knowledge of the target domain and entities. This research introduces an ask-to-generate approach that automatically generates NER datasets by asking simple natural language questions (e.g., “Which disease?”) to an open-domain question answering system. Despite using fewer in-domain resources, our models, trained solely on the generated datasets, outperform strong low-resource models by a large margin, an average of 19.4 F1 points across six popular NER benchmarks. Furthermore, our models are competitive with rich-resource models that additionally leverage in-domain dictionaries provided by domain experts. In few-shot NER, we outperform the previous best model by 5.2 F1 points on three benchmarks and achieve new state-of-the-art performance. The code and datasets are available at https://github.com/dmis-lab/GeNER.
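To make the ask-to-generate idea concrete, here is a minimal sketch of how answers returned for a type-specific question could be turned into BIO-tagged training sentences. It is an illustration under stated assumptions, not the authors' pipeline (their implementation is in the linked repository): the `ask` callable stands in for an open-domain QA system and is assumed to return (sentence, answer) pairs, the question-to-type mapping is hypothetical, and tokenization is plain whitespace splitting with exact token matching.

```python
from typing import Callable, Dict, List, Tuple

# Assumed QA output format: (retrieved sentence, answer phrase) pairs.
QAResult = Tuple[str, str]
Tagged = List[Tuple[str, str]]


def label_with_answers(results: List[QAResult], entity_type: str) -> List[Tagged]:
    """Tag every exact occurrence of the answer in its sentence with BIO labels."""
    dataset: List[Tagged] = []
    for sentence, answer in results:
        tokens = sentence.split()          # whitespace tokenization (simplification)
        answer_tokens = answer.split()
        tags = ["O"] * len(tokens)
        n = len(answer_tokens)
        for i in range(len(tokens) - n + 1):
            # Label a span only if it matches the answer and is not already tagged.
            if n and tokens[i:i + n] == answer_tokens and all(t == "O" for t in tags[i:i + n]):
                tags[i] = f"B-{entity_type}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{entity_type}"
        dataset.append(list(zip(tokens, tags)))
    return dataset


def build_dataset(questions: Dict[str, str],
                  ask: Callable[[str], List[QAResult]]) -> List[Tagged]:
    """Ask one question per entity type and pool the automatically labeled sentences."""
    dataset: List[Tagged] = []
    for question, entity_type in questions.items():
        dataset.extend(label_with_answers(ask(question), entity_type))
    return dataset


# Hypothetical stand-in for an open-domain QA system, returning canned results.
def fake_qa(question: str) -> List[QAResult]:
    canned = {
        "Which disease?": [("Aspirin may reduce the risk of colorectal cancer .",
                            "colorectal cancer")],
    }
    return canned.get(question, [])


if __name__ == "__main__":
    for sent in build_dataset({"Which disease?": "DISEASE"}, fake_qa):
        print(" ".join(f"{tok}/{tag}" for tok, tag in sent))
```

Running the script prints the example sentence with `colorectal` tagged `B-DISEASE`, `cancer` tagged `I-DISEASE`, and every other token tagged `O`. A real system would also need to handle answers that overlap across entity types and noisy or partial matches, which this sketch deliberately ignores.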
Original language | English |
---|---|
Pages | 6220-6236 |
Number of pages | 17 |
Publication status | Published - 2022 |
Event | 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates (7 Dec 2022 → 11 Dec 2022) |
Conference
Conference | 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 |
---|---|
Country/Territory | United Arab Emirates |
City | Abu Dhabi |
Period | 7 Dec 2022 → 11 Dec 2022 |
Bibliographical note
Funding Information: We thank Jungsoo Park, Gyuwan Kim, Mujeen Sung, Sungdong Kim, Yonghwa Choi, Won-jin Yoon, and Gangwoo Kim for their helpful feedback. This research was supported by (1) the National Research Foundation of Korea (NRF-2020R1A2C3010638), (2) the MSIT (Ministry of Science and ICT), Korea, under the ICT Creative Consilience program (IITP-2021-2020-0-01819) supervised by the IITP (Institute for Information & communications Technology Planning & Evaluation), and (3) a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HR20C0021).
Publisher Copyright:
© 2022 Association for Computational Linguistics.
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Computer Science Applications
- Information Systems