Simple Questions Generate Named Entity Recognition Datasets

Hyunjae Kim, Jaehyo Yoo, Seunghyun Yoon, Jinhyuk Lee, Jaewoo Kang

Research output: Contribution to conferencePaperpeer-review

Abstract

Recent named entity recognition (NER) models often rely on human-annotated datasets, requiring the significant engagement of professional knowledge on the target domain and entities. This research introduces an ask-to-generate approach that automatically generates NER datasets by asking questions in simple natural language to an open-domain question answering system (e.g., “Which disease?”). Despite using fewer in-domain resources, our models, solely trained on the generated datasets, largely outperform strong low-resource models by an average F1 score of 19.4 for six popular NER benchmarks. Furthermore, our models provide competitive performance with rich-resource models that additionally leverage in-domain dictionaries provided by domain experts. In few-shot NER, we outperform the previous best model by an F1 score of 5.2 on three benchmarks and achieve new state-of-the-art performance. The code and datasets are available at https://github.com/dmis-lab/GeNER.

Original languageEnglish
Pages6220-6236
Number of pages17
Publication statusPublished - 2022
Event2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 - Abu Dhabi, United Arab Emirates
Duration: 2022 Dec 72022 Dec 11

Conference

Conference2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022
Country/TerritoryUnited Arab Emirates
CityAbu Dhabi
Period22/12/722/12/11

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

Fingerprint

Dive into the research topics of 'Simple Questions Generate Named Entity Recognition Datasets'. Together they form a unique fingerprint.

Cite this