Biomedical NER for the Enterprise with Distillated BERN2 and the Kazu Framework

Wonjin Yoon, Richard Jackson, Elliot Ford, Vladimir Poroshin, Jaewoo Kang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

In order to assist the drug discovery/development process, pharmaceutical companies often apply biomedical NER and linking techniques over internal and public corpora. Decades of study in the field of BioNLP have produced a plethora of algorithms, systems and datasets. However, our experience has been that no single open source system meets all the requirements of a modern pharmaceutical company. In this work, we describe these requirements according to our experience of the industry, and present Kazu, a highly extensible, scalable open source framework designed to support BioNLP for the pharmaceutical sector. Kazu is built around a computationally efficient version of the BERN2 NER model (TinyBERN2), and subsequently wraps several other BioNLP technologies into one coherent system.

Original language: English
Title of host publication: EMNLP 2022 - Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Subtitle of host publication: Industry Track
Publisher: Association for Computational Linguistics (ACL)
Pages: 629-636
Number of pages: 8
ISBN (Electronic): 9781952148255
Publication status: Published - 2022
Event: 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 - Abu Dhabi, United Arab Emirates
Duration: 2022 Dec 7 to 2022 Dec 11

Publication series

Name: EMNLP 2022 - Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track

Conference

Conference: 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 22/12/7 to 22/12/11

Bibliographical note

Funding Information:
We would like to express our gratitude to Antoine Lain (University of Edinburgh) for helping the authors to collect and unify the format of benchmark datasets, and to Mujeen Sung and Minbyul Jeong for providing NER predictions and information for the inference speed experiments. This work is partially funded by the National Research Foundation of Korea [NRF-2020R1A2C3010638], the Korea Health Industry Development Institute (KHIDI) [HR20C0021] and the ICT Creative Consilience program [IITP-2022-2020-0-01819] funded by the Government of the Republic of Korea. We would also like to thank Rolando Fernandez for his development work on Kazu.

Publisher Copyright:
© 2022 Association for Computational Linguistics.

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Information Systems
  • Computer Science Applications
