Abstract
To assist the drug discovery/development process, pharmaceutical companies often apply biomedical NER and linking techniques over internal and public corpora. Decades of study in the field of BioNLP have produced a plethora of algorithms, systems and datasets. However, our experience has been that no single open source system meets all the requirements of a modern pharmaceutical company. In this work, we describe these requirements according to our experience of the industry, and present Kazu, a highly extensible, scalable open source framework designed to support BioNLP for the pharmaceutical sector. Kazu is built around a computationally efficient version of the BERN2 NER model (TinyBERN2), and subsequently wraps several other BioNLP technologies into one coherent system.
Original language | English |
---|---|
Title of host publication | EMNLP 2022 - Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing |
Subtitle of host publication | Industry Track |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 629-636 |
Number of pages | 8 |
ISBN (Electronic) | 9781952148255 |
Publication status | Published - 2022 |
Event | 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 - Abu Dhabi, United Arab Emirates. Duration: 2022 Dec 7 → 2022 Dec 11 |
Publication series
Name | EMNLP 2022 - Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track |
---|
Conference
Conference | 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 |
---|---|
Country/Territory | United Arab Emirates |
City | Abu Dhabi |
Period | 2022 Dec 7 → 2022 Dec 11 |
Bibliographical note
Funding Information: We would like to express our gratitude to Antoine Lain (University of Edinburgh) for helping the authors to collect and unify the format of benchmark datasets, and to Mujeen Sung and Minbyul Jeong for providing NER predictions and information for the inference speed experiments. This work is partially funded by the National Research Foundation of Korea [NRF-2020R1A2C3010638], the Korea Health Industry Development Institute (KHIDI) [HR20C0021] and the ICT Creative Consilience program [IITP-2022-2020-0-01819] funded by the Government of the Republic of Korea. We would also like to thank Rolando Fernandez for his development work on Kazu.
Publisher Copyright:
© 2022 Association for Computational Linguistics.
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Information Systems
- Computer Science Applications