Biomedical NER for the Enterprise with Distillated BERN2 and the Kazu Framework

Wonjin Yoon, Richard Jackson, Elliot Ford, Vladimir Poroshin, Jaewoo Kang

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    1 Citation (Scopus)

    Abstract

    To assist the drug discovery and development process, pharmaceutical companies often apply biomedical NER and linking techniques over internal and public corpora. Decades of study in the field of BioNLP have produced a plethora of algorithms, systems and datasets. However, our experience has been that no single open source system meets all the requirements of a modern pharmaceutical company. In this work, we describe these requirements according to our experience of the industry, and present Kazu, a highly extensible, scalable open source framework designed to support BioNLP for the pharmaceutical sector. Kazu is built around a computationally efficient version of the BERN2 NER model (TinyBERN2), and subsequently wraps several other BioNLP technologies into one coherent system.

    Original language: English
    Title of host publication: EMNLP 2022 - Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
    Subtitle of host publication: Industry Track
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 629-636
    Number of pages: 8
    ISBN (Electronic): 9781952148255
    Publication status: Published - 2022
    Event: 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 - Abu Dhabi, United Arab Emirates
    Duration: 2022 Dec 7 to 2022 Dec 11

    Publication series

    Name: EMNLP 2022 - Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track

    Conference

    Conference: 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022
    Country/Territory: United Arab Emirates
    City: Abu Dhabi
    Period: 22/12/7 to 22/12/11

    Bibliographical note

    Funding Information:
    We would like to express our gratitude to Antoine Lain (University of Edinburgh) for helping the authors to collect and unify the format of benchmark datasets, and to Mujeen Sung and Minbyul Jeong for providing NER predictions and information for the inference speed experiments. This work is partially funded by the National Research Foundation of Korea [NRF-2020R1A2C3010638], the Korea Health Industry Development Institute (KHIDI) [HR20C0021] and the ICT Creative Consilience program [IITP-2022-2020-0-01819] funded by the Government of the Republic of Korea. We would also like to thank Rolando Fernandez for his development work on Kazu.

    Publisher Copyright:
    © 2022 Association for Computational Linguistics.

    ASJC Scopus subject areas

    • Computational Theory and Mathematics
    • Information Systems
    • Computer Science Applications
