Can Language Models be Biomedical Knowledge Bases?

Mujeen Sung, Jinhyuk Lee, Sean S. Yi, Minji Jeon, Sungdong Kim, Jaewoo Kang

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    53 Citations (Scopus)

    Abstract

    Pre-trained language models (LMs) have become ubiquitous in solving various natural language processing (NLP) tasks. There has been increasing interest in what knowledge these LMs contain and how we can extract that knowledge, treating LMs as knowledge bases (KBs). While there has been much work on probing LMs in the general domain, little attention has been paid to whether these powerful LMs can be used as domain-specific KBs. To this end, we create the BIOLAMA benchmark, which comprises 49K biomedical factual knowledge triples for probing biomedical LMs. We find that biomedical LMs with recently proposed probing methods can achieve up to 18.51% Acc@5 on retrieving biomedical knowledge. Although this seems promising given the task difficulty, our detailed analyses reveal that most predictions are highly correlated with prompt templates without any subjects, hence producing similar results on each relation and hindering their use as domain-specific KBs. We hope that BIOLAMA can serve as a challenging benchmark for biomedical factual probing.
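
The abstract reports results as Acc@5 and observes that predictions track the prompt template even when no subject is given. The sketch below illustrates both ideas under stated assumptions; it is not the authors' evaluation code. The function names (`normalize`, `acc_at_k`, `prompt_only_overlap`), the matching rule (case- and whitespace-insensitive exact match against a set of accepted answers), and the toy data are all hypothetical.

```python
from typing import Iterable, Sequence


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(text.lower().split())


def acc_at_k(
    predictions: Sequence[Sequence[str]],
    gold_answers: Sequence[Iterable[str]],
    k: int = 5,
) -> float:
    """Fraction of queries whose top-k ranked predictions contain an accepted answer.

    Assumption: each query has a set of accepted answer strings and a ranked
    list of predicted strings, best first.
    """
    if len(predictions) != len(gold_answers):
        raise ValueError("predictions and gold_answers must have the same length")
    if not predictions:
        return 0.0
    hits = 0
    for ranked, gold in zip(predictions, gold_answers):
        accepted = {normalize(g) for g in gold}
        if any(normalize(p) in accepted for p in ranked[:k]):
            hits += 1
    return hits / len(predictions)


def prompt_only_overlap(
    predictions: Sequence[Sequence[str]],
    subjectless_prediction: Sequence[str],
) -> float:
    """Fraction of queries whose top-1 prediction equals the top-1 prediction
    obtained from the same template with the subject removed.

    Assumption: this is one simple way to quantify template-driven
    predictions; the paper's own analysis may differ.
    """
    if not predictions or not subjectless_prediction:
        return 0.0
    baseline = normalize(subjectless_prediction[0])
    same = sum(1 for ranked in predictions if ranked and normalize(ranked[0]) == baseline)
    return same / len(predictions)


# Toy, made-up data: two queries for one relation.
toy_predictions = [["a", "b", "c"], ["a", "y", "z"]]
toy_gold = [{"C"}, {"q"}]
toy_subjectless = ["a", "b", "y"]

print(acc_at_k(toy_predictions, toy_gold, k=5))
print(prompt_only_overlap(toy_predictions, toy_subjectless))
```

A high `prompt_only_overlap` alongside a moderate `acc_at_k` would suggest that the model is answering from the template rather than from the subject, which is the concern the abstract raises.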

    Original language: English
    Title of host publication: EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 4723-4734
    Number of pages: 12
    ISBN (Electronic): 9781955917094
    Publication status: Published - 2021
    Event: 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 - Virtual, Punta Cana, Dominican Republic
    Duration: 2021 Nov 7 to 2021 Nov 11

    Publication series

    Name: EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings

    Conference

    Conference: 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021
    Country/Territory: Dominican Republic
    City: Virtual, Punta Cana
    Period: 21/11/7 to 21/11/11

    Bibliographical note

    Funding Information:
    This work was supported in part by the ICT Creative Consilience program (IITP-2021-0-01819) supervised by the IITP (Institute for Information & communications Technology Planning & Evaluation), National Research Foundation of Korea (NRF-2020R1A2C3010638, NRF-2014M3C9A3063541), and Hyundai Motor Chung Mong-Koo Foundation. We thank the anonymous reviewers for their insightful comments.

    Publisher Copyright:
    © 2021 Association for Computational Linguistics

    ASJC Scopus subject areas

    • Computational Theory and Mathematics
    • Computer Science Applications
    • Information Systems
