Biomedical relation extraction with knowledge base-refined weak supervision

Wonjin Yoon, Sean Yi, Richard Jackson, Hyunjae Kim, Sunkyu Kim, Jaewoo Kang

Research output: Contribution to journal › Article › peer-review


Biomedical relation extraction (BioRE) is the task of automatically extracting and classifying relations between two biomedical entities in biomedical literature. Recent advances in BioRE research have largely been powered by supervised learning and large language models (LLMs). However, training LLMs for BioRE with supervised learning requires human-annotated data, and the annotation process is often challenging and expensive. As a result, the quantity and coverage of annotated data are limiting factors for BioRE systems. In this paper, we present our system for the DrugProt track of the BioCreative VII challenge, a BioRE system that leverages a language model structure and weak supervision. Our system is trained on weakly labelled data and then fine-tuned using human-labelled data. To create the weakly labelled dataset, we combined two approaches. First, we trained a model on the original dataset and used it to predict labels on external literature, producing a model-labelled dataset. Then, we refined the model-labelled dataset using an external knowledge base. In our experiments, the approach using refined weak supervision showed a significant performance gain over a model trained only on standard human-labelled datasets. Our final model showed outstanding performance at the BioCreative VII challenge, achieving 3rd place (this paper focuses on the system we submitted to the BioCreative VII challenge). Database URL:
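
The refinement step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the actual refinement rule, so the one used here (keep a model-predicted relation only when the same chemical, gene and relation triple appears in the knowledge base) is an assumption, and the names `WeakExample`, `refine_with_kb` and `kb_triples`, as well as the toy data, are hypothetical.

```python
from typing import NamedTuple


class WeakExample(NamedTuple):
    """One model-labelled sentence (hypothetical structure for illustration)."""
    sentence: str
    chemical: str
    gene: str
    predicted_relation: str  # label assigned by the first-stage model


def refine_with_kb(examples, kb_triples):
    """Keep only examples whose (chemical, gene, relation) triple is in the KB.

    Assumed rule: agreement between the model prediction and the knowledge
    base. The system described in the paper may refine labels differently.
    """
    return [
        ex for ex in examples
        if (ex.chemical.lower(), ex.gene.lower(), ex.predicted_relation) in kb_triples
    ]


# Toy data for illustration only; not taken from the paper.
kb_triples = {("aspirin", "ptgs2", "INHIBITOR")}
model_labelled = [
    WeakExample("Aspirin inhibits PTGS2.", "Aspirin", "PTGS2", "INHIBITOR"),
    WeakExample("Aspirin was given before EGFR testing.", "Aspirin", "EGFR", "ACTIVATOR"),
]

refined = refine_with_kb(model_labelled, kb_triples)
```

Under this assumed rule, the refined set would serve as the weakly labelled training data, with fine-tuning on the human-labelled data as a later stage.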

Original language: English
Article number: baad054
Publication status: Published - 2023

Bibliographical note

Publisher Copyright:
© 2023 The Author(s). Published by Oxford University Press.

ASJC Scopus subject areas

  • Information Systems
  • General Biochemistry, Genetics and Molecular Biology
  • General Agricultural and Biological Sciences

