Abstract
Biomedical named entity recognition (NER) plays a crucial role in extracting information from documents in biomedical applications. However, many of these applications require NER models to operate at a document level, rather than just a sentence level. This presents a challenge, as the extension from a sentence model to a document model is not always straightforward. Despite the existence of document NER models that are able to make consistent predictions, they still fall short of meeting the expectations of researchers and practitioners in the field. To address this issue, we have undertaken an investigation into the underlying causes of inconsistent predictions. Our research has led us to believe that the use of adjectives and prepositions within entities may be contributing to low label consistency. In this article, we present our method, ConNER, to enhance a label consistency of modifiers such as adjectives and prepositions. By refining the labels of these modifiers, ConNER is able to improve representations of biomedical entities. The effectiveness of our method is demonstrated on four popular biomedical NER datasets. On three datasets, we achieve a higher F1 score than the previous state-of-the-art model. Our method shows its efficacy on two datasets, resulting in 7.5%-8.6% absolute improvements in the F1 score. Our findings suggest that our ConNER method is effective on datasets with intrinsically low label consistency. Through qualitative analysis, we demonstrate how our approach helps the NER model generate more consistent predictions.
Original language | English |
---|---|
Article number | btad361 |
Journal | Bioinformatics |
Volume | 39 |
Issue number | 6 |
DOIs | |
Publication status | Published - 2023 Jun 1 |
Bibliographical note
Publisher Copyright:© 2023 The Author(s).
ASJC Scopus subject areas
- Statistics and Probability
- Biochemistry
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics