In this article, we describe our system for the CHEMPROT task of the BioCreative VI challenge. Extracting relations between chemical compounds and genes from the literature is an important element of pharmacological and clinical research. Although considerable research has been conducted on the named entity recognition of genes and drugs, there is limited research on extracting the relationships between them. The CHEMPROT task of BioCreative VI aims to promote the development of text mining systems that can automatically extract relationships between chemical compounds and genes. We tested three recursive neural network approaches to improve the performance of relation extraction. For the BioCreative VI challenge, we developed a tree-structured Long Short-Term Memory (tree-LSTM) model with several additional features, including a position feature and a subtree containment feature, and we also applied an ensemble method. After the challenge, we added pre-processing steps to the tree-LSTM model and tested the performance of another recursive neural network model, the Stack-augmented Parser Interpreter Neural Network (SPINN). Our tree-LSTM model achieved an F-score of 58.53% in the BioCreative VI challenge. The tree-LSTM model with additional pre-processing and the SPINN model obtained F-scores of 63.7% and 64.1%, respectively.
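To make the idea of a tree-LSTM concrete, the sketch below shows how such a model composes a sentence's parse tree bottom-up: each node combines its own input vector with the hidden and memory states of its children. The abstract does not say which tree-LSTM variant was used, so this follows the Child-Sum Tree-LSTM formulation as an assumption; the class `ChildSumTreeLSTMCell`, the helper `encode`, the `(x, [subtrees])` tree encoding, and the random initialization are all hypothetical and for illustration only. The abstract also does not state how the position and subtree containment features enter the model; one plausible option, again an assumption, is to concatenate them onto each node's input vector `x`.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ChildSumTreeLSTMCell:
    """Hypothetical Child-Sum Tree-LSTM cell (assumed variant, not the authors' code)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden_dim = hidden_dim
        # One (W, U, b) triple per gate: input (i), forget (f), output (o), update (u)
        self.params = {
            g: (rng.normal(0.0, 0.1, (hidden_dim, input_dim)),
                rng.normal(0.0, 0.1, (hidden_dim, hidden_dim)),
                np.zeros(hidden_dim))
            for g in ("i", "f", "o", "u")
        }

    def _gate(self, g, x, h):
        W, U, b = self.params[g]
        return W @ x + U @ h + b

    def forward(self, x, children):
        """x: this node's input vector; children: list of (h, c) state pairs."""
        # Sum the children's hidden states (zero vector for a leaf)
        if children:
            h_sum = sum(h for h, _ in children)
        else:
            h_sum = np.zeros(self.hidden_dim)
        i = sigmoid(self._gate("i", x, h_sum))
        o = sigmoid(self._gate("o", x, h_sum))
        u = np.tanh(self._gate("u", x, h_sum))
        c = i * u
        # A separate forget gate per child, conditioned on that child's own hidden state
        for h_k, c_k in children:
            f_k = sigmoid(self._gate("f", x, h_k))
            c = c + f_k * c_k
        h = o * np.tanh(c)
        return h, c


def encode(tree, cell):
    """tree: (x, [subtrees]); children are encoded before their parent."""
    x, subtrees = tree
    children = [encode(t, cell) for t in subtrees]
    return cell.forward(x, children)
```

For example, `encode((np.zeros(4), [(np.ones(4), []), (np.full(4, -1.0), [])]), ChildSumTreeLSTMCell(4, 3))` returns the hidden and memory states of the root; in a relation extraction setting, the root (or a chosen subtree) state would then be fed to a classifier over relation labels. Summing child states lets a node accept any number of children, which suits dependency trees whose nodes have variable branching.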
Bibliographical note
Funding Information:
This research was supported by the National Research Foundation of Korea (http://www.nrf.re.kr/) grants (NRF-2016M3A9A7916996, NRF-2017M3C4A7065887, NRF-2017R1A2A1A17069645 to JK). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
© The Author(s) 2018. Published by Oxford University Press.