TY - JOUR
T1 - BRONCO
T2 - Biomedical entity Relation ONcology COrpus for extracting gene-variant-disease-drug relations
AU - Lee, Kyubum
AU - Lee, Sunwon
AU - Park, Sungjoon
AU - Kim, Sunkyu
AU - Kim, Suhkyung
AU - Choi, Kwanghun
AU - Tan, Aik Choon
AU - Kang, Jaewoo
N1 - Publisher Copyright:
© The Author(s) 2016.
PY - 2016
Y1 - 2016
N2 - Comprehensive knowledge of genomic variants in a biological context is key for precision medicine. As next-generation sequencing technologies improve, the amount of literature containing genomic variant data, such as new functions or related phenotypes, rapidly increases. Because numerous articles are published every day, it is almost impossible to manually curate all the variant information from the literature. Many researchers focus on creating an improved automated biomedical natural language processing (BioNLP) method that extracts useful variants and their functional information from the literature. However, there is no gold-standard data set that contains texts annotated with variants and their related functions. To overcome these limitations, we introduce a Biomedical entity Relation ONcology COrpus (BRONCO) that contains more than 400 variants and their relations with genes, diseases, drugs and cell lines in the context of cancer and anti-tumor drug screening research. The variants and their relations were manually extracted from 108 full-text articles. BRONCO can be utilized to evaluate and train new methods used for extracting biomedical entity relations from full-text publications, and thus be a valuable resource to the biomedical text mining research community. Using BRONCO, we quantitatively and qualitatively evaluated the performance of three state-of-the-art BioNLP methods. We also identified their shortcomings, and suggested remedies for each method. We implemented post-processing modules for the three BioNLP methods, which improved their performance.
AB - Comprehensive knowledge of genomic variants in a biological context is key for precision medicine. As next-generation sequencing technologies improve, the amount of literature containing genomic variant data, such as new functions or related phenotypes, rapidly increases. Because numerous articles are published every day, it is almost impossible to manually curate all the variant information from the literature. Many researchers focus on creating an improved automated biomedical natural language processing (BioNLP) method that extracts useful variants and their functional information from the literature. However, there is no gold-standard data set that contains texts annotated with variants and their related functions. To overcome these limitations, we introduce a Biomedical entity Relation ONcology COrpus (BRONCO) that contains more than 400 variants and their relations with genes, diseases, drugs and cell lines in the context of cancer and anti-tumor drug screening research. The variants and their relations were manually extracted from 108 full-text articles. BRONCO can be utilized to evaluate and train new methods used for extracting biomedical entity relations from full-text publications, and thus be a valuable resource to the biomedical text mining research community. Using BRONCO, we quantitatively and qualitatively evaluated the performance of three state-of-the-art BioNLP methods. We also identified their shortcomings, and suggested remedies for each method. We implemented post-processing modules for the three BioNLP methods, which improved their performance.
UR - http://www.scopus.com/inward/record.url?scp=84964894530&partnerID=8YFLogxK
U2 - 10.1093/database/baw043
DO - 10.1093/database/baw043
M3 - Article
C2 - 27074804
AN - SCOPUS:84964894530
SN - 1758-0463
VL - 2016
JO - Database : the journal of biological databases and curation
JF - Database : the journal of biological databases and curation
ER -