Development of a system for extracting the information of candidate tumor markers reported in biomedical literatures

Jeong Min Chae, Heung Bum Oh, Sung Eun Choi, Choong Hwan Cha, Myung Hee Kim, Soon Young Jung

Research output: Contribution to journal › Article › peer-review


Background: Since the Human Genome Project was completed in 2003, there have been numerous reports on cancer and related markers. This study aimed to develop a system that automatically extracts information on the relationship between cancers and tumor markers from the biomedical literature. Methods: Named entities of tumor markers were recognized by both a dictionary-based method and a machine learning method, the support vector machine. Named entities of cancers were recognized using the MeSH dictionary. Results: Relational and filtering keywords were selected after annotating 160 abstracts from PubMed. Relational information was extracted only when one of the relational keywords was in an appropriate position along the parse tree of a sentence containing both a tumor marker entity and a disease entity. The performance of the system was evaluated on a separate set of 77 abstracts. With the relational and filtering keywords in use, precision was 94.38% and recall was 66.14%; without this expert knowledge, precision was 49.16% and recall was 69.29%. Conclusions: We developed a system that extracts relational information between a tumor and its markers by incorporating expert knowledge. A system exploiting expert knowledge in this way could serve as a reference for developing information extraction systems in other medical fields.
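The keyword-based filtering and the evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dictionaries and keyword lists below are hypothetical toy entries, and the parse-tree position check is replaced by a much simpler test of whether a relational keyword occurs anywhere in the sentence. The function names (`find_terms`, `extract_relations`, `precision_recall`) are likewise assumptions made for illustration.

```python
import re

# Hypothetical toy lexicons. The paper's actual tumor marker dictionary,
# MeSH cancer terms, and expert-selected keyword lists are not given here.
MARKERS = {"CEA", "PSA", "AFP"}
CANCERS = {"colorectal cancer", "prostate cancer", "hepatocellular carcinoma"}
RELATIONAL = {"elevated", "associated", "overexpressed"}
FILTERING = {"not", "no"}


def find_terms(sentence, terms):
    """Dictionary lookup: return the dictionary terms found in the sentence."""
    found = set()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, sentence, re.IGNORECASE):
            found.add(term)
    return found


def extract_relations(sentence):
    """Return (marker, cancer) pairs for a sentence that passes the keyword rules.

    Simplification: the source checks keyword position along a parse tree;
    this sketch only checks that the keyword is present in the sentence.
    """
    markers = find_terms(sentence, MARKERS)
    cancers = find_terms(sentence, CANCERS)
    if not markers or not cancers:
        return set()
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    has_relation = bool(words & RELATIONAL)
    is_filtered = bool(words & FILTERING)
    if not has_relation or is_filtered:
        return set()
    return {(m, c) for m in markers for c in cancers}


def precision_recall(predicted, gold):
    """Precision = TP / predicted pairs, recall = TP / gold pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

In this sketch, a sentence mentioning both a marker and a cancer yields a pair only if a relational keyword is present and no filtering keyword is. That mirrors the trade-off reported above: stricter rules discard spurious pairs and raise precision, at the cost of some recall.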

Original language: English
Pages (from-to): 79-87
Number of pages: 9
Journal: Korean Journal of Laboratory Medicine
Issue number: 1
Publication status: Published - 2008


Keywords

  • Information extraction
  • Tumor
  • Tumor marker

ASJC Scopus subject areas

  • Clinical Biochemistry
  • Biochemistry, medical

