Biomedical entity representations with synonym marginalization

  • Mujeen Sung
  • Hwisang Jeon
  • Jinhyuk Lee*
  • Jaewoo Kang*

  *Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Biomedical named entities often play important roles in many biomedical text mining tools. However, due to the incompleteness of provided synonyms and numerous variations in their surface forms, normalization of biomedical entities is very challenging. In this paper, we focus on learning representations of biomedical entities solely based on the synonyms of entities. To learn from the incomplete synonyms, we use model-based candidate selection and maximize the marginal likelihood of the synonyms present in the top candidates. Our model-based candidates are iteratively updated to contain more difficult negative samples as our model evolves. In this way, we avoid the explicit pre-selection of negative samples from more than 400K candidates. On four biomedical entity normalization datasets covering three different entity types (disease, chemical, adverse reaction), our model BIOSYN consistently outperforms previous state-of-the-art models, almost reaching the upper bound on each dataset.
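
The marginalization idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it only assumes that each mention comes with similarity scores for its top candidates and a flag marking which candidates are synonyms of the gold entity. The function name `marginal_nll`, the softmax normalization, and the choice to return zero loss when no synonym is among the candidates are all assumptions made for illustration.

```python
import math


def marginal_nll(scores, is_synonym):
    """Negative log marginal likelihood over a mention's top candidates.

    scores: similarity scores between the mention and each candidate
    is_synonym: parallel list of booleans, True where the candidate is a
                synonym of the mention's gold entity
    (Hypothetical helper; names and conventions are assumptions.)
    """
    if len(scores) != len(is_synonym):
        raise ValueError("scores and is_synonym must have the same length")
    if not any(is_synonym):
        # Assumption: with no synonym among the candidates there is
        # nothing to marginalize over, so the mention adds zero loss.
        return 0.0
    # Numerically stable softmax over the candidate scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    # Sum the probability mass assigned to every synonym candidate.
    positive_mass = sum(e for e, pos in zip(exps, is_synonym) if pos) / total
    return -math.log(positive_mass)
```

Minimizing this quantity raises the total probability of all synonyms among the candidates rather than forcing a single one to win, and the non-synonym candidates act as the negative samples.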

    Original language: English
    Title of host publication: ACL 2020 - 58th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 3641-3650
    Number of pages: 10
    ISBN (Electronic): 9781952148255
    Publication status: Published - 2020
    Event: 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020 - Virtual, Online, United States
    Duration: 2020 Jul 5 to 2020 Jul 10

    Publication series

    Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
    ISSN (Print): 0736-587X

    Conference

    Conference: 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020
    Country/Territory: United States
    City: Virtual, Online
    Period: 20/7/5 to 20/7/10

    Bibliographical note

    Funding Information:
    This research was supported by National Research Foundation of Korea (NRF-2016M3A9A7916996, NRF-2014M3C9A3063541). We thank the members of Korea University, and the anonymous reviewers for their insightful comments.

    Publisher Copyright:
    © 2020 Association for Computational Linguistics

    ASJC Scopus subject areas

    • Computer Science Applications
    • Linguistics and Language
    • Language and Linguistics
