Contextualized sparse representations for real-time open-domain question answering

Jinhyuk Lee, Minjoon Seo, Hannaneh Hajishirzi, Jaewoo Kang

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    23 Citations (Scopus)

    Abstract

    Open-domain question answering can be formulated as a phrase retrieval problem, in which we can expect large scalability and speed benefits but often suffer from low accuracy due to the limitations of existing phrase representation models. In this paper, we aim to improve the quality of each phrase embedding by augmenting it with a contextualized sparse representation (SPARC). Unlike previous sparse vectors that are term-frequency-based (e.g., tf-idf) or directly learned (only a few thousand dimensions), we leverage rectified self-attention to indirectly learn sparse vectors in n-gram vocabulary space. By augmenting the previous phrase retrieval model (Seo et al., 2019) with SPARC, we show a 4%+ improvement on CuratedTREC and SQuAD-Open. Our CuratedTREC score is even better than that of the best known retrieve & read model, with at least 45x faster inference speed.
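
    The idea of learning a sparse vector indirectly through rectified self-attention can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a unigram vocabulary, a single query vector for the target token, and plain scaled dot-product scores, and the function and parameter names are hypothetical. The softmax of ordinary attention is replaced by ReLU, so non-positive scores become exact zeros, and the surviving weights are scattered onto the vocabulary ids of the context tokens.

```python
import math


def rectified_attention_sparse_vector(query, keys, token_ids, vocab_size):
    """Toy sketch (hypothetical helper, not the paper's code).

    query:      list of floats, vector of the target token
    keys:       list of float lists, one key vector per context token
    token_ids:  vocabulary id of each context token (unigrams assumed)
    vocab_size: dimensionality of the resulting sparse vector
    """
    scale = math.sqrt(len(query))
    sparse = [0.0] * vocab_size
    for key, token_id in zip(keys, token_ids):
        # Scaled dot-product score between the target token and a context token.
        score = sum(q * k for q, k in zip(query, key)) / scale
        # ReLU instead of softmax: negative scores contribute exactly zero.
        weight = max(score, 0.0)
        # Scatter the weight onto the vocabulary dimension of that token.
        sparse[token_id] += weight
    return sparse


# Three context tokens; the first and third share vocabulary id 3.
vec = rectified_attention_sparse_vector(
    query=[1.0, 0.0, 1.0, 0.0],
    keys=[[2.0, 0.0, 0.0, 0.0], [-2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 4.0, 0.0]],
    token_ids=[3, 1, 3],
    vocab_size=5,
)
print(vec)  # [0.0, 0.0, 0.0, 3.0, 0.0]
```

    In this sketch the vector has one dimension per vocabulary entry, but only entries for tokens that actually occur in the context and receive a positive score are non-zero, which is what makes the representation both contextualized and sparse.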

    Original language: English
    Title of host publication: ACL 2020 - 58th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 912-919
    Number of pages: 8
    ISBN (Electronic): 9781952148255
    Publication status: Published - 2020
    Event: 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020 - Virtual, Online, United States
    Duration: 2020 Jul 5 to 2020 Jul 10

    Publication series

    Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
    ISSN (Print): 0736-587X

    Conference

    Conference: 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020
    Country/Territory: United States
    City: Virtual, Online
    Period: 20/7/5 to 20/7/10

    Bibliographical note

    Publisher Copyright:
    © 2020 Association for Computational Linguistics

    ASJC Scopus subject areas

    • Computer Science Applications
    • Linguistics and Language
    • Language and Linguistics
