Abstract
Open-domain question answering can be formulated as a phrase retrieval problem, which promises large gains in scalability and speed but often suffers from low accuracy due to the limitations of existing phrase representation models. In this paper, we aim to improve the quality of each phrase embedding by augmenting it with a contextualized sparse representation (SPARC). Unlike previous sparse vectors that are term-frequency based (e.g., tf-idf) or directly learned (with only a few thousand dimensions), we leverage rectified self-attention to indirectly learn sparse vectors in an n-gram vocabulary space. By augmenting the previous phrase retrieval model (Seo et al., 2019) with SPARC, we show a more than 4% improvement on CuratedTREC and SQuAD-Open. Our CuratedTREC score even surpasses the best known retrieve & read model, with at least 45x faster inference speed.
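The core mechanism described above — rectified self-attention scattering weights into an n-gram vocabulary space — can be sketched roughly as follows. This is a minimal NumPy toy with random weights, a unigram-only vocabulary, and tiny dimensions; the actual SPARC model uses a trained BERT encoder and a much larger wordpiece n-gram vocabulary, so every name and dimension here is an illustrative assumption:

```python
import numpy as np

# Toy sketch of a SPARC-style contextualized sparse vector.
# Assumptions: random weights, unigrams only, tiny dimensions
# (not the authors' trained model).
rng = np.random.default_rng(0)

vocab_size = 30522          # assumed (word)piece vocabulary size
hidden, seq_len = 64, 6     # assumed hidden size and context length

H = rng.normal(size=(seq_len, hidden))            # contextual token embeddings
Wq = rng.normal(size=(hidden, hidden))            # query projection
Wk = rng.normal(size=(hidden, hidden))            # key projection
token_ids = rng.integers(0, vocab_size, seq_len)  # each token's vocab index

def sparc_vector(phrase_idx):
    """Sparse vocab-space vector for the phrase anchored at token phrase_idx."""
    q = H[phrase_idx] @ Wq                        # query for the phrase token
    K = H @ Wk                                    # keys for all context tokens
    # Rectified self-attention: ReLU keeps only positive scores,
    # so most vocabulary entries stay exactly zero.
    scores = np.maximum(q @ K.T / np.sqrt(hidden), 0.0)
    sparse = np.zeros(vocab_size)
    np.add.at(sparse, token_ids, scores)          # scatter scores into vocab space
    return sparse

v = sparc_vector(0)
print(np.count_nonzero(v), "nonzero entries out of", vocab_size)
```

Because the attention scores are rectified rather than softmax-normalized, the resulting vocabulary-space vector has at most `seq_len` nonzero entries out of `vocab_size`, which is what makes storing and matching it at retrieval time cheap.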
Original language | English |
---|---|
Title of host publication | ACL 2020 - 58th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 912-919 |
Number of pages | 8 |
ISBN (Electronic) | 9781952148255 |
DOIs | |
Publication status | Published - 2020 |
Event | 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020 - Virtual, Online, United States |
Duration | 2020 Jul 5 → 2020 Jul 10 |
Publication series
Name | Proceedings of the Annual Meeting of the Association for Computational Linguistics |
---|---|
ISSN (Print) | 0736-587X |
Conference
Conference | 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020 |
---|---|
Country/Territory | United States |
City | Virtual, Online |
Period | 2020 Jul 5 → 2020 Jul 10 |
Bibliographical note
Publisher Copyright: © 2020 Association for Computational Linguistics
ASJC Scopus subject areas
- Computer Science Applications
- Linguistics and Language
- Language and Linguistics