Data-Centric and Model-Centric Approaches for Biomedical Question Answering

Wonjin Yoon, Jaehyo Yoo, Sumin Seo, Mujeen Sung, Minbyul Jeong, Gangwoo Kim, Jaewoo Kang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)


Biomedical question answering (BioQA) is the process of automatically extracting information from the biomedical literature, and as the number of accessible biomedical papers increases rapidly, BioQA is attracting more attention. To improve the performance of BioQA systems, we designed strategies for the sub-tasks of BioQA and assessed their effectiveness on the BioASQ dataset. We designed data-centric and model-centric strategies according to where each sub-task had the most room for improvement. For example, model design for factoid-type questions has been explored extensively, but the potential benefit of increased label consistency has not been investigated (data-centric approach). For list-type questions, on the other hand, we apply a sequence tagging model, which is a more natural fit for the multi-answer (i.e., multi-label) task (model-centric approach). Our experimental results suggest two main points: scarce resources such as BioQA datasets can benefit from data-centric approaches with relatively little effort, and a model design that reflects the characteristics of the data can improve system performance. This paper focuses mainly on applying our strategies to the BioASQ 8b dataset and on our participating systems in the 9th BioASQ challenge. Our submissions achieve competitive results, with top or near-top performance in the 9th challenge (Task b, Phase B).
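The abstract's point that sequence tagging suits list-type (multi-answer) questions can be illustrated with a minimal sketch. It assumes a standard BIO tagging scheme (B = beginning of an answer, I = inside an answer, O = outside) and is not the authors' implementation. The function name `decode_bio` and the example sentences are hypothetical. Because every tagged span becomes its own answer, one pass over a passage can return any number of answers, whereas a single start/end span prediction yields exactly one.

```python
from typing import List


def decode_bio(tokens: List[str], tags: List[str]) -> List[str]:
    """Hypothetical helper: collect each contiguous B/I span as a separate answer.

    Assumes one BIO tag ("B", "I" or "O") per token.
    """
    answers: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new answer begins; close any span that is still open.
            if current:
                answers.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            # Continue the currently open answer span.
            current.append(token)
        else:
            # "O", or an "I" with no preceding "B", ends any open span.
            if current:
                answers.append(" ".join(current))
            current = []
    # Flush a span that runs to the end of the passage.
    if current:
        answers.append(" ".join(current))
    return answers


tokens = ["Use", "of", "tamoxifen", "and", "aromatase", "inhibitors"]
tags = ["O", "O", "B", "O", "B", "I"]
print(decode_bio(tokens, tags))  # ['tamoxifen', 'aromatase inhibitors']
```

In a real system the tags would come from a token-classification model. Here they are supplied by hand so that only the decoding step is shown.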

Original language: English
Title of host publication: Experimental IR Meets Multilinguality, Multimodality, and Interaction - 13th International Conference of the CLEF Association, CLEF 2022, Proceedings
Editors: Alberto Barrón-Cedeño, Giovanni Da San Martino, Guglielmo Faggioli, Nicola Ferro, Mirko Degli Esposti, Fabrizio Sebastiani, Craig Macdonald, Gabriella Pasi, Allan Hanbury, Martin Potthast
Publisher: Springer Science and Business Media Deutschland GmbH
Number of pages: 13
ISBN (Print): 9783031136429
Publication status: Published - 2022
Event: 13th International Conference of the Cross-Language Evaluation Forum for European Languages, CLEF 2022 - Bologna, Italy
Duration: 2022 Sept 5 to 2022 Sept 8

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13390 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 13th International Conference of the Cross-Language Evaluation Forum for European Languages, CLEF 2022

Keywords

  • BioASQ
  • Biomedical Natural Language Processing
  • Biomedical Question Answering
  • BioNLP

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

