TY - GEN
T1 - Data-Centric and Model-Centric Approaches for Biomedical Question Answering
AU - Yoon, Wonjin
AU - Yoo, Jaehyo
AU - Seo, Sumin
AU - Sung, Mujeen
AU - Jeong, Minbyul
AU - Kim, Gangwoo
AU - Kang, Jaewoo
N1 - Funding Information:
Acknowledgements. We express gratitude towards Dr. Jihye Kim and Dr. Sungjoon Park from Korea University for their invaluable insight into our systems’ output. This research is supported by National Research Foundation of Korea (NRF-2020R1A2C3010638) and a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HR20C0021).
Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - Biomedical question answering (BioQA) is the process of automated information extraction from the biomedical literature, and as the number of accessible biomedical papers increases rapidly, BioQA is attracting more attention. To improve the performance of BioQA systems, we designed strategies for the sub-tasks of BioQA and assessed their effectiveness using the BioASQ dataset. We designed data-centric and model-centric strategies based on the potential for improvement in each sub-task. For example, model design for factoid-type questions has been explored intensively, but the potential of increased label consistency has not been investigated (data-centric approach). On the other hand, for list-type questions, we apply a sequence tagging model, as it is more natural for the multi-answer (i.e., multi-label) task (model-centric approach). Our experimental results suggest two main points: scarce resources such as BioQA datasets can benefit from data-centric approaches with relatively little effort, and a model design that reflects data characteristics can improve the performance of the system. This paper mainly focuses on applications of our strategies to the BioASQ 8b dataset and on our participating systems in the 9th BioASQ challenge. Our submissions achieve competitive results, with top or near-top performance in the 9th challenge (Task b - Phase B).
KW - BioASQ
KW - Biomedical Natural Language Processing
KW - Biomedical Question Answering
KW - BioNLP
UR - http://www.scopus.com/inward/record.url?scp=85138001727&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-13643-6_16
DO - 10.1007/978-3-031-13643-6_16
M3 - Conference contribution
AN - SCOPUS:85138001727
SN - 9783031136429
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 204
EP - 216
BT - Experimental IR Meets Multilinguality, Multimodality, and Interaction - 13th International Conference of the CLEF Association, CLEF 2022, Proceedings
A2 - Barrón-Cedeño, Alberto
A2 - Da San Martino, Giovanni
A2 - Faggioli, Guglielmo
A2 - Ferro, Nicola
A2 - Degli Esposti, Mirko
A2 - Sebastiani, Fabrizio
A2 - Macdonald, Craig
A2 - Pasi, Gabriella
A2 - Hanbury, Allan
A2 - Potthast, Martin
PB - Springer Science and Business Media Deutschland GmbH
T2 - 13th International Conference of the Cross-Language Evaluation Forum for European Languages, CLEF 2022
Y2 - 5 September 2022 through 8 September 2022
ER -