AD-YOLO: You Look Only Once in Training Multiple Sound Event Localization and Detection

Jin Sob Kim, Hyun Joon Park, Wooseok Shin, Sung Won Han

Research output: Contribution to journal › Conference article › peer-review

Abstract

Sound event localization and detection (SELD) combines the identification of sound events with the corresponding directions of arrival (DOA). Recently, event-oriented track output formats have been adopted to solve this problem; however, they still generalize poorly to real-world problems in environments of unknown polyphony. To address this issue, we propose an angular-distance-based multiple SELD (AD-YOLO), which is an adaptation of the "You Only Look Once" algorithm for SELD. The AD-YOLO format allows the model to learn sound occurrences in a location-sensitive manner by assigning class responsibility to DOA predictions. Hence, the format enables the model to handle the polyphony problem, regardless of the number of sound overlaps. We evaluated AD-YOLO on the DCASE 2020-2022 challenge Task 3 datasets using four SELD objective metrics. The experimental results show that AD-YOLO achieved outstanding performance overall and was also robust in class-homogeneous polyphony environments.
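As an illustration of the angular distance the abstract refers to, the sketch below computes the great-circle angle between two DOAs given as (azimuth, elevation) pairs in degrees. The function names, the degree-based convention, and the nearest-prediction rule are assumptions made for this example; the abstract does not specify how AD-YOLO assigns responsibility, so this is not the authors' implementation.

```python
import math


def doa_to_unit_vector(azimuth_deg, elevation_deg):
    """Convert a DOA (azimuth, elevation in degrees) to a 3-D unit vector."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (
        math.cos(el) * math.cos(az),
        math.cos(el) * math.sin(az),
        math.sin(el),
    )


def angular_distance_deg(doa_a, doa_b):
    """Great-circle angle in degrees between two DOAs."""
    ua = doa_to_unit_vector(*doa_a)
    ub = doa_to_unit_vector(*doa_b)
    dot = sum(x * y for x, y in zip(ua, ub))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    dot = max(-1.0, min(1.0, dot))
    return math.degrees(math.acos(dot))


def nearest_prediction(predicted_doas, target_doa):
    """Hypothetical assignment rule: index of the prediction whose DOA is
    closest to the target in angular distance (illustrative only)."""
    distances = [angular_distance_deg(p, target_doa) for p in predicted_doas]
    return min(range(len(distances)), key=distances.__getitem__)
```

For example, `nearest_prediction([(0, 0), (90, 0), (180, 0)], (100, 5))` picks the prediction at azimuth 90, since it lies closest to the target on the sphere.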

Bibliographical note

Publisher Copyright:
© 2023 IEEE.

Keywords

  • angular distance
  • polyphony environment
  • sound event localization and detection
  • you only look once

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
