Dual stage learning based dynamic time-frequency mask generation for audio event classification

Donghyeon Kim, Jaihyun Park, David K. Han, Hanseok Ko

    Research output: Contribution to journal › Conference article › peer-review

    7 Citations (Scopus)

    Abstract

    Audio-based event recognition becomes quite challenging in real-world noisy environments. To alleviate the noise issue, time-frequency mask based feature enhancement methods have been proposed. While these methods, which use fixed filter settings, have been shown to be effective in familiar noise backgrounds, they become brittle when exposed to unexpected noise. To address the unknown noise problem, we develop an approach based on dynamic filter generation learning. In particular, we propose a dual-stage dynamic filter generator network that can be trained to generate a time-frequency mask specifically created for each input audio. Two alternative approaches to training the mask generator network are developed for feature enhancement in high-noise environments. Our proposed method shows improved performance and robustness in both clean and unseen noise environments.
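
    The core idea in the abstract, a time-frequency mask computed from each input rather than fixed in advance, can be illustrated with a minimal sketch. This is not the authors' dual-stage network: the function names `generate_mask` and `enhance`, the median-based noise estimate, and the `sharpness` parameter are all assumptions made for illustration, standing in for a learned mask generator.

```python
import numpy as np


def generate_mask(spec, sharpness=1.0):
    """Hypothetical stand-in for a learned, input-dependent mask generator.

    spec: 2-D array of shape (frequency_bins, time_frames).
    Returns a mask of the same shape with values strictly between 0 and 1.
    """
    # Assumption: approximate the noise floor of each frequency bin
    # by its median energy over time.
    noise_floor = np.median(spec, axis=1, keepdims=True)
    # A sigmoid of the margin above the noise floor gives a soft mask.
    return 1.0 / (1.0 + np.exp(-sharpness * (spec - noise_floor)))


def enhance(spec):
    """Apply the input-specific mask element-wise to the features."""
    return generate_mask(spec) * spec


# Synthetic example: 40 frequency bins x 100 frames of background noise,
# with a louder "event" in bins 10-14 during frames 30-59.
rng = np.random.default_rng(0)
spec = rng.random((40, 100))
spec[10:15, 30:60] += 3.0

enhanced = enhance(spec)
```

    Because the mask is recomputed from every input, it adapts to whatever background that input contains; in the paper this mapping is learned by a network rather than given by a fixed rule as in the sketch.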

    Original language: English
    Pages (from-to): 836-840
    Number of pages: 5
    Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
    Volume: 2020-October
    Publication status: Published - 2020
    Event: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
    Duration: 2020 Oct 25 - 2020 Oct 29

    Bibliographical note

    Funding Information:
    This work was supported by the Korea Environmental Industry & Technology Institute (KEITI) through the Public Technology Program based on environmental policy funded by the Korean Ministry of Environment (MOE; 2017000210001), and the contribution of David Han was supported by the US Army Research Laboratory.

    Publisher Copyright:
    Copyright © 2020 ISCA

    Keywords

    • Audio recognition
    • Dual stage
    • Dynamic filter network
    • Feature enhancement

    ASJC Scopus subject areas

    • Language and Linguistics
    • Human-Computer Interaction
    • Signal Processing
    • Software
    • Modelling and Simulation
