Orthogonal gradient penalty for fast training of Wasserstein GAN based multi-task autoencoder toward robust speech recognition

Chao Yuan Kao, Sangwook Park, Alzahra Badi, David K. Han, Hanseok Ko

    Research output: Contribution to journal › Article › peer-review

    3 Citations (Scopus)

    Abstract

    The performance of Automatic Speech Recognition (ASR) degrades dramatically in noisy environments. To alleviate this problem, a variety of deep networks based on convolutional neural networks and recurrent neural networks have been proposed, trained with an L1 or L2 loss. In this Letter, we propose a new orthogonal gradient penalty (OGP) method for Wasserstein Generative Adversarial Networks (WGAN) applied to denoising and despeeching models. The WGAN integrates a multi-task autoencoder that estimates not only speech features but also noise features from noisy speech. While achieving a 14.1% improvement in the convergence rate of the Wasserstein distance, the OGP-enhanced features are tested in ASR and achieve word error rate (WER) improvements of 9.7%, 8.6%, 6.2%, and 4.8% over the DDAE, MTAE, R-CED(CNN), and RNN models, respectively.
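
    The WER figures in the abstract refer to word error rate, conventionally defined as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration. The abstract does not state whether the reported improvements are relative or absolute, so the sketch computes only the metric itself.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Hypothetical helper for illustration; tokenization is plain whitespace splitting.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One word of a six-word reference is dropped by the recognizer.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

    Because insertions count as errors, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.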

    Original language: English
    Pages (from-to): 1195-1198
    Number of pages: 4
    Journal: IEICE Transactions on Information and Systems
    Volume: E103D
    Issue number: 5
    Publication status: Published - May 2020

    Bibliographical note

    Funding Information:
    The authors of Korea University are funded by the Ministry

    Publisher Copyright:
    © 2020 The Institute of Electronics, Information and Communication Engineers

    Keywords

    • Deep learning
    • Generative adversarial networks
    • Robust speech recognition
    • Speech enhancement

    ASJC Scopus subject areas

    • Software
    • Hardware and Architecture
    • Computer Vision and Pattern Recognition
    • Electrical and Electronic Engineering
    • Artificial Intelligence
