Leveraging Non-Causal Knowledge via Cross-Network Knowledge Distillation for Real-Time Speech Enhancement

Hyun Joon Park, Wooseok Shin, Jin Sob Kim, Sung Won Han

    Research output: Contribution to journal › Article › peer-review

    5 Citations (Scopus)

    Abstract

    To improve real-time speech enhancement (SE) while maintaining efficiency, researchers have adopted knowledge distillation (KD). However, when the same network type as the real-time SE student model is used as a teacher model, the performance of the teacher model can be unsatisfactory, thereby limiting the effectiveness of KD. To overcome this limitation, we propose cross-network non-causal knowledge distillation (CNNC-Distill). CNNC-Distill enables knowledge transfer between networks of different types, allowing the use of a teacher model with a different network type compared to the real-time SE student model. To maximize the KD effect, a non-real-time SE model unconstrained by causality conditions is adopted as the teacher model. CNNC-Distill transfers the non-causal knowledge of the non-real-time SE teacher model to a real-time SE student model using feature and output distillation. We also introduce a time-domain network, RT-SENet, used as the real-time SE student model. Results on the Valentini dataset show the efficiency of RT-SENet and the significant performance improvement achieved by CNNC-Distill.
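    The abstract names two transfer paths, feature distillation and output distillation, but does not give the loss formulation. The following is only a minimal sketch of how such a combined objective is commonly written, not the authors' implementation. The L1 and MSE distances, the linear `project` adapter that maps student features to the teacher's feature size (needed here because the two networks differ in type), and the weights `alpha` and `beta` are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def l1(a: Vector, b: Vector) -> float:
    """Mean absolute error between two equal-length signals."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def project(feature: Vector, weights: Sequence[Vector]) -> List[float]:
    """Hypothetical linear adapter: maps a student feature to the teacher's size."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]


def distillation_loss(
    student_out: Vector,
    teacher_out: Vector,
    clean: Vector,
    student_feat: Vector,
    teacher_feat: Vector,
    proj: Sequence[Vector],
    alpha: float = 0.5,  # assumed weight of the output-distillation term
    beta: float = 0.5,   # assumed weight of the feature-distillation term
) -> float:
    """Task loss plus output and feature distillation terms (illustrative only)."""
    task = l1(student_out, clean)            # student vs. clean speech
    out_kd = l1(student_out, teacher_out)    # student vs. non-causal teacher output
    feat_kd = mse(project(student_feat, proj), teacher_feat)  # adapted features
    return task + alpha * out_kd + beta * feat_kd
```

    In this sketch the teacher's outputs and features would be computed once by a frozen non-real-time model, and only the student and the adapter would be trained. The teacher is needed during training only, so the student's inference cost is unchanged.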

    Original language: English
    Pages (from-to): 1129-1133
    Number of pages: 5
    Journal: IEEE Signal Processing Letters
    Volume: 31
    Publication status: Published - 2024

    Bibliographical note

    Publisher Copyright:
    © 1994-2012 IEEE.

    Keywords

    • Real-time speech enhancement
    • cross-network
    • knowledge distillation
    • non-causal knowledge

    ASJC Scopus subject areas

    • Signal Processing
    • Electrical and Electronic Engineering
    • Applied Mathematics
