Effective acoustic model clustering via decision-tree with supervised learning

Junho Park, Hanseok Ko

    Research output: Contribution to journal › Article › peer-review

    4 Citations (Scopus)

    Abstract

    In large vocabulary speech recognition, context-dependent modeling is essential for improving both accuracy and speed. To cope with the sparse data problem that arises from the proliferation of context-dependent models, two kinds of clustering methods, data-driven and rule-based, have been vigorously investigated. The inherent difficulty of applying data-driven approaches to unknown contexts has motivated the development of better rule-based clustering methods. This paper develops a hybrid approach that constructs a supervised decision rule operating on pre-clustered triphones. The scheme employs the C4.5 decision-tree learning algorithm to extract the attributes that best support clustering of the training data. Specifically, the data-driven method serves as the clustering algorithm, and its result serves as the learning target of the C4.5 algorithm. The proposed scheme provides an effective solution to the clustering error problem arising from unsupervised decision-tree learning and also enables successful clustering of state distributions modeled as multiple-mixture Gaussians. In speaker-independent, task-independent continuous speech recognition, the proposed method achieved a relative word error rate (WER) reduction of 3.93%.
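
The hybrid idea in the abstract (cluster first with a data-driven method, then train a decision tree to reproduce those clusters from context attributes) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy triphones, the scalar "acoustic means", the two nasal-context attributes, and all function names are hypothetical. The tree learner borrows only the gain-ratio split criterion of C4.5 and omits pruning, continuous attributes, and Gaussian-mixture state distributions.

```python
import math
from collections import Counter

# Hypothetical toy data: (left, centre, right) triphone -> made-up 1-D acoustic mean.
TRAIN = {
    ("m", "a", "n"): 1.0, ("n", "a", "m"): 1.1, ("m", "a", "t"): 1.2,
    ("t", "a", "k"): 3.0, ("k", "a", "t"): 3.1, ("t", "a", "n"): 2.9,
}
NASALS = {"m", "n"}

def attributes(triphone):
    """Phonetic-context attributes available even for unseen triphones."""
    left, _, right = triphone
    return {"left_nasal": left in NASALS, "right_nasal": right in NASALS}

def data_driven_clusters(means, k=2):
    """Bottom-up clustering: merge the two closest centroids until k clusters remain."""
    clusters = [[name] for name in means]
    centroid = lambda c: sum(means[n] for n in c) / len(c)
    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: abs(centroid(clusters[p[0]]) - centroid(clusters[p[1]])))
        merged = clusters.pop(j)          # j > i, so index i stays valid
        clusters[i].extend(merged)
    return {name: idx for idx, c in enumerate(clusters) for name in c}

def entropy(values):
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in Counter(values).values())

def gain_ratio(rows, attr):
    """Information gain divided by split information (the C4.5 split criterion)."""
    parts = {}
    for x, y in rows:
        parts.setdefault(x[attr], []).append(y)
    gain = entropy([y for _, y in rows]) - sum(
        len(p) / len(rows) * entropy(p) for p in parts.values())
    split_info = entropy([x[attr] for x, _ in rows])
    return gain / split_info if split_info > 0 else 0.0

def build_tree(rows, attrs):
    """Return a cluster label (leaf) or a tuple (attribute, children, majority label)."""
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: gain_ratio(rows, a))
    if gain_ratio(rows, best) <= 0:
        return majority
    rest = [a for a in attrs if a != best]
    children = {}
    for value in {x[best] for x, _ in rows}:
        subset = [(x, y) for x, y in rows if x[best] == value]
        children[value] = build_tree(subset, rest)
    return (best, children, majority)

def classify(tree, x):
    """Walk the tree; fall back to the node's majority label for unseen attribute values."""
    while isinstance(tree, tuple):
        attr, children, majority = tree
        tree = children.get(x[attr], majority)
    return tree

# Step 1: the data-driven clustering result becomes the supervised learning target.
targets = data_driven_clusters(TRAIN, k=2)
# Step 2: learn a decision rule from context attributes to cluster labels.
rows = [(attributes(t), targets[t]) for t in TRAIN]
tree = build_tree(rows, ["left_nasal", "right_nasal"])
# Step 3: the rule assigns a cluster to a triphone that was never observed.
unseen = ("n", "a", "k")
predicted = classify(tree, attributes(unseen))
```

In this toy setup the unseen triphone `("n", "a", "k")` is routed to the same cluster as the training triphones with a nasal left context, which is the property that pure data-driven clustering lacks for unknown contexts.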

    Original language: English
    Pages (from-to): 1-13
    Number of pages: 13
    Journal: Speech Communication
    Volume: 46
    Issue number: 1
    Publication status: Published - May 2005

    Keywords

    • Acoustic modeling
    • Decision-tree
    • Large vocabulary continuous speech recognition

    ASJC Scopus subject areas

    • Software
    • Modelling and Simulation
    • Communication
    • Language and Linguistics
    • Linguistics and Language
    • Computer Vision and Pattern Recognition
    • Computer Science Applications
