Asymptotic statistical theory of overtraining and cross-validation

Shun Ichi Amari, Noboru Murata, Klaus Robert Müller, Michael Finke, Howard Hua Yang

    Research output: Contribution to journal › Article › peer-review

    286 Citations (Scopus)

    Abstract

    A statistical theory for overtraining is proposed. The analysis treats general realizable stochastic neural networks, trained with Kullback-Leibler divergence in the asymptotic case of a large number of training examples. It is shown that the asymptotic gain in the generalization error is small if we perform early stopping, even if we have access to the optimal stopping time. Considering cross-validation stopping, we answer the question: in what ratio should the examples be divided into training and cross-validation sets in order to obtain the optimum performance? Although cross-validated early stopping is useless in the asymptotic region, it does decrease the generalization error in the nonasymptotic region. Our large-scale simulations, run on a CM5, agree well with our analytical findings.
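To make the setting in the abstract concrete, the following is a minimal sketch of cross-validated early stopping, not the authors' implementation. It assumes a tiny logistic model (a one-unit stochastic network) trained by full-batch gradient descent on the mean negative log-likelihood, which is the cross-entropy form of the Kullback-Leibler objective. The names `train_with_early_stopping`, `make_data`, and `val_fraction` are hypothetical, and the default split of 0.2 is an arbitrary placeholder rather than the optimal ratio derived in the paper. The sketch keeps the checkpoint with the lowest validation loss instead of halting at the first increase.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def nll(w, data):
    # Mean negative log-likelihood (cross-entropy) of binary labels.
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def train_with_early_stopping(examples, val_fraction=0.2, lr=0.5,
                              max_epochs=200, seed=0):
    # val_fraction is an illustrative knob, NOT the paper's optimal ratio.
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_val = max(1, int(round(val_fraction * len(shuffled))))
    val, train = shuffled[:n_val], shuffled[n_val:]

    dim = len(train[0][0])
    w = [0.0] * dim
    best_w, best_loss, best_epoch = w[:], nll(w, val), 0
    for epoch in range(1, max_epochs + 1):
        # One full-batch gradient step using the training split only.
        grad = [0.0] * dim
        for x, y in train:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j in range(dim):
                grad[j] += err * x[j] / len(train)
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
        # The held-out split decides which parameters are kept.
        val_loss = nll(w, val)
        if val_loss < best_loss:
            best_w, best_loss, best_epoch = w[:], val_loss, epoch
    return best_w, best_loss, best_epoch


def make_data(n, true_w, seed):
    # Realizable case: labels are sampled from a model of the same form.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in true_w]
        p = sigmoid(sum(wi * xi for wi, xi in zip(true_w, x)))
        data.append((x, 1 if rng.random() < p else 0))
    return data


if __name__ == "__main__":
    data = make_data(200, [1.5, -2.0, 0.5], seed=1)
    w, loss, epoch = train_with_early_stopping(data)
    print("best epoch:", epoch, "validation loss:", round(loss, 4))
```

Varying `val_fraction` and the number of examples in this toy setup is one way to explore the trade-off the abstract describes: examples moved to the validation set improve the stopping decision but are no longer available for fitting.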

    Original language: English
    Pages (from-to): 985-996
    Number of pages: 12
    Journal: IEEE Transactions on Neural Networks
    Volume: 8
    Issue number: 5
    Publication status: Published - 1997

    Bibliographical note

    Funding Information:
    Manuscript received September 11, 1995; revised October 21, 1996 and May 10, 1997. K.-R. Müller was supported in part by the EC S & T fellowship (FTJ 3-004). This work was supported by the National Institutes of Health (P41RRO 5969) and CNCPST Paris (96JR063).

    Keywords

    • Asymptotic analysis
    • Cross-validation
    • Early stopping
    • Generalization
    • Overtraining
    • Stochastic neural networks

    ASJC Scopus subject areas

    • Software
    • Computer Science Applications
    • Computer Networks and Communications
    • Artificial Intelligence
