Abstract
When training deep networks, it is common knowledge that an efficient and well-generalizing representation of the problem is formed. In this paper we aim to elucidate what makes the emerging representation successful. We analyze the layer-wise evolution of the representation in a deep network by building a sequence of deeper and deeper kernels that subsume the mapping performed by more and more layers of the deep network, and by measuring how well these increasingly complex kernels fit the learning problem. We observe that deep networks create increasingly better representations of the learning problem and that the structure of the deep network controls how fast the representation of the task is formed layer after layer.
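The abstract describes measuring, layer by layer, how well a kernel built on top of the network's intermediate representation fits the task. Below is a minimal sketch of that idea, assuming an RBF kernel, kernel PCA on each layer's output, and a least-squares fit on the leading components; the toy two-layer map, the `gamma` value, and the synthetic data are illustrative placeholders, not the paper's actual setup or results.

```python
import numpy as np

def rbf_kernel(X, gamma=0.1):
    # Gaussian (RBF) kernel matrix from pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def center_kernel(K):
    # Center the kernel matrix in feature space (standard kernel-PCA step).
    n = K.shape[0]
    one = np.ones((n, n)) / n
    return K - one @ K - K @ one + one @ K @ one

def fit_error_vs_components(K, y, dims):
    # Kernel PCA: eigendecompose the centered kernel, then measure how well a
    # least-squares fit on the leading d components explains the labels y.
    Kc = center_kernel(K)
    eigval, eigvec = np.linalg.eigh(Kc)
    order = np.argsort(eigval)[::-1]                  # leading components first
    eigval, eigvec = eigval[order], eigvec[:, order]
    errors = []
    for d in dims:
        Z = eigvec[:, :d] * np.sqrt(np.clip(eigval[:d], 0.0, None))
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        errors.append(np.mean((Z @ coef - y) ** 2))
    return errors

# Illustrative usage: a random two-layer tanh map stands in for the trained
# network's layers (hypothetical stand-in, not the network studied in the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])
W1, W2 = rng.normal(size=(20, 20)), rng.normal(size=(20, 20))
layer_outputs = [X, np.tanh(X @ W1), np.tanh(np.tanh(X @ W1) @ W2)]

for depth, H in enumerate(layer_outputs):
    errs = fit_error_vs_components(rbf_kernel(H), y, dims=[1, 2, 5, 10])
    print(f"layers subsumed = {depth}: error vs #components = {np.round(errs, 3)}")
```

The intuition this sketch illustrates: if the representation after layer l is better adapted to the task, fewer leading kernel principal components should suffice to fit the labels, so the error curve as a function of the number of components should drop faster at greater depth.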
| Original language | English |
|---|---|
| Pages (from-to) | 2563-2581 |
| Number of pages | 19 |
| Journal | Journal of Machine Learning Research |
| Volume | 12 |
| Publication status | Published - September 2011 |
| Externally published | Yes |
Keywords
- Deep networks
- Kernel principal component analysis
- Representations
ASJC Scopus subject areas
- Software
- Control and Systems Engineering
- Statistics and Probability
- Artificial Intelligence