TY - GEN
T1 - Entropy-Constrained Training of Deep Neural Networks
AU - Wiedemann, Simon
AU - Marban, Arturo
AU - Müller, Klaus-Robert
AU - Samek, Wojciech
N1 - Funding Information:
This work was supported by the Fraunhofer Society through the MPI-FhG collaboration project “Theory & Practice for Reduced Learning Machines”. This research was also supported by the German Ministry for Education and Research as Berlin Big Data Centre (01IS14013A) and Berlin Center for Machine Learning (01IS18037I). Partial funding by DFG is acknowledged (EXC 2046/1, project-ID: 390685689). This work was also supported by the Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (No. 2017-0-00451).
Publisher Copyright:
© 2019 IEEE.
PY - 2019/7
Y1 - 2019/7
N2 - Motivated by the Minimum Description Length (MDL) principle, we first derive an expression for the entropy of a neural network which measures its complexity explicitly in terms of its bit-size. Then, we formalize the problem of neural network compression as an entropy-constrained optimization objective. This objective generalizes many of the currently proposed compression techniques in the literature, in that pruning or reducing the cardinality of the weight elements can be seen as special cases of entropy reduction methods. Furthermore, we derive a continuous relaxation of the objective, which allows us to minimize it using gradient-based optimization techniques. Finally, we show that we can reach compression results that are competitive with state-of-the-art techniques on different network architectures and data sets, e.g. achieving ×71 compression gains on a VGG-like architecture.
AB - Motivated by the Minimum Description Length (MDL) principle, we first derive an expression for the entropy of a neural network which measures its complexity explicitly in terms of its bit-size. Then, we formalize the problem of neural network compression as an entropy-constrained optimization objective. This objective generalizes many of the currently proposed compression techniques in the literature, in that pruning or reducing the cardinality of the weight elements can be seen as special cases of entropy reduction methods. Furthermore, we derive a continuous relaxation of the objective, which allows us to minimize it using gradient-based optimization techniques. Finally, we show that we can reach compression results that are competitive with state-of-the-art techniques on different network architectures and data sets, e.g. achieving ×71 compression gains on a VGG-like architecture.
KW - Neural network compression
KW - cardinality reduction
KW - entropy minimization
KW - pruning
UR - http://www.scopus.com/inward/record.url?scp=85073237810&partnerID=8YFLogxK
U2 - 10.1109/IJCNN.2019.8852119
DO - 10.1109/IJCNN.2019.8852119
M3 - Conference contribution
AN - SCOPUS:85073237810
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - 2019 International Joint Conference on Neural Networks, IJCNN 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 International Joint Conference on Neural Networks, IJCNN 2019
Y2 - 14 July 2019 through 19 July 2019
ER -