## Abstract

At the core of any inference procedure in deep neural networks are dot product operations, which are the component that requires the most computational resources. For instance, deep neural networks such as VGG-16 require up to 15 giga-operations to perform the dot products in a single forward pass, which results in significant energy consumption and thus limits their use in resource-limited environments, e.g., on embedded devices or smartphones. One common approach to reducing the complexity of inference is to prune and quantize the weight matrices of the neural network. Usually, this results in matrices whose entropy values are low, as measured relative to the empirical probability mass distribution of their elements. In order to efficiently exploit such matrices, one usually relies on, inter alia, sparse matrix representations. However, most of these common matrix storage formats make strong statistical assumptions about the distribution of the elements and therefore cannot efficiently represent the entire set of matrices that exhibit low-entropy statistics (and thus the entire set of compressed neural network weight matrices). In this paper, we address this issue and present new efficient representations for matrices with low-entropy statistics. Like sparse matrix data structures, these formats exploit the statistical properties of the data in order to reduce size and execution complexity. Moreover, we show that the proposed data structures can not only be regarded as a generalization of sparse formats but are also more energy and time efficient under practically relevant assumptions. Finally, we test the storage requirements and execution performance of the proposed formats on compressed neural networks and compare them to dense and sparse representations.
We experimentally show that we are able to attain up to $42\times$ compression ratios, $5\times$ speedups, and $90\times$ energy savings when we losslessly convert state-of-the-art networks, such as AlexNet, VGG-16, ResNet152, and DenseNet, into the new data structures and benchmark their respective dot products.
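To make the two central notions of the abstract concrete, the following minimal sketch computes the entropy of a matrix relative to the empirical distribution of its elements, and evaluates a dot product by first summing the inputs that share a weight value and then multiplying once per distinct non-zero value. It is an illustration only and does not reproduce the data structures proposed in the paper; the function names `empirical_entropy` and `grouped_dot` and the toy matrix `W` are assumptions introduced here.

```python
import math
from collections import Counter, defaultdict


def empirical_entropy(matrix):
    """Shannon entropy (bits per element) of the empirical
    probability mass distribution of the matrix entries."""
    values = [v for row in matrix for v in row]
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def grouped_dot(row, x):
    """Dot product of one matrix row with x.

    Inputs that share the same weight are summed first, so only one
    multiplication per distinct non-zero weight is needed. Zeros are
    skipped, as in a sparse format.
    """
    sums = defaultdict(float)
    for w, xi in zip(row, x):
        if w != 0:
            sums[w] += xi
    return sum(w * s for w, s in sums.items())


# Hypothetical quantized weight matrix with three distinct values.
W = [
    [0.5, 0.5, 0.0, 0.5],
    [0.0, -1.0, -1.0, 0.5],
]
x = [1.0, 2.0, 3.0, 4.0]

print(empirical_entropy(W))                 # 1.5 bits per element
print([grouped_dot(row, x) for row in W])   # [3.5, -3.0]
```

In this toy case the first row needs a single multiplication instead of the three a purely sparse representation would perform, which hints at why a matrix with few distinct values can be cheaper to process even when it is not very sparse.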

| Original language | English |
|---|---|
| Article number | 8725933 |
| Pages (from-to) | 772-785 |
| Number of pages | 14 |
| Journal | IEEE Transactions on Neural Networks and Learning Systems |
| Volume | 31 |
| Issue number | 3 |
| DOIs | |
| Publication status | Published - 2020 Mar |

### Bibliographical note

Funding Information: Manuscript received May 27, 2018; revised December 1, 2018, March 13, 2019 and April 3, 2019; accepted April 5, 2019. Date of publication May 29, 2019; date of current version February 28, 2020. This work was supported in part by the Fraunhofer Society through the MPI-FhG collaboration project “Theory and Practice for Reduced Learning Machines,” in part by the German Ministry for Education through the Berlin Big Data Center under Grant 01IS14013A, in part by the Berlin Center for Machine Learning under Grant 01IS18037I, in part by DFG (EXC 2046/1) under Grant 390685689, and in part by the Information and Communications Technology Planning and Evaluation (IITP) Grant funded by the Korean Government under Grant 2017-0-00451. (Corresponding authors: Klaus-Robert Müller; Wojciech Samek.) S. Wiedemann and W. Samek are with the Fraunhofer Heinrich Hertz Institute, 10587 Berlin, Germany (e-mail: wojciech.samek@hhi.fraunhofer.de).

Publisher Copyright: © 2012 IEEE.

## Keywords

- Computationally efficient deep learning
- data structures
- lossless coding
- neural network compression
- sparse matrices

## ASJC Scopus subject areas

- Software
- Computer Science Applications
- Computer Networks and Communications
- Artificial Intelligence