Compressing Deep Neural Networks for Mobile Devices

Dr. Gyires-Tóth Bálint Pál
Department of Telecommunications and Media Informatics

Smart devices equipped with many different sensors, smartphones in particular, are spreading into many areas of life and generating an ever-increasing amount of data. Partly as a result, more and more services are built on this sensor data (images, voice, GPS position, or temperature). Besides analytical and rule-based algorithms, machine learning is already used to process such data in many areas, and deep learning is currently among the most effective of these methods. Given the outstanding results of deep neural networks, the question arises: why not process the data locally by running the neural network on the device itself?

For a long time, the limited resources of mobile platforms did not allow complex applications to run on them. Therefore, for tasks such as speech recognition or the more demanding image recognition problems, currently available solutions evaluate the data in the cloud, and the user’s device merely displays the result. This approach is widely used in applications such as Apple Siri and Google Assistant. However, it does not work if no Internet connection is available, if the amount of data is too large, if the result is needed in real time, or if the data cannot be uploaded because of privacy constraints. This again raises the question of how to run neural networks on smart devices.

Advances in mobile processors, the growth of the memory available to them, and recent research results make it increasingly feasible to run neural networks offline on smartphones. In my thesis, I review three compression algorithms, all of which aim to reduce the size of the network and thereby accelerate prediction. 1. The knowledge transfer method transforms the whole structure of the network into a more compact form by transferring the knowledge of a large, well-trained network into a smaller one. 2. Shrinking the neural network makes it possible to prune whole neurons from a network, although the heuristic currently used by the algorithm is very expensive. 3. Lastly, clustering the weights of the network compresses the weight matrices by combining pruning with weight sharing.
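
To make the third idea concrete, the following is a minimal sketch of pruning combined with weight sharing on a single weight matrix. It is not the implementation from the thesis: the function names `prune_by_magnitude` and `share_weights`, the magnitude-based pruning criterion, the one-dimensional k-means clustering, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np


def prune_by_magnitude(weights, sparsity):
    """Zero out roughly the fraction `sparsity` of smallest-magnitude weights.

    Assumed criterion: plain magnitude thresholding. Ties at the threshold
    are pruned too, so slightly more weights than requested may be removed.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    # k-th smallest magnitude serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask


def share_weights(weights, mask, n_clusters, n_iter=20):
    """Replace each surviving weight by the centroid of its cluster.

    Uses a simple 1-D k-means with linearly spaced initial centroids
    (an illustrative choice, not taken from the thesis).
    """
    shared = np.zeros_like(weights)
    values = weights[mask]
    if values.size == 0:
        return shared, np.array([])
    centroids = np.linspace(values.min(), values.max(), n_clusters)
    for _ in range(n_iter):
        # assign every weight to its nearest centroid
        assign = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        for c in range(n_clusters):
            members = values[assign == c]
            if members.size:
                centroids[c] = members.mean()
    assign = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
    shared[mask] = centroids[assign]
    return shared, centroids


# Hypothetical usage on a random 8x8 layer
rng = np.random.default_rng(0)
layer = rng.normal(size=(8, 8))
pruned, mask = prune_by_magnitude(layer, sparsity=0.5)
compressed, codebook = share_weights(pruned, mask, n_clusters=4)
```

After these two steps, the matrix can in principle be stored as a sparse index structure plus a small codebook of shared values instead of one full-precision number per weight, which is where the size reduction comes from.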

Of these three solutions, I worked out and implemented the method based on weight pruning. In my thesis, I present it down to the implementation level, from training the network through compression to deployment in an application on an iPhone.

