Artificial intelligence and neural-network-based systems are increasingly being deployed on IoT devices. Processing images from the environment in real time opens up new possibilities, for example implementing object recognition on high-speed heterogeneous systems. Convolutional neural networks (CNNs) can be used efficiently for image recognition and can be implemented on FPGAs using either hardware multipliers or special arithmetic function units. Such systems usually have high computational, memory, and bandwidth requirements.
In my thesis I present an implementation of such a system, which uses a pre-trained network to detect handwritten digits in a continuous video stream. The hardware elements of the system were created with Xilinx's high-level synthesis (HLS) language, and the layers were designed so that they can easily be adapted to any convolutional neural network. The trade-off between performance and resource usage is also configurable, so the network can be optimized for the task. The example network has been integrated into a system with an HDMI subsystem, so it can be used to recognize images from an arbitrary source. The system uses the PYNQ libraries from Xilinx, in which the FPGA interfaces with a Python software environment, so creating custom software image preprocessing algorithms is straightforward with pre-existing libraries. The system has been tested for performance, accuracy, and energy use, and has been compared both to the original trained network running on a CPU and to other FPGA implementations of the same network.