Convolutional neural networks (CNNs) are computationally demanding. In most cases the intermediate results can be computed in parallel; however, designing a state-of-the-art processing unit remains challenging because hardware resources are limited. As CNNs appear in embedded systems, energy efficiency is becoming an increasingly important constraint alongside runtime. Embedding the coefficients in an FPGA implementation makes it possible to design the network around the a priori known coefficients in order to improve performance. The biggest advantage of bit-serial arithmetic over a bit-parallel implementation is the significant reduction in resource usage. This reduced resource usage can in turn be exploited to increase the level of parallelism of the required operations and to decrease the number of memory accesses needed for storing and loading partial results. My BSc thesis describes bit-serial CNN processor implementations in an FPGA environment and the realization of an image recognition application.
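
As a rough illustration of the bit-serial idea with a priori known coefficients, the following Python sketch models a dot product in software. It is not the thesis implementation: the function name `bit_serial_dot`, the unsigned 8-bit input width, and the LSB-first bit order are assumptions made for this example only. Each loop iteration stands for one clock cycle in which a single bit of every input is consumed, so the fixed coefficients are only ever added, never multiplied.

```python
def bit_serial_dot(inputs, coeffs, width=8):
    """Dot product of unsigned `width`-bit inputs with fixed integer coefficients.

    Hypothetical software model: one bit of every input is processed per step,
    least significant bit first.
    """
    if len(inputs) != len(coeffs):
        raise ValueError("inputs and coeffs must have the same length")
    if any(x < 0 or x >= (1 << width) for x in inputs):
        raise ValueError("inputs must be unsigned and fit in `width` bits")

    acc = 0
    for bit in range(width):  # one modelled clock cycle per bit position
        # Add up the coefficients whose input has a 1 at this bit position.
        partial = sum(c for x, c in zip(inputs, coeffs) if (x >> bit) & 1)
        # Weight the partial sum by the significance of the current bit.
        acc += partial << bit
    return acc


if __name__ == "__main__":
    pixels = [3, 5, 7]      # example unsigned inputs (assumed values)
    weights = [2, -1, 4]    # example fixed coefficients (assumed values)
    print(bit_serial_dot(pixels, weights))
```

In this model the work per step is a conditional addition of constants, which hints at why a hardware unit built this way can be small when the coefficients are known at design time; the result needs `width` steps instead of one, a latency that the abstract argues can be offset by running more such units in parallel.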