For tasks that require high computing performance, heterogeneous devices offer a viable alternative to (multi-core) processor-based systems. In FPGA implementations of such hardware accelerators, the bandwidth of communication with the central processor is just as crucial as the computing performance itself.
My goal was to design and implement an FPGA-based system capable of processing large amounts of data and moving them to and from a host computer. The system consists of two parts: the PCI Express communication interface, which is responsible for high-speed data transfer, and a data processing unit (e.g. for image processing).
First, I introduce the PCI Express interface together with the relevant sections of the specification. Then I present the Bus Mastering DMA reference design that I used as a starting point. After defining the necessary modifications and extensions, I give an overview of the FPGA system and a detailed description of its components.
My design includes a DDR3 memory controller module and a data processing unit designed and implemented for demonstration purposes. This unit is connected to the PCI Express communication module via a standard AXI4 interface.
Finally, I summarize my work and outline a few opportunities for further development of the project.