In this thesis I present the steps and difficulties of building an FPGA-based compute accelerator connected over the PCIe bus. PCIe is currently the most commonly used high-speed bus in personal computers, servers and high-end embedded SoCs.
PCIe is a packet-based protocol in which devices are connected by high-speed serial differential point-to-point links. The clock is encoded in the same waveform as the data, so the receiver can recover the transmitter clock from the incoming signal.
I studied the three PCIe IP cores available from Xilinx and used the AXI Memory Mapped To PCI Express IP core to bridge the internal AXI bus and the PCIe bus. The FPGA design also contains a DDR controller, so data can be copied into the attached DDR RAM, which the accelerator core can then access with higher bandwidth.
I implemented a Gaussian blur IP core (based on 2D convolution) with HLS to demonstrate the operation of the whole system. HLS (high-level synthesis) generates hardware from an algorithm written in the C/C++ programming language.
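As an illustration of the kind of C code that HLS takes as input, the following is a minimal sketch of a Gaussian blur written as a 2D convolution. It is not the IP core from the thesis: the function name gaussian_blur3x3, the 3x3 integer kernel, the 8-bit grayscale pixel format and the border replication are all assumptions made for this example, and tool-specific HLS pragmas are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 3x3 integer approximation of a Gaussian kernel; weights sum to 16. */
static const uint8_t GAUSS3[3][3] = {
    {1, 2, 1},
    {2, 4, 2},
    {1, 2, 1},
};

/* Clamp an index into [0, n-1] so that border pixels are replicated. */
static size_t clamp_index(long i, size_t n)
{
    if (i < 0)
        return 0;
    if ((size_t)i >= n)
        return n - 1;
    return (size_t)i;
}

/* Blur an 8-bit grayscale image stored row by row.
 * src and dst must be distinct buffers of width * height bytes. */
void gaussian_blur3x3(const uint8_t *src, uint8_t *dst,
                      size_t width, size_t height)
{
    assert(src != NULL && dst != NULL && src != dst);

    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            uint32_t acc = 0;
            /* Weighted sum over the 3x3 neighbourhood of (x, y). */
            for (int ky = -1; ky <= 1; ky++) {
                for (int kx = -1; kx <= 1; kx++) {
                    size_t sy = clamp_index((long)y + ky, height);
                    size_t sx = clamp_index((long)x + kx, width);
                    acc += (uint32_t)GAUSS3[ky + 1][kx + 1] * src[sy * width + sx];
                }
            }
            /* Divide by the kernel sum (16), rounding to nearest. */
            dst[y * width + x] = (uint8_t)((acc + 8) / 16);
        }
    }
}
```

A hardware-oriented version would typically stream pixels through line buffers instead of indexing a whole frame in memory, but the arithmetic per output pixel stays the same.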
I wrote a Linux kernel driver for the device. User-space applications can use the accelerator as a character device through the standard file operations of Unix-like systems (open/read/write/fcntl).
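From user space, access through a character device could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper name accel_process, the idea that input is sent with write and the result is fetched with read, and any device path such as /dev/accel0 are hypothetical and not taken from the driver described here. Only standard POSIX calls are used.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const unsigned char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1; /* no progress, give up */
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes, retrying on short reads and EINTR. */
static int read_all(int fd, unsigned char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = read(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1; /* unexpected end of data */
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical round trip: send len input bytes to the device, then read
 * len result bytes back. Returns 0 on success and -1 on any failure.
 * The write-then-read protocol is an assumption of this sketch. */
int accel_process(const char *dev_path, const unsigned char *in,
                  unsigned char *out, size_t len)
{
    assert(dev_path != NULL && in != NULL && out != NULL);

    int fd = open(dev_path, O_RDWR);
    if (fd < 0)
        return -1;

    int rc = write_all(fd, in, len);
    if (rc == 0)
        rc = read_all(fd, out, len);

    close(fd);
    return rc;
}
```

An application would call accel_process with the path of the device node created by the driver and two buffers of equal size; the function fails cleanly if the node does not exist or the transfer is cut short.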