Advances in semiconductor manufacturing make it possible to integrate more and more functions on a single silicon chip. Complex circuits that contain all of a system's functionality on one silicon die are called Systems on Chip (SoCs). Submodules of an SoC that are designed for a single well-defined task are called Application-Specific Integrated Circuits (ASICs). ASICs are energy efficient and offer high performance, but their fixed functionality makes them difficult to reuse in a new SoC. Application-Specific Instruction Set Processors (ASIPs) offer a trade-off between flexibility and computational performance: their instruction set and architecture are optimized for a specific application domain. Programmable ASIPs have the flexibility and reusability of a microprocessor, while their computational performance is comparable to that of ASICs.
The purpose of my thesis work was to design an ASIP optimized for cluster analysis. Cluster analysis measures the "similarity" of data items with an objective proximity metric and groups the items accordingly. Because evaluating the metric demands a lot of computational power, the process can be accelerated by evaluating the metric faster.
I followed the design flow of digital circuits. During the first semester of the thesis work, I designed a high-level model (functional simulator) of an ASIP in C++, based on an Instruction Set Architecture (ISA) specification. This model captures the behavior of the programmer's view, that is, the resources visible to programmers, such as register arrays, flags, and registers. The functional simulator can execute assembly programs with the help of an assembler that I also designed.
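To illustrate what a programmer's-view model means in practice, the following is a minimal C++ sketch of a functional simulator. It is not the simulator from the thesis: the opcodes (`LoadImm`, `Add`, `Halt`), the `Instr` layout, the eight-entry register file, and the single zero flag are all hypothetical choices made for illustration, since the actual ISA is not described here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcodes; the real ISA is not given in this abstract.
enum class Op : std::uint8_t { LoadImm, Add, Halt };

// Hypothetical decoded instruction, as an assembler might emit it.
struct Instr {
    Op op;
    std::uint8_t rd;    // destination register index
    std::uint8_t rs;    // source register index (ignored by LoadImm and Halt)
    std::uint32_t imm;  // immediate value (used by LoadImm only)
};

// Programmer-visible state only: register array, one flag, program counter.
struct CpuState {
    std::array<std::uint32_t, 8> regs{};
    bool zero_flag = false;
    std::size_t pc = 0;
};

// Executes instructions until Halt or the end of the program,
// then returns the final programmer-visible state.
CpuState run(const std::vector<Instr>& program) {
    CpuState s;
    while (s.pc < program.size()) {
        const Instr& in = program[s.pc++];
        assert(in.rd < s.regs.size() && in.rs < s.regs.size());
        switch (in.op) {
            case Op::LoadImm:
                s.regs[in.rd] = in.imm;
                break;
            case Op::Add:
                // Unsigned arithmetic wraps around on overflow.
                s.regs[in.rd] += s.regs[in.rs];
                s.zero_flag = (s.regs[in.rd] == 0);
                break;
            case Op::Halt:
                return s;
        }
    }
    return s;
}
```

A model of this kind tracks only architectural state and ignores timing, pipelining, and signal-level detail, which is what separates it from an RTL description.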
During the second semester of the thesis work, I implemented an accelerator circuit in VHDL and integrated it into the processor's Register Transfer Level (RTL) description. This circuit calculates the Minkowski distance based on parameters stored in a register array. I performed testbench-based simulations of both the stand-alone circuit and the combined ASIP system with different datasets. After eliminating the bugs, I synthesized the accelerator and the ASIP and checked the behavior of the synthesized circuits with timing simulations. I also implemented the Minkowski distance algorithm in assembly language and compared the performance of the processor with that of the accelerator circuit.
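For reference, the Minkowski distance of order p between two vectors x and y is the p-th root of the sum of |x_i - y_i|^p over all components. With p = 1 it is the Manhattan distance and with p = 2 the Euclidean distance. The sketch below is a plain floating-point software version of this formula in C++. It does not reproduce the VHDL accelerator or the assembly implementation, whose number format and parameter layout are not described here, and the function name `minkowski_distance` is chosen only for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minkowski distance of order p between two equally sized vectors.
// Assumes p >= 1, the range in which the formula defines a metric.
double minkowski_distance(const std::vector<double>& x,
                          const std::vector<double>& y,
                          double p) {
    assert(x.size() == y.size());
    assert(p >= 1.0);
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        // Accumulate |x_i - y_i|^p for every component.
        sum += std::pow(std::abs(x[i] - y[i]), p);
    }
    // Take the p-th root of the accumulated sum.
    return std::pow(sum, 1.0 / p);
}
```

For the points (0, 0) and (3, 4), this gives 7 for p = 1 and 5 for p = 2, up to floating-point rounding. The two `std::pow` calls per evaluation hint at why this metric is a natural candidate for hardware acceleration.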