Since CPU clock frequencies cannot be raised beyond a certain point because of physical limitations such as power consumption and heat dissipation, research today follows two main directions. One is to execute work in parallel; the other is to eliminate unnecessary idle time by reducing the number of cache misses.
In the first part of this document I describe various properties of CPU caches, focusing on how modern desktop CPUs work. A few basic cache structures and replacement algorithms are presented as a foundation, followed by a detailed explanation of some advanced cache structures and replacement algorithms found in the literature. Measurable metrics that can be used to compare the effectiveness of different cache implementations are also presented. Finally, two methods are described for visualizing the locality of a program or of a memory access pattern.
The second part of the work presents Pin, the dynamic binary instrumentation framework developed by Intel. I describe a tool built on Pin that records the memory addresses accessed by a binary program. This part also presents the simulators written for some of the previously described cache structures and replacement algorithms.
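The simulators themselves are not reproduced in this abstract. As a hedged illustration of what such a simulator does, the following minimal Python sketch models a set-associative cache with least-recently-used (LRU) replacement, replays a trace of addresses through it, and reports the hit rate, a metric commonly used to compare cache implementations. The class name, its parameters, and the synthetic trace are hypothetical; the sketch assumes each trace entry is a plain integer address (for example, one parsed from a trace file) and is not the simulator described in this work.

```python
from collections import OrderedDict


class SetAssociativeLRUCache:
    """Hypothetical sketch: a set-associative cache with per-set LRU replacement."""

    def __init__(self, num_sets: int, ways: int, line_size: int) -> None:
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One OrderedDict per set; keys are tags, ordered from LRU (first) to MRU (last).
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Simulate one access; return True on a hit, False on a miss."""
        block = address // self.line_size  # cache-line (block) number
        index = block % self.num_sets      # set the block maps to
        tag = block // self.num_sets       # identifies the block within its set
        cache_set = self.sets[index]
        if tag in cache_set:
            cache_set.move_to_end(tag)     # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)  # evict the least recently used tag
        cache_set[tag] = None
        return False

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


if __name__ == "__main__":
    cache = SetAssociativeLRUCache(num_sets=4, ways=2, line_size=64)
    # Hypothetical trace: two sequential sweeps over 256 bytes in 4-byte steps.
    trace = [addr for _ in range(2) for addr in range(0, 256, 4)]
    for addr in trace:
        cache.access(addr)
    print(cache.hits, cache.misses, cache.hit_rate())  # prints: 124 4 0.96875
```

The sequential sweeps show spatial locality at work: only the first access to each 64-byte line misses, and the second sweep hits entirely because the four lines still fit in the cache. Swapping the per-set `OrderedDict` policy for another replacement rule is the natural extension point for comparing algorithms.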
In the final part of this work, the implemented tool is used to record several memory traces, which then serve as input to the cache simulations and the locality visualizations. The document concludes with an assessment of the locality of the traced programs and of the effectiveness of the implemented cache solutions.