The traffic of computer networks travels through networks in the form of data packets, which have specific headers that contain the source and destination address. Based on the networking topology, backbone devices create routing tables to accomplish traffic forwarding of each packet from its source to their destination. Inside each and every device that participates in packet forwarding, this routing table contains all the necessary information, from which the source-destination mapping can be made. The size of this Forwarding Information Base (FIB) exceeds the order of 500 thousand entries in today’s core network routers. The FIB is being accessed upon every forwarded IP packet, which could require 20 million lookups every second. This structure is widely used by ISP’s backbone network devices and most software routers, particularly the ones that use Linux as their Operating Systems.
Fast and efficient data structure is not easy to find for this task; literature dated back to 1968 was the first, which addressed this. Ever since several proposals and their implementation have been made; such as tree-like Leaf-Pushed Binary Tries, Hashing Schemes, Path- and Level-compression algorithms and Trie Folding optimization. In this Master’s thesis I got acquainted with their construction and their properties. The introduction of these data structures are followed by the in depth presentation of the one currently implemented in the networking of Linux Kernel.
I have defined and implemented generic serialization methods, which are capable to store one or many different types of arbitrarily chosen FIB data structures in a compressed form. Searching in these compressed representations is viable using their lookup functions. Using this approach I also evaluated and compared these data structures to each other and to the original one in the Linux implementation. The measurements were performed using a powerful software router running Linux Operating System and I examined their storage and processing efficiency by measuring memory usage, CPU instruction cycles and CPU cache-miss rate.
Based on the performance evaluation, I would recommend these algorithms and serialization formats, especially the Multibit Trie Folding for further use in Linux. The performance gain is already prominent and it could be developed to perform even better.