Although file sharing is still the most popular P2P application among users, P2P TV streaming applications are expected to spread widely in the near future, because they can serve millions of concurrent viewers at low cost. On the other hand, the traffic generated by P2P applications causes serious problems for Internet Service Providers (ISPs), as it can overload the resources and capacity of backbone and access networks. Efficient network management therefore requires reliable traffic identification methods.
My thesis presents a system based on flow-level methods that identifies the traffic generated by P2P TV streaming applications. After studying the relevant literature, I implemented and modified two methods: the heuristics of Magyar Telekom for filtering P2P traffic, and the Abacus algorithm, which was designed to identify P2P TV applications. The Abacus algorithm, which in my case works on flow-level traces (NetFlow), identifies P2P TV traffic by simply counting the received packets and bytes. For each host the algorithm creates signatures (vectors): it sorts the host's neighbours into bins according to the number of packets and bytes received from them during a time interval. The signatures represent the behavior of the applications. The unknown traffic is classified by a Support Vector Machine (SVM) in two phases. The first is the training phase, during which the SVM learns the behavior of each application and a model is created. In the second phase this model is applied to classify the unknown traffic. The most significant innovation compared to the original work is that the traces collected for training differ in space and time from the traces collected for classification. In addition, to the best of my knowledge, this is the first study to examine TCP traffic as well as UDP traffic when searching for P2P TV applications.
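The signature construction described above can be illustrated with a minimal Python sketch. It is not the implementation used in the thesis: the record layout (source, destination, packet count), the function names, the number of bins, and the exponential bin edges (1, 2, 3-4, 5-8, ...) are all assumptions made for illustration. Only the packet-count signature is shown; a byte-count signature would be built the same way.

```python
from collections import defaultdict

NUM_BINS = 8  # assumed number of bins; not specified in the text above


def bin_index(count, num_bins=NUM_BINS):
    """Map a positive count to an exponential bin (assumed edges):
    1 -> bin 0, 2 -> bin 1, 3-4 -> bin 2, 5-8 -> bin 3, and so on.
    Counts beyond the last edge fall into the last bin."""
    # (count - 1).bit_length() equals ceil(log2(count)) for count >= 1
    return min((count - 1).bit_length(), num_bins - 1)


def signature(flows, host, num_bins=NUM_BINS):
    """Build a packet-count signature for `host` from the flow records
    of one time interval.

    `flows` is an iterable of hypothetical (src, dst, packets) tuples.
    The result is the fraction of neighbours falling into each bin.
    """
    # Total packets the host received from each neighbour
    received = defaultdict(int)
    for src, dst, packets in flows:
        if dst == host:
            received[src] += packets

    # Count how many neighbours land in each bin
    histogram = [0] * num_bins
    for total in received.values():
        if total > 0:
            histogram[bin_index(total, num_bins)] += 1

    # Normalise so the signature does not depend on the number of neighbours
    neighbours = sum(histogram)
    if neighbours == 0:
        return [0.0] * num_bins
    return [count / neighbours for count in histogram]
```

Vectors of this kind, computed per host and per time interval, would be the input of the SVM in both the training and the classification phase.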
I tested the performance of the implemented system on real network traffic of Magyar Telekom. I ran four P2P TV applications (SopCast, TVAnts, TVUPlayer, and PPLive) in a controlled way on two PCs connected to the investigated network, and then analyzed the aggregated traffic. According to the results, I managed to identify 97-99 percent of the downloaded traffic of three applications (SopCast, TVAnts, TVUPlayer) in the case of UDP. In the case of TCP, I identified more than 70 percent of the downloaded traffic of TVAnts. However, the false positive rate remained high, so reducing it is one of the tasks for future work.