Estimation of Parameters of Webpage Download by Machine Learning

Dr. Molnár Sándor
Department of Telecommunications and Media Informatics

Because today's networks are often overloaded, it is increasingly important for internet service providers to analyse the quality of user experience. Since measuring subjective user experience is too time-consuming, they rely on quantitative estimates derived from traffic patterns.

This approach is straightforward with traditional HTTP, where each object is downloaded in a separate flow without encryption. Following recent trends in browsers, however, new protocols such as SPDY and QUIC have appeared. Their common feature is that web objects are downloaded in a single multiplexed flow over an encrypted channel, which makes it difficult to estimate the page load time from traffic patterns.

Moreover, conventional analysis methods cannot keep up with ever-changing web traffic, so researchers have turned to Machine Learning (ML) methods, which have proven successful in many areas. The advantage of ML methods is that they make it possible to process volumes of data that would be almost impossible to handle with traditional methods. Recent literature has shown that ML algorithms are suitable for measuring the Quality of Experience (QoE) of online video watching and for network traffic identification. Based on these results, I believe that ML algorithms can also be used successfully to predict page load times.

I developed a framework that measures the download parameters of websites by browsing automatically with Google Chrome while the network traffic is captured. I built my own database from these download parameters and the associated traffic patterns, and then analysed it with ML algorithms. The trained algorithm can estimate, from the first few packets of the traffic stream, whether the user is satisfied with the loading speed of the website. Another important advantage of the framework is that the estimate is available before the website has finished loading, because not all packets of a given web flow need to be analysed.
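
The idea of classifying a page load from only the first few packets can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the packet record layout, the three features, the 3-second satisfaction threshold, and the nearest-centroid classifier, which merely stands in for whichever ML algorithm is actually used.

```python
from statistics import mean

# Hypothetical packet record: (timestamp_seconds, size_bytes, is_downstream).

def first_packets_features(packets, n=10):
    """Summarise only the first n packets of a flow (n is an assumed cut-off)."""
    head = sorted(packets, key=lambda p: p[0])[:n]
    if len(head) < 2:
        raise ValueError("need at least two packets to compute inter-arrival gaps")
    times = [p[0] for p in head]
    sizes = [p[1] for p in head]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    downstream_bytes = sum(p[1] for p in head if p[2])
    # Assumed feature set: mean packet size, mean inter-arrival gap,
    # and the share of bytes travelling towards the client.
    return [mean(sizes), mean(gaps), downstream_bytes / sum(sizes)]

def label_satisfied(page_load_time_s, threshold_s=3.0):
    """Turn a measured page load time into a label; the threshold is an assumption."""
    return page_load_time_s <= threshold_s

def train_centroids(samples):
    """samples: list of (features, label). Returns one mean feature vector per label."""
    groups = {}
    for features, label in samples:
        groups.setdefault(label, []).append(features)
    return {label: [mean(column) for column in zip(*rows)]
            for label, rows in groups.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance).

    Features are not normalised here; a real pipeline would scale them first.
    """
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))
```

In such a setup, each training sample would pair the features of a captured flow with the label derived from the page load time measured in the browser, and `predict` could be called as soon as the first `n` packets of a new flow have been seen.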

Using the framework, service providers can obtain information about network parameters without user interaction, so they learn about potential problems earlier and can use this information to optimise their network.

