Data mining of a telecommunication provider's customers in a SAS environment

Gáspár Csaba
Department of Telecommunications and Media Informatics

Nowadays almost every home has some kind of telecommunication or content service, each tied to a subscriber and a provider. Using these services generates traffic on the network, and the provider records the parameters of that traffic. The large volume of detailed data produced this way is used partly for invoicing and partly for further analysis. Such analyses may yield simple statistics, but they can also reveal deeper relationships, for example through customer churn (drop-out) analysis and customer value calculation.

A provider has accurate information only about the client endpoint, since the network ends there; it has no information about how many people use the service at that endpoint. Although the provider holds data about the subscriber, the number of users cannot be derived from that data. A simple example is a rented flat where university students live, while the services are registered to the landlord, who does not even use them. In my thesis I present an algorithm that estimates the number of users at a given endpoint. I give its formal description and, through an example, show how it works and what results it produces. For this purpose I use the call data of a telecom provider; then, using the algorithm together with a data-mining model, I try to estimate whether a given subscription is used by a single person or by several.

I test the final model on an independent dataset and evaluate it. Finally, I summarize the lessons learned and propose possible modifications, improvements, and uses.


