Domain names are a foundation of the internet's infrastructure. Although network equipment can be reached by its Internet Protocol (IP) address, such addresses convey almost no information to a human reader and are hard to remember. Domain names map these IP addresses to a form that is easy to understand and remember.
When the Domain Name System (DNS) was introduced, nobody expected that the web would become the main form of internet usage, so the early system had almost no defense mechanisms against harmful business practices.
This allowed an army of domain speculators to emerge, and in the absence of legal regulation domain registration was open to exploitation by anyone. In 1999 the authorities introduced some legal countermeasures, but recent work in the field suggests that these countermeasures had no significant effect on the extent of the various abuses.
In this work I concentrate on one such abuse, typosquatting: the registration of domain names that are misspelled variants of existing, popular domain names. This type of abuse tries to capture part of the traffic intended for popular sites by relying on users who mistype an address. Because the cost of domain registration is relatively small, the profit margin is high, and the risk of getting caught is low, typosquatting has recently become popular.
My work presents a system capable of identifying typosquatting domains among newly registered ones. To accomplish this, I first choose a lexical algorithm for comparing domain names, and then I examine the proportion of typosquatting domains among the newly registered domains.
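To make the idea of a lexical comparison concrete, the following is a minimal sketch of one possible approach, not necessarily the algorithm chosen in this work. It assumes an edit distance that counts insertions, deletions, substitutions, and swaps of adjacent characters (the optimal string alignment variant of the Damerau-Levenshtein distance), and it flags a newly registered name when it lies within a small distance of a popular one. The function names, the threshold `max_distance`, and the sample domains are all hypothetical and chosen only for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b.

    Insertions, deletions, substitutions, and transpositions of two
    adjacent characters each cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i leading characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j leading characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def flag_candidates(new_domains, popular_domains, max_distance=1):
    """Return (new, popular) pairs where new is a near miss of popular.

    max_distance is an assumed threshold, not a value from this work.
    Exact matches are skipped because they are not typos.
    """
    flagged = []
    for new in new_domains:
        for popular in popular_domains:
            if new != popular and osa_distance(new, popular) <= max_distance:
                flagged.append((new, popular))
                break  # one matching popular domain is enough to flag
    return flagged


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    popular = ["example.com"]
    newly_registered = ["exmaple.com", "unrelated.org", "example.com"]
    hits = flag_candidates(newly_registered, popular)
    print(hits)
    print(len(hits) / len(newly_registered))  # share of flagged names
```

A purely lexical test like this only yields candidates: a name that is one edit away from a popular domain may still be a legitimate registration, so the flagged share should be read as an upper estimate under these assumptions.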