BitTorrent file-sharing applications are highly influential today. Millions of people use them every day, and BitTorrent is responsible for 35% of all Internet traffic. My goal is to design a new system that collects data from BitTorrent applications and stores it in a structure that allows efficient use in the future. I focus on the life cycle of torrents and the behaviour of users, which is why the numbers of seeders and leechers are relevant to me. Data collected from the Internet grows quickly (Big Data), so I use a technology designed for this purpose: the Hadoop framework, an open-source software framework for the storage and large-scale processing of data sets. It has many satisfied users, such as Facebook, Yahoo!, Twitter, Microsoft, Apple, and eBay.
My thesis has two main parts. In the first part I introduce the technologies, focusing on the components I use. In the second part I present my own application step by step: its structure, its operation, and how it uses the technologies introduced in the first part. The two main functions of my system are data collection and data storage. They work together, and the application needs no human interaction while running. The application runs on a host machine, which collects the data, while a guest virtual machine running on the same computer stores it. My data source is one of the biggest torrent websites in Hungary, and my application is designed to process this website.
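To make the kind of data concrete, one collected observation of a torrent could be represented roughly as follows. This is only a minimal sketch and not the structure used in the thesis: the class name `TorrentSnapshot`, its fields, and the tab-separated line layout are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TorrentSnapshot:
    """One observation of a torrent at a point in time (hypothetical layout)."""
    torrent_id: str        # assumed identifier taken from the website
    observed_at: datetime  # when the counts were read
    seeders: int
    leechers: int

    def to_line(self) -> str:
        # One record per line, fields separated by tabs: a plain-text layout
        # that line-oriented batch jobs can split without extra parsing.
        return "\t".join([
            self.torrent_id,
            self.observed_at.isoformat(),
            str(self.seeders),
            str(self.leechers),
        ])

    @staticmethod
    def from_line(line: str) -> "TorrentSnapshot":
        # Inverse of to_line(); tolerates a trailing newline.
        torrent_id, observed_at, seeders, leechers = line.rstrip("\n").split("\t")
        return TorrentSnapshot(
            torrent_id,
            datetime.fromisoformat(observed_at),
            int(seeders),
            int(leechers),
        )


if __name__ == "__main__":
    snap = TorrentSnapshot(
        "t1", datetime(2014, 1, 1, 12, 0, tzinfo=timezone.utc), 10, 3
    )
    print(snap.to_line())
```

Storing repeated snapshots of the same torrent over time is what would allow its life cycle to be reconstructed later from the seeder and leecher counts.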