Web scraping in PowerShell

Dr. Goldschmidt Balázs
Department of Control Engineering and Information Technology

PowerShell is one of the most powerful system administration tools available, capable of solving hundreds of problems with ease. It can automate not only everyday processes but also the gathering of information from the World Wide Web. In this work I built a proof of concept to show that PowerShell can be used efficiently for tasks very different from its original purpose, thanks to its variety of features.

The proof of concept processes the data of the car pages of Használtautó.hu, a technique generally called web scraping. The processing focuses on a car comparison (ranking) functionality that is not present on the site. The idea comes from a similar feature of árukereső.hu: a table-based, side-by-side comparator of product details. A very similar tool is already available on the target site. I improved on this idea by ranking the cars based on their main parameters. Because linear methods are ineffective at comparing cars of varied ages and conditions by a handful of features, I developed my own simple, deterministic algorithm. It forms the basis of the car ranking, which can produce valuable information about cars of similar ages and prices. For the sake of completeness, I created a web page that makes the service available online. It is a user-friendly front end written in PHP and JavaScript that takes the place of the PowerShell command-line interface.

