Copying text without citing the source is wrong, yet verifying whether copying has occurred is complicated and resource-intensive. The cited bibliography has to be explored and the references checked. Furthermore, sources that are not cited also have to be explored. This task is difficult because of the large volume of written work available. An automated solution that can determine the similarity of documents makes this verification significantly easier.
In this document I present an application that can compare a large number of documents to each other and to external sources, and determine their similarity. The examined dataset consists of the available theses from the Faculty of Electrical Engineering and Informatics at the Budapest University of Technology and Economics, and the articles of the Hungarian Wikipedia. The application can identify the parts of two documents that are identical, making further examination possible.
The application forms a framework, which makes it possible to extend the set of existing examinations with new ones.
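To illustrate what determining similarity and identifying identical parts can mean in practice, the following is a minimal sketch, not the method used by the application itself. It assumes a simple technique, word n-gram overlap with the Jaccard index, and all function names (`tokenize`, `shingles`, `jaccard_similarity`, `shared_passages`) are hypothetical and chosen only for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def shingles(tokens, n):
    """Return the set of all n-word sequences in the token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard_similarity(text_a, text_b, n=3):
    """Similarity in [0, 1]: shared n-word sequences divided by all distinct ones."""
    a = shingles(tokenize(text_a), n)
    b = shingles(tokenize(text_b), n)
    if not a or not b:
        # A text shorter than n words has no n-word sequences to compare.
        return 0.0
    return len(a & b) / len(a | b)


def shared_passages(text_a, text_b, n=3):
    """List the n-word sequences found in both texts, ordered as in text_a."""
    tokens_a = tokenize(text_a)
    common = shingles(tokens_a, n) & shingles(tokenize(text_b), n)
    seen = set()
    result = []
    for i in range(len(tokens_a) - n + 1):
        candidate = tuple(tokens_a[i:i + n])
        if candidate in common and candidate not in seen:
            seen.add(candidate)
            result.append(" ".join(candidate))
    return result


if __name__ == "__main__":
    first = "The quick brown fox jumps over the lazy dog"
    second = "A quick brown fox sleeps near the lazy dog"
    print(jaccard_similarity(first, second))
    print(shared_passages(first, second))
```

In this sketch the similarity score gives a quick ranking of document pairs, while the list of shared passages points to the places a human reviewer would inspect further. The sequence length `n` is an assumed parameter: larger values reduce accidental matches on common phrases, at the cost of missing short copied fragments.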