Automated collection of scientometric data

Virosztek Tamás
Department of Measurement and Information Systems

Scientific performance is hard to quantify; nevertheless, some kind of metric is necessary for assessing tenders, granting academic degrees and judging applicants for research positions. The most commonly used methods for quantifying the productivity and impact of a researcher’s work are based on the number of their publications and on metrics derived from them. Several large-scale databases keep records of these metrics; the most significant ones are Web of Science, Scopus, ResearchGate, Google Scholar, IEEE Xplore and, in Hungary, the Hungarian Repository of Scientific Works (Magyar Tudományos Művek Tára, MTMT).

The problem is that the information provided by these databases is often noisy: its accuracy depends on the effort each researcher makes to keep their records up to date, and on the ability of the databases’ staff and algorithms to provide accurate data. Because of this, metrics have to be collected from every database to get a more accurate picture of a researcher’s work. Doing this by hand is slow and tedious.

One solution is a metadatabase that collects and stores all necessary data from the most significant publication databases and provides a clean, transparent overview of each researcher’s scientific metrics.

For this purpose, I developed a web application that collects scientometric data from multiple publication databases about researchers affiliated with two departments of the university. Each researcher’s scientometric identifiers are extracted from their department’s website and then used to retrieve their metrics from the publication databases. The data is extracted through an API wherever one is available, and by parsing web pages otherwise. The application is designed to be extensible, so support for additional affiliations and scientometric databases can be added easily. For efficient access, the data is stored in a local database and can be refreshed automatically to keep it up to date. The metrics of a researcher can also be refreshed manually at any time, in case the most recent data is needed. The application provides a web-based user interface with multiple views and search functions.
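
The collect, store and refresh cycle described above can be illustrated with a minimal sketch. It is not the application's actual code: the class `MetricsStore`, its methods, the `max_age` parameter and the source name `example_db` are all hypothetical, and the fetcher is a stand-in for a real API call or page parser. The sketch assumes that each publication database is represented by a function mapping a researcher's identifier to a dictionary of metrics, which is one way to make new databases easy to plug in.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Dict, Optional, Tuple

# Hypothetical type: a fetcher maps a researcher's identifier in one
# publication database to a dictionary of metrics.
Fetcher = Callable[[str], Dict[str, int]]


@dataclass
class MetricsStore:
    """Illustrative in-memory stand-in for the local database."""

    max_age: timedelta = timedelta(days=7)  # assumed refresh interval
    fetchers: Dict[str, Fetcher] = field(default_factory=dict)
    _cache: Dict[Tuple[str, str], Tuple[datetime, Dict[str, int]]] = field(
        default_factory=dict
    )

    def register(self, source: str, fetcher: Fetcher) -> None:
        # Supporting an additional database only requires a new fetcher.
        self.fetchers[source] = fetcher

    def get(
        self,
        source: str,
        researcher_id: str,
        now: Optional[datetime] = None,
        force: bool = False,
    ) -> Dict[str, int]:
        """Return stored metrics, refetching if stale or if force is set."""
        now = now or datetime.now()
        key = (source, researcher_id)
        cached = self._cache.get(key)
        if cached is not None and not force and now - cached[0] < self.max_age:
            return cached[1]
        metrics = self.fetchers[source](researcher_id)
        self._cache[key] = (now, metrics)
        return metrics


# Usage with a fake fetcher; a real one would call an API or parse a page.
calls = []


def fake_fetcher(researcher_id: str) -> Dict[str, int]:
    calls.append(researcher_id)
    return {"publications": 10 + len(calls), "citations": 100}


store = MetricsStore()
store.register("example_db", fake_fetcher)
start = datetime(2024, 1, 1)
first = store.get("example_db", "r1", now=start)
cached = store.get("example_db", "r1", now=start + timedelta(days=1))
stale = store.get("example_db", "r1", now=start + timedelta(days=8))
manual = store.get("example_db", "r1", now=start + timedelta(days=8), force=True)
```

Here the second lookup is served from the store, the third triggers an automatic refresh because the entry is older than `max_age`, and the fourth models a manual refresh. Keeping the fetchers behind a single function signature is a design choice of this sketch, not something the abstract specifies.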

