Keyword suggestion of scientific studies with machine learning

Dr. Tapolcai János
Department of Telecommunications and Media Informatics

An essential step of research is discovering studies related to the research
problem. In a popular research area, it can be difficult to find the most
relevant ones.

Researchers assign keywords to their studies to give readers a quick insight into
the content and to make the studies easier to search for. In some cases, however,
the keywords are not relevant, and in others the set of assigned keywords is
incomplete. Deep learning has become one of the main technologies of natural
language processing, so it offers promising opportunities to automate keyword
assignment.

The main topic of this thesis is keyword extraction based on the abstracts of
scientific studies. First, I introduce the technical and theoretical background of
the problem: the main concepts, some related techniques, definitions and metrics,
and the following three applications of natural language processing:

• Keyword extraction

• Summary generation

• Title suggestion

I describe how machine learning techniques, and deep learning in particular, can
handle natural language text, and which types of deep neural networks are most
suitable for this problem and why. Evaluating and measuring results is an
important part of deep learning, as of any other approach to classification tasks.
I cover evaluation in two parts: first I introduce evaluation measures for
classification in general, and then the background and some of the evaluation
measures of multi-label classification, since keyword extraction is treated in
this document as a multi-label classification task.
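
To illustrate what evaluating keyword extraction as a multi-label task can look
like, the following minimal Python sketch computes micro-averaged precision,
recall and F1, together with the Hamming loss, over sets of true and predicted
keywords. The choice of these particular measures, the function names and the
example keywords are assumptions made for illustration; this summary does not
state which measures the thesis itself uses.

```python
from typing import Dict, List, Set


def micro_prf(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> Dict[str, float]:
    """Micro-averaged precision, recall and F1 over all documents.

    Counts are pooled across documents before the ratios are taken.
    """
    if len(true_sets) != len(pred_sets):
        raise ValueError("true_sets and pred_sets must have the same length")
    tp = fp = fn = 0
    for true, pred in zip(true_sets, pred_sets):
        tp += len(true & pred)   # keywords predicted and actually assigned
        fp += len(pred - true)   # keywords predicted but not assigned
        fn += len(true - pred)   # keywords assigned but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def hamming_loss(true_sets: List[Set[str]], pred_sets: List[Set[str]],
                 vocabulary: Set[str]) -> float:
    """Fraction of (document, keyword) decisions that are wrong.

    Only keywords contained in `vocabulary` are counted.
    """
    if len(true_sets) != len(pred_sets):
        raise ValueError("true_sets and pred_sets must have the same length")
    if not true_sets or not vocabulary:
        return 0.0
    errors = sum(len((t ^ p) & vocabulary) for t, p in zip(true_sets, pred_sets))
    return errors / (len(true_sets) * len(vocabulary))


# Hypothetical example: two abstracts with assigned and predicted keywords.
true_keywords = [{"deep learning", "nlp"}, {"keyword extraction"}]
pred_keywords = [{"deep learning", "lstm"}, {"keyword extraction", "nlp"}]
vocabulary = {"deep learning", "nlp", "lstm", "keyword extraction"}

scores = micro_prf(true_keywords, pred_keywords)
loss = hamming_loss(true_keywords, pred_keywords, vocabulary)
```

Micro-averaging pools the counts over all documents, so frequent keywords
dominate the score; a macro-averaged variant would instead average per-keyword
scores and give rare keywords equal weight.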

Because the dataset available to me was an important part of my work, I describe
it in its own chapter.

I describe in detail the work carried out over the previous two semesters, what
further features could be added to the implemented solutions, how the results
could be improved, and what other approaches could be used.

Finally, I summarize the topics described in this document and try to draw the
final conclusions.


