Machine learning from small set of data

Supervisor:
Hadházi Dániel
Department of Measurement and Information Systems

Due to recent breakthroughs, current learning systems are able to outperform humans in many fields, such as image processing, natural language processing, and speech processing. However, this performance requires a large amount of training data and huge computing resources. The ultimate goal of my work is to create a learning system that uses training data efficiently, where efficiency is measured by the number of training samples needed. The problem can be approached in a variety of ways; this thesis presents two of them: one-shot learning and active learning.

In the first part of this work, I examine how to build an architecture that learns from very few data points while losing as little classification accuracy as possible. I study the one-shot learning task and optimize the learning system for it. In this task, the model can use exactly one sample per class to learn. Under this constraint, the problem shifts towards extracting the useful features of the data points. These features are learned from a different dataset with similar characteristics.
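
One common way to use features learned on a related dataset in the one-shot setting is to classify a query by its nearest single support example in feature space. The sketch below assumes this nearest-neighbour scheme with cosine similarity; it is not the architecture developed in the thesis. `OneShotClassifier` and the `embed` callable are hypothetical names, and the identity embedding in the usage lines only stands in for a feature extractor trained on the related dataset.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]

def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

class OneShotClassifier:
    """Nearest-neighbour classifier over one embedded support sample per class.

    `embed` is a hypothetical placeholder for a feature extractor assumed to be
    pre-trained on a dataset with similar characteristics.
    """

    def __init__(self, embed: Callable[[Vector], Vector]) -> None:
        self.embed = embed
        self.support: Dict[str, Vector] = {}

    def add_class(self, label: str, sample: Vector) -> None:
        # Exactly one labelled example is stored per class.
        self.support[label] = self.embed(sample)

    def predict(self, query: Vector) -> str:
        # Return the label whose single support embedding is most similar to the query.
        q = self.embed(query)
        return max(self.support, key=lambda label: cosine_similarity(q, self.support[label]))

# Usage: identity embedding, for illustration only.
clf = OneShotClassifier(embed=lambda x: x)
clf.add_class("cat", [1.0, 0.1])
clf.add_class("dog", [0.1, 1.0])
print(clf.predict([0.9, 0.2]))  # cat
```

With this design, adding a new class costs one stored embedding and no retraining; all of the learning effort sits in the feature extractor.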

The second half of this work addresses data collection: the goal is to control the collection process so that it yields more valuable data. A data point is considered more valuable than another if it has a greater influence on the model's decision. This problem belongs to the field of active learning. In the thesis, I study a setting in which the data generator is available and the properties of the requested data point can be adjusted. This setting is illustrated by a speaker recognition example, in which the data generators are the persons to be recognized. Each recorded sample can be controlled by choosing both the spoken word and the speaker's identity.
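
A widely used way to score how much a new sample would influence the decision is uncertainty sampling: request the input on which the current model is least certain. The sketch below assumes this criterion (maximum entropy of the predicted speaker distribution), which may differ from the one studied in the thesis. `select_query`, `predict_proba`, and the probability table are hypothetical; `predict_proba(word, speaker)` stands in for the model's estimated speaker distribution for a recording of that word by that speaker.

```python
import math
from itertools import product
from typing import Callable, Iterable, List, Tuple

Query = Tuple[str, str]  # (word, speaker) that the data generator is asked to produce

def entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_query(words: Iterable[str], speakers: Iterable[str],
                 predict_proba: Callable[[str, str], List[float]]) -> Query:
    # Uncertainty sampling: among all controllable (word, speaker) requests,
    # pick the one whose predicted speaker distribution has the highest entropy.
    candidates = list(product(words, speakers))
    return max(candidates, key=lambda q: entropy(predict_proba(*q)))

# Usage: hypothetical estimates of the predicted distribution over ["alice", "bob"].
estimates = {
    ("yes", "alice"): [0.95, 0.05],
    ("yes", "bob"): [0.10, 0.90],
    ("no", "alice"): [0.55, 0.45],
    ("no", "bob"): [0.20, 0.80],
}
best = select_query(["yes", "no"], ["alice", "bob"], lambda w, s: estimates[(w, s)])
print(best)  # ('no', 'alice')
```

Here the model is nearly undecided when Alice says "no", so that recording is requested next; a confidently recognized pair such as ("yes", "alice") adds little new information.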
