Learning based semantic annotation system for small corpuses

Supervisor:
Dr. Lengyel László
Department of Automation and Applied Informatics

Today, many IT systems and research studies rely on techniques and approaches that emerged from big data analysis.

One goal of my thesis is to investigate several machine learning algorithms (decision trees, Bayes classifiers, neural networks, support vector machines, kNN and rule-based classifiers) by planning, executing and evaluating tests on small English corpora. I also approach the field of personalized knowledge representation through machine learning and natural language processing, involving the user as an active participant in creating semantic labels, annotations and frames.
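To illustrate what evaluating one of these classifiers on a small corpus can look like, the sketch below implements kNN text classification in plain Java. It is not the thesis implementation (which, per the abstract, relies on Weka); the class name `KnnSketch`, the bag-of-words tokenizer, cosine similarity, the six-sentence corpus and its labels are all hypothetical choices. Leave-one-out evaluation is an assumption made here because it wastes no training data, which matters when the corpus is tiny.

```java
import java.util.*;

// Hypothetical sketch: kNN text classification with leave-one-out evaluation.
class KnnSketch {

    // Bag-of-words term counts: lowercase, split on runs of non-letters.
    static Map<String, Integer> vectorize(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^a-z]+")) {
            if (!token.isEmpty()) counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Cosine similarity between two sparse term-count vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            na += e.getValue() * e.getValue();
        }
        for (int v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    // Majority vote among the k most similar documents, skipping index `exclude`
    // (the held-out document in leave-one-out; pass -1 to use every document).
    static String predict(List<Map<String, Integer>> docs, List<String> labels,
                          Map<String, Integer> query, int k, int exclude) {
        double[] sim = new double[docs.size()];
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            sim[i] = cosine(query, docs.get(i));
            if (i != exclude) idx.add(i);
        }
        idx.sort((a, b) -> Double.compare(sim[b], sim[a])); // most similar first
        Map<String, Integer> votes = new HashMap<>();
        for (int n = 0; n < Math.min(k, idx.size()); n++) {
            votes.merge(labels.get(idx.get(n)), 1, Integer::sum);
        }
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    // Each document is classified by all the others; returns the fraction correct.
    static double leaveOneOutAccuracy(List<String> texts, List<String> labels, int k) {
        List<Map<String, Integer>> docs = new ArrayList<>();
        for (String t : texts) docs.add(vectorize(t));
        int correct = 0;
        for (int i = 0; i < docs.size(); i++) {
            if (predict(docs, labels, docs.get(i), k, i).equals(labels.get(i))) correct++;
        }
        return (double) correct / docs.size();
    }

    public static void main(String[] args) {
        List<String> texts = List.of(
            "the striker scored a goal in the match",
            "the team won the football match",
            "the goalkeeper saved a penalty in the game",
            "the parliament passed a new law",
            "the minister proposed a tax law",
            "voters elected a new parliament");
        List<String> labels = List.of("sport", "sport", "sport",
                                      "politics", "politics", "politics");
        System.out.println("LOO accuracy (k=1): " + leaveOneOutAccuracy(texts, labels, 1));
        System.out.println("LOO accuracy (k=3): " + leaveOneOutAccuracy(texts, labels, 3));
    }
}
```

On this toy corpus, k=1 classifies every held-out sentence correctly, while k=3 drops to 5/6: the "minister ... tax law" sentence shares only "the" and "a" with two sport sentences, and those function words outvote its single politics neighbour. This hints at why stop-word handling and the choice of k are worth testing explicitly on small corpora.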

In connection with this, I present the interface between the system and the user: a client application built on web technologies (PHP, jQuery, HTML5) that supports semantic annotation workflows. This UI provides a clear, responsive way to manage textual documents, their structure, and annotation-related elements.

Finally, I introduce my Spring-based (Java EE) framework, which produces results using Weka machine learning and personal knowledge representation and communicates them to the client through a WebSocket-based dialogue engine.

Throughout the thesis, I also discuss implementation details, problems that emerged, engineering challenges, and demonstrations of the individual components.
