Text annotation using controlled natural languages

Dr. Mészáros Tamás Csaba
Department of Measurement and Information Systems

The subject of my bachelor's thesis was the design and implementation of a text annotation system. The system is capable of handling natural language notes pertaining to literary works, extracting knowledge from them and building a knowledge base of these works.

The result of my work is an application with which users can provide the system with texts and then annotate these texts with critical notes. These notes are written in a natural language suited for human interaction. The system then extracts machine-readable knowledge from these annotations and stores it in a knowledge base. The knowledge base also supports reasoning, so statements that can be logically deduced from the input statements are part of the available knowledge as well. Users can pose queries to the system, likewise written in a natural language. In this way they can access the knowledge regarding the texts of interest, which is the most striking benefit and novelty of the system.
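The reasoning step described above can be illustrated with a minimal sketch. It is not the reasoner used in the thesis: it assumes that statements are stored as (subject, predicate, object) triples, and it applies a single hypothetical transitivity rule until no new statement can be derived. The predicate name `part_of` and the entity names are invented for illustration only.

```python
def infer_closure(facts, transitive_predicates):
    """Return the input facts plus every fact derivable by transitivity.

    Naive forward chaining: keep applying the rule
    (a, p, b) and (b, p, c) => (a, p, c) until nothing new appears.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for (a, p, b) in list(known):
            if p not in transitive_predicates:
                continue
            for (c, q, d) in list(known):
                if q == p and c == b:
                    derived = (a, p, d)
                    if derived not in known:
                        known.add(derived)
                        changed = True
    return known


# Hypothetical input statements (not taken from the thesis).
facts = {
    ("letter_1", "part_of", "volume_1"),
    ("volume_1", "part_of", "collection"),
}
closure = infer_closure(facts, {"part_of"})
```

After the call, `closure` holds the two input statements and the deduced statement that `letter_1` is part of `collection`, so a query about the collection can also find the letter.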

To implement this, several tasks had to be solved. First, the system's architecture was designed, which established the place of each component and the dependencies between them. Second, the process of grammatical analysis was designed, together with its prerequisites: the grammatical correctness of sentences is governed by grammatical rulesets. These rulesets also had to conform to a parent ruleset so that the parser could handle them uniformly. The grammatical parser was then implemented based on a well-known algorithm and supplemented with custom modifications that make it capable of knowledge extraction. The knowledge is stored in two separate parts: the entities (texts, notes, rules) in a database, and the statements pertaining to these entities in the knowledge base. This functionality is embedded in a server application, which makes it available in a popular and common fashion.
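The idea of a ruleset that both restricts the accepted sentences and drives knowledge extraction can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the parsing algorithm, so regular expressions stand in for the grammar rules here, and the rule patterns and predicate names (`author_of`, `refers_to`) are hypothetical.

```python
import re

# Hypothetical ruleset: each rule pairs an accepted sentence pattern
# with the predicate of the statement it produces.
RULES = [
    (re.compile(r"^(?P<subj>[\w ]+?) is the author of (?P<obj>[\w ]+)\.$"), "author_of"),
    (re.compile(r"^(?P<subj>[\w ]+?) refers to (?P<obj>[\w ]+)\.$"), "refers_to"),
]


def extract_statement(sentence):
    """Return a (subject, predicate, object) triple, or None if no rule accepts the sentence."""
    for pattern, predicate in RULES:
        match = pattern.match(sentence.strip())
        if match:
            return (match.group("subj"), predicate, match.group("obj"))
    return None


statement = extract_statement("Kelemen Mikes is the author of Letters from Turkey.")
rejected = extract_statement("This sentence follows no rule")
```

A sentence that matches no rule yields `None`, which mirrors the role of the rulesets as a definition of grammatical correctness. A real parser would build a syntax tree rather than match flat patterns.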

Accordingly, the user interface took the form of a web application. This interface lets the user manage the entities, that is, create, view, modify and delete them. It also lets the user query the knowledge extracted from the texts and their notes.

The functioning of the implemented tasks is illustrated at the relevant points in the document. A real-life example was chosen for this purpose: Letters from Turkey by the Hungarian essayist and writer Kelemen Mikes, with the critical annotations of the Hungarian literary historian Lajos Hopp. Several grammatical rulesets were designed to support this example. Through them, the translation of a text's information content into a knowledge representation can be adequately demonstrated. Furthermore, the document touches upon the possibilities of upgrading the system.

