Extending Digital Libraries with Controlled Language Semantic Abstracts

Dr. Mészáros Tamás Csaba
Department of Measurement and Information Systems

The amount of information available on the internet has been growing rapidly, including the large volume of research data published in various formats. For example, the MEDLINE database, which collects medical publications, had over 2000 new articles uploaded every day last year, and the various biology-oriented databases have millions of entries. Despite the exponential growth of research accessible on the web, scientific knowledge is not spreading at the same rate, because scientists lack the means to efficiently find all the information relevant to their field.

To enable more intelligent and useful ways of accessing information, we need to improve current information storage methods. Various semantic web technologies let us describe the content of documents in a formal, machine-understandable way, but using them requires both field-specific knowledge and familiarity with semantic technologies. To encourage the creation of semantic representations for more documents, we need solutions that can also be used by those unfamiliar with semantic technologies. Controlled natural language can bridge the gap between formal logic representations and the everyday, natural language summary of knowledge.

In my thesis I present a methodology that enables digital libraries to build semantic representations of their publications in a manner that is accessible to those without experience in semantic technologies. The basis of the solution is the idea of controlled language abstracts (CLAs), which enable users to summarize the content of articles in machine-readable natural language, much like writing a regular abstract. CLA sentences have to comply with a set of controlled grammar rules, but editing is made easy by intelligent text editors that give users feedback about what they can enter. Because the grammar is controlled, CLAs can be transformed into formal representations, which can be used to build a complex knowledge base.
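The step from controlled sentences to a formal representation can be illustrated with a minimal sketch. It is not the grammar or the tooling used in the thesis: the three verb phrases, the predicate names (`rdf:type`, `ex:inhibits`, `ex:activates`), and the helper functions `parse_sentence` and `parse_abstract` are all assumptions made for illustration. Each accepted sentence becomes a subject-predicate-object triple, and sentences that break the grammar are reported back, which is the kind of feedback an editor could show to the user.

```python
import re

# Hypothetical controlled vocabulary (an assumption, not the thesis grammar):
# each allowed verb phrase maps to a predicate name.
PREDICATES = {
    "is a": "rdf:type",
    "inhibits": "ex:inhibits",
    "activates": "ex:activates",
}

# A valid sentence is: capitalized subject, one allowed verb phrase, object, period.
_VERBS = "|".join(re.escape(v) for v in sorted(PREDICATES, key=len, reverse=True))
SENTENCE = re.compile(
    rf"^(?P<subj>[A-Z][\w-]*) (?P<verb>{_VERBS}) (?P<obj>[\w-]+)\.$"
)


def parse_sentence(sentence):
    """Return a (subject, predicate, object) triple, or None if the grammar is violated."""
    match = SENTENCE.match(sentence.strip())
    if match is None:
        return None
    return (match["subj"], PREDICATES[match["verb"]], match["obj"])


def parse_abstract(text):
    """Split an abstract into sentences; collect triples and rejected sentences."""
    triples, rejected = [], []
    for raw in re.findall(r"[^.]+\.", text):
        triple = parse_sentence(raw)
        if triple is None:
            rejected.append(raw.strip())
        else:
            triples.append(triple)
    return triples, rejected
```

For instance, `parse_abstract("Aspirin is a drug. Aspirin might help.")` would accept the first sentence as a triple and return the second in the rejected list, since "might help" is not part of the assumed vocabulary. A real controlled language would need a far richer grammar; the sketch only shows why restricting the grammar makes the mapping to triples mechanical.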

I also detail the steps required to adapt the methodology to a digital library. The library needs to be extended to support semantic technologies and the creation of CLAs. Based on the knowledge base built from CLAs, many new intelligent features can be introduced, such as content-specific semantic search. I demonstrate the new features on a prototype implementation, but the ideas are presented in a way that makes them easy to adapt to other digital libraries as well.

