Analysis of non-structured data

OData support
Dr. Gajdos Sándor
Department of Telecommunications and Media Informatics

The majority of the information stored in today's information systems takes the form of text, image, audio and video files. For a long time, efficient machine processing of these sources was unimaginable, so considerable human resources had to be invested instead; Hewlett-Packard's Autonomy product line aims to break with this practice.

My thesis can be divided into three main parts. In the first part, I introduce the reader to the field of unstructured data processing. In the second, I design and implement a solution for analysing trends in mobile telecommunications using Autonomy's tools. In the third, I design and implement a user interface suited to presenting the results of the trend analysis.

Both development tasks follow the waterfall lifecycle model, which I found to be the most appropriate model for supporting my work. In accordance with the waterfall model, I first define the specification and requirements, and then present the overall architecture and the user interactions. The subsequent detailed design can be divided into three major parts. First, I define the static structure of the system: the components used to implement the overall architecture, the classes that make up those components, and the associations between them. Next, I specify the data structures (both those provided by Autonomy and those I designed myself) that form the inputs and outputs of the methods of the previously defined classes. Finally, I present the behavioural model, which consists of the states of the system and the sequences between them (supplemented with the required timings in the case of the trend analysis solution). The design part is followed by the validation of the implemented solution.
