Large-scale graph based visualization of biomedical data

Dr. Hullám Gábor István
Department of Measurement and Information Systems

The subject of my thesis is a real-life project that aims to determine how large-scale graph-based visualization tools can be applied to biomedical data. To this end, the APIs of two tools were investigated during the project. One of these tools is Collaboration Spotting (CS); the other is a custom-built front-end application with similar functionality.

CS is a framework developed by the European Organization for Nuclear Research (CERN), headquartered in Geneva, Switzerland. It was used for visualization within the joint research cooperation between the Budapest University of Technology and Economics and CERN. The other front-end application is a custom solution built around the Sigma.js library, which serves as the rendering engine that generates the graphical output.

In the early phases of the work, I concentrated on a literature review of bioinformatics, followed by further research on graphs in general and on existing graph-based applications and software solutions. In parallel, I examined public biomedical knowledge bases and identified the relevant entities together with their identifiers and additional attributes. This helped me understand the relationships and the hierarchy in our input data, and eventually enabled the construction of a complex graph-based data model.

After familiarizing myself with the various data sources, I also created a simplified model, based on which data for seven biological entities were extracted. I then carried out the steps needed to transform the input data into the desired form. Each tool also required some further ETL (extract, transform, load) steps for the data attributes.

The second stage of the project focused on CS. I first examined the data structure of the possible inputs. I then studied CS and its API in more depth, and on that basis implemented the application-specific schema for CS. This involved creating the graph element descriptors, for which I took the high number of possible entities into account. Although I investigated CS as a framework, the software itself could not be put to use because of external circumstances.

The main task of the third stage was to build my own application in order to see how the data can be visualized with an alternative solution that works in a similar way. I chose the underlying library after carefully weighing the advantages and disadvantages of several candidates. Sigma.js proved to be the best option, so I investigated it thoroughly and then set up an environment for building an easy-to-publish application. Development started with visualizing at least one record. I then worked on the filtering options, implemented auto-layout, and finally improved the platform from the user's perspective.

In the end, I performed the visualizations with the Sigma.js-based application. The tool enables the user to visualize ontological data and to see how certain entities relate to each other. Because only a limited amount of data can be handled at the same time, the tool is designed to encourage users to make filtered selections. Furthermore, the tool can take the measure of association between entities into account.
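The filtered selection and the use of the measure of association described above could be sketched as follows. This is only a minimal illustration in Python, not the code of the thesis: the `Association` record, the `filter_graph` function, the entity type names, and the assumption that the measure of association is a number between 0 and 1 are all hypothetical. The output is a plain nodes/edges dictionary of the kind a graph renderer typically consumes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """One relation between two entities (hypothetical record layout)."""
    source: str    # identifier of the first entity
    target: str    # identifier of the second entity
    weight: float  # measure of association, assumed to lie in [0, 1]


def filter_graph(associations, entity_types, allowed_types, min_weight):
    """Keep only associations that are strong enough and whose two
    endpoints both belong to one of the selected entity types."""
    edges = [
        a for a in associations
        if a.weight >= min_weight
        and entity_types[a.source] in allowed_types
        and entity_types[a.target] in allowed_types
    ]
    # Only entities that still take part in a relation are drawn.
    node_ids = sorted({a.source for a in edges} | {a.target for a in edges})
    return {
        "nodes": [
            {"id": n, "label": n, "type": entity_types[n]} for n in node_ids
        ],
        "edges": [
            {"id": f"e{i}", "source": a.source, "target": a.target,
             "size": a.weight}
            for i, a in enumerate(edges)
        ],
    }


# Hypothetical sample data: three entities and three associations.
types = {"g1": "gene", "d1": "disease", "p1": "pathway"}
assocs = [
    Association("g1", "d1", 0.9),
    Association("g1", "p1", 0.2),
    Association("d1", "p1", 0.8),
]
graph = filter_graph(assocs, types, {"gene", "disease"}, min_weight=0.5)
```

In this sketch the user's selection (genes and diseases, association of at least 0.5) reduces the graph to the two entities and the single relation that satisfy both conditions, which keeps the amount of data handed to the renderer small.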

Ultimately, one of my goals is to enable users to gain insights and create value in biomedical fields with the help of my framework.

