Web-browser-based support for visual exploratory data analysis

Kocsis Imre
Department of Measurement and Information Systems

Exploratory Data Analysis (EDA) is an approach to data analysis in which the analyst explores the data to discover its underlying characteristics, for example by identifying trends and outlying data points.

Rather than applying traditional statistical models from the outset, analysts following the EDA approach prefer to withhold assumptions about the underlying statistical model until they have explored the data. Because it makes few initial assumptions about the structure of the data, this mostly visual approach often yields a better understanding and a quicker discovery of interesting phenomena. This inherently ad hoc ‘visual discovery’ of data, relying predominantly on plots and on interactions with (and among) the plots, stands in stark contrast to Confirmatory Data Analysis (CDA), which mainly deals with evaluating statistical measures of, and building statistical models on, the data. Led by the work of John W. Tukey in the late 1970s, EDA and CDA became distinct (although necessarily complementary) approaches.

While there are notable and widely cited historical examples, practical and agile EDA only became feasible with the proliferation of (micro)computers. Consequently, the main tools of EDA are computer programs that allow analysts to perform visual analysis on data. (The most widely used of these applications are described and compared in Chapter 1.)

However, there is no purely web-based tool dedicated to supporting EDA. At the time of writing, some noteworthy JavaScript libraries support various plot types, but none of them covers all the plot types customarily used in EDA. Moreover, modern (desktop) EDA tools allow the user to interact with the plots and to link them together, which makes it possible, for example, to see the projection of a ‘selection’ on all other plots as well. Existing browser-based solutions do not cover these features consistently.
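The linked-selection behaviour described above (often called ‘brushing and linking’) can be sketched as an observer pattern around a shared selection model. The following is a minimal TypeScript sketch, assuming in-memory rows of numeric columns; the names `SelectionModel`, `brushRect` and `HistogramView` are hypothetical and are not taken from VISAN or from any existing library. Rendering is omitted: the histogram view only recomputes the per-bin counts it would highlight.

```typescript
// Hypothetical names throughout: SelectionModel, brushRect and HistogramView are
// illustrative only, not part of VISAN or of any existing plotting library.
type Row = Record<string, number>;
type Listener = (selected: ReadonlySet<number>) => void;

// Shared selection state: a set of row indices that every linked view observes.
class SelectionModel {
  private selected: Set<number> = new Set<number>();
  private listeners: Listener[] = [];

  subscribe(listener: Listener): void {
    this.listeners.push(listener);
  }

  // Replace the current selection and notify every subscribed view.
  select(rowIds: number[]): void {
    this.selected = new Set<number>(rowIds);
    this.listeners.forEach((listener) => listener(this.selected));
  }

  get current(): ReadonlySet<number> {
    return this.selected;
  }
}

// Indices of rows whose (x, y) values fall inside a rectangular brush on a scatter plot.
function brushRect(data: Row[], xCol: string, yCol: string,
                   x0: number, x1: number, y0: number, y1: number): number[] {
  const ids: number[] = [];
  data.forEach((row, i) => {
    const x = row[xCol];
    const y = row[yCol];
    if (x >= x0 && x <= x1 && y >= y0 && y <= y1) ids.push(i);
  });
  return ids;
}

// A histogram view that, on every selection change, recomputes how many
// selected rows fall into each of its equal-width bins (rendering omitted).
class HistogramView {
  highlightedCounts: number[] = [];

  constructor(private data: Row[], private col: string,
              private min: number, private max: number, private bins: number,
              model: SelectionModel) {
    model.subscribe((selected) => this.onSelection(selected));
  }

  private onSelection(selected: ReadonlySet<number>): void {
    const counts: number[] = new Array<number>(this.bins).fill(0);
    const width = (this.max - this.min) / this.bins;
    selected.forEach((i) => {
      const v = this.data[i][this.col];
      // Clamp so values on (or outside) the range edges land in the first/last bin.
      const b = Math.min(this.bins - 1, Math.max(0, Math.floor((v - this.min) / width)));
      counts[b] += 1;
    });
    this.highlightedCounts = counts;
  }
}

// Usage: brushing on an x/y scatter plot updates a linked histogram of z.
const data: Row[] = [
  { x: 1, y: 1, z: 0.5 },
  { x: 2, y: 5, z: 1.5 },
  { x: 8, y: 9, z: 3.5 },
  { x: 9, y: 2, z: 2.5 },
];
const model = new SelectionModel();
const hist = new HistogramView(data, "z", 0, 4, 4, model);
model.select(brushRect(data, "x", "y", 0, 5, 0, 6)); // brush covers rows 0 and 1
console.log(hist.highlightedCounts); // [ 1, 1, 0, 0 ]
```

Keeping the selection as a set of row indices, rather than as geometry belonging to one plot, is what lets any other view project it: each view only needs to map row indices to its own marks.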

Thus, the goal of this thesis is to evaluate whether modern browsers are an appropriate platform for interactive, visual EDA, and to provide a proof-of-concept implementation of the ‘minimally necessary’ EDA capabilities in the form of a mini-framework.

The thesis is structured as follows. Chapter 1 describes and compares the most widely used visual, interactive EDA tools. The survey is confined to tools that provide interactive capabilities (and are therefore dedicated EDA tools); simple ‘plotting’ solutions, which would be too numerous even to list, are not considered. Chapter 3 defines an essential subset of interactive, visual EDA features.

Chapter 4 evaluates the potential technologies for implementing these capabilities in a browser. Chapter 6 describes the implementation of the proof-of-concept, browser-based, interactive VISual exploratory data ANalysis (VISAN) solution.

Any EDA tool supports an analysis workflow (although in most cases it becomes apparent only post hoc that the activities performed form a workflow). It is therefore necessary to be able to save and load the state of a discovery process, so that analysis can be suspended and resumed, analysis state can be shared for collaboration, and the analysis process can be made repeatable, or at least replayable. In fact, even best-of-breed desktop EDA tools support these goals poorly or not at all, and certainly not through open standards. The initial attempt at a portable markup language for EDA described in Chapter 4 is therefore intended to advance the state of the art of EDA tooling in general.
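One way to make an analysis replayable is to record it as an ordered log of actions and to rebuild the state by replaying them. The sketch below assumes a JSON encoding and a hypothetical, minimal schema (`AnalysisLog`, `Action`, `ViewSpec`, `replay`); it is not the markup language proposed in the thesis, only an illustration of the save, load and replay idea under those assumptions.

```typescript
// Hypothetical schema for illustration only; not the markup language defined in the thesis.
interface ViewSpec {
  id: string;
  kind: "scatter" | "histogram" | "boxplot";
  columns: string[];
}

type Action =
  | { type: "addView"; view: ViewSpec }
  | { type: "removeView"; viewId: string }
  | { type: "select"; rowIds: number[] };

interface AnalysisState {
  version: 1;
  dataset: string; // assumed: a reference (e.g. a URL) to the analysed data
  views: ViewSpec[];
  selection: number[];
}

interface AnalysisLog {
  version: 1;
  dataset: string;
  actions: Action[];
}

// Apply one action and return a new state; being pure keeps replay deterministic.
function apply(state: AnalysisState, action: Action): AnalysisState {
  switch (action.type) {
    case "addView":
      return { ...state, views: [...state.views, action.view] };
    case "removeView":
      return { ...state, views: state.views.filter((v) => v.id !== action.viewId) };
    case "select":
      return { ...state, selection: [...action.rowIds] };
  }
}

// Rebuild the analysis state from scratch by replaying the recorded actions in order.
function replay(log: AnalysisLog): AnalysisState {
  const initial: AnalysisState = { version: 1, dataset: log.dataset, views: [], selection: [] };
  return log.actions.reduce(apply, initial);
}

function save(log: AnalysisLog): string {
  return JSON.stringify(log);
}

// Parse a saved log, rejecting unknown format versions.
function load(text: string): AnalysisLog {
  const parsed = JSON.parse(text);
  if (parsed.version !== 1 || !Array.isArray(parsed.actions)) {
    throw new Error("unsupported analysis log");
  }
  return parsed as AnalysisLog;
}

// Usage: save a session, load it elsewhere and replay it ("example.csv" is a placeholder).
const log: AnalysisLog = {
  version: 1,
  dataset: "example.csv",
  actions: [
    { type: "addView", view: { id: "v1", kind: "scatter", columns: ["x", "y"] } },
    { type: "addView", view: { id: "v2", kind: "histogram", columns: ["z"] } },
    { type: "select", rowIds: [0, 1] },
    { type: "removeView", viewId: "v2" },
  ],
};
const restored = replay(load(save(log)));
console.log(restored.views.map((v) => v.id), restored.selection); // only view v1 remains; rows 0 and 1 selected
```

Storing the actions rather than only the final state is what makes the session replayable as well as resumable; the explicit `version` field leaves room for such a format to evolve.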
