Exploratory Data Analysis (EDA) is an approach to data analysis in which the analyst explores the data to discover its underlying characteristics by identifying trends and outlying data points.
Rather than applying traditional statistical models from the outset, analysts using the EDA approach prefer to withhold assumptions about the underlying statistical model until they have explored the data. By making few initial assumptions about the structure of the data, this mostly visual approach can often yield a better understanding and quicker discovery of interesting phenomena. This inherently ad hoc ‘visual discovery’, which relies predominantly on plots of the data and on interactions with (and among) the plots, is in stark contrast to ‘Confirmatory Data Analysis’ (CDA). CDA mainly deals with evaluating statistical measures of the data and building statistical models on it. In the late 1970s, led by the work of John W. Tukey, EDA and CDA became distinct (although necessarily complementary) approaches.
While there are notable and widely cited historical examples, practical and agile EDA only became feasible with the proliferation of (micro)computers. Consequently, the main tools of EDA are computer programs that allow analysts to perform visual analysis on data. (The most widely used of these applications are described and compared in Chapter 1.)
The goal of this thesis is to evaluate whether modern browsers are an appropriate platform for interactive, visual EDA, and to provide a proof-of-concept implementation of the ‘minimally necessary’ EDA capabilities in the form of a mini-framework.
The thesis is structured as follows. Chapter 1 describes and compares the most widely used visual, interactive EDA tools. The treatment is confined to tools that provide interactive capabilities and are therefore dedicated EDA tools; simple ‘plotting’ solutions, which would be too numerous even to list, are not considered. An essential subset of interactive, visual EDA features is defined in Chapter 3.
Chapter 4 evaluates the potential technologies for implementing these capabilities in a browser. Chapter 6 describes the implementation of the proof-of-concept, browser-based, interactive VISual exploratory data ANalysis (VISAN) solution.
Any EDA tool supports an analysis workflow (although in most cases it becomes apparent only post hoc that the activities performed can be seen as a workflow). It is therefore necessary to be able to save and load the state of a discovery process in order to suspend and resume analysis, share analysis state for collaboration, and make the analysis process repeatable, or at least replayable. Even best-of-breed desktop EDA tools support these goals poorly or not at all, and certainly not through open standards. The initial attempt at a portable markup language for EDA described in Chapter 4 is therefore hoped to further the state of the art in EDA tooling.