[R] When is *interactive* data visualization useful to use?

Rainer M Krug r.m.krug at gmail.com
Mon Feb 14 11:24:35 CET 2011


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 02/14/2011 11:21 AM, Gesmann, Markus wrote:
> Hi Rainer,
> 
> You may want to look into the package googleVis, which provides an
> interface between the Google Visualisation API and R, see
> http://code.google.com/p/google-motion-charts-with-r/

True, I had forgotten about that one. It actually looks nice, especially
for time series.

Rainer

> 
> Regards,
> 
> Markus 
> 
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Rainer M Krug
> Sent: 14 February 2011 09:43
> To: Claudia Beleites
> Cc: R Help
> Subject: Re: [R] When is *interactive* data visualization useful to use?
> 
> On 02/11/2011 08:21 PM, Claudia Beleites wrote:
>> Dear Tal, dear list,
> 
>> I think the importance of interactive graphics has a lot to do with how
>> visually your scientific discipline works. I'm a spectroscopist, and I
>> think we are very visually oriented: if I think of a spectrum I
>> mentally see a graph.
> 
>> So for that kind of work, I need a lot of interaction (type: plot,
>> change a bit, plot again).
>> One example is the removal of spikes from Raman spectra (caused e.g.
>> by cosmic rays hitting the detector). It is fairly easy to compute a
>> list of suspicious signals. It is already much more complicated to
>> find the actual beginning and end of the spike. And it is really
>> difficult for an automatic procedure to avoid false positives,
>> because the spectra can look very different for different samples. It
>> would just take me far longer to find a computational description of
>> what a spike is than to interactively accept/reject the
>> automatically marked suspicions.
>> Even though it feels like slave work ;-)
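The "list of suspicious signals" step could be sketched as follows. This is
only an illustration in Python, not the procedure used in the message above:
the function name flag_spike_candidates, the neighbour-based deviation, the
MAD scaling and the threshold of 6 are all assumptions. It only marks
candidates; accepting or rejecting them would still be done by eye.

```python
from statistics import median


def flag_spike_candidates(spectrum, threshold=6.0):
    """Return indices of points that stick out sharply above their neighbours.

    Hypothetical helper: each interior point is compared with the mean of
    its two neighbours, and the deviation is scaled by the median absolute
    deviation (MAD) of all such deviations.
    """
    if len(spectrum) < 3:
        return []
    # Deviation of each interior point from the mean of its two neighbours.
    dev = [spectrum[i] - (spectrum[i - 1] + spectrum[i + 1]) / 2
           for i in range(1, len(spectrum) - 1)]
    center = median(dev)
    mad = median([abs(d - center) for d in dev])
    if mad == 0:
        # Perfectly regular signal: nothing can be called suspicious.
        return []
    # Only upward deviations are flagged, since spikes add intensity.
    return [i + 1 for i, d in enumerate(dev) if (d - center) / mad > threshold]


# Made-up intensities with one obvious spike at index 6.
example = [10, 11, 10, 12, 11, 10, 90, 11, 10, 12, 11, 10]
print(flag_spike_candidates(example))  # -> [6]
```

A lower threshold marks more candidates and leaves more to reject by hand,
which is the trade-off described above.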
> 
>> Roughly the same applies to the choice of pre-processing such as
>> baseline correction. A number of different physical causes can produce
>> different kinds of baselines, and usually you don't know which process
>> contributes to what extent. In practice, experience suggests a method,
>> I apply it and check whether the result looks as expected. I'm not
>> aware of any performance measure that would indicate success here.
> 
>> The next point where interaction is needed pops up because my data has
>> e.g. spatial and spectral dimensions. So do the models, usually: e.g. in
>> a PCA, the loadings would usually capture the spectroscopic direction,
>> whereas the scores belong to the spatial domain. So I have "connected"
>> graphs: the spatial distribution (intensity map, score map, etc.), and
>> the spectra (or loadings).
>> As soon as I have such connections I wish for interactive visualization:
>> I go back and forth between the plots: what is the spectrum that
>> belongs to this region of the map? Where on the sample are high
>> intensities of this band? What is the substance behind that: if it is
>> x, the intensities at that other spectral band should correlate. And
>> then I want to compare this to the scatterplot (pairs plot of the PCA
>> scores) or to a dendrogram of HCA...
> 
>> Also, exploration is not just a prerequisite for models; frequently
>> it is already the scientific work proper (particularly in
>> basic science). The more so if you include exploring the models: Now,
>> which of the bands are actually used by my predictive models? Which
>> samples get their predictions because of which spectral feature?
>> And the "statistical outliers" may very well be just the interesting
>> part of the sample. And the outlier statistics cannot interpret the
>> data in terms of interesting vs. crap.
> 
>> For presentation* of results, I personally think that most of the time
>> a careful selection of static graphs is much better than live
>> interaction.
>> *The thing where you talk to an audience far away from your work
>> computer. As opposed to sitting down with your client/colleague and
>> analysing the data together.
> 
>>> It could be argued that the interactive part is good for exploring
>>> (for example) a different behavior of different groups/clusters in the
>>> data. But when (in practice) I approached such a situation, what I
>>> tended to do was to run the relevant statistical procedures (and
>>> post-hoc tests)
>> As long as the relevant measure exists, sure.
>> Yet as a non-statistician, my work is focused on the physical/chemical
>> interpretation. Summary statistics are one set of tools for me, and
>> interactive visualisation is another set of tools (overlapping, though).
> 
>> I may want to subtract the influence of the overall unchanging sample
>> matrix (that would be the minimal intensity for each wavelength). But
>> the minimum spectrum is too noisy. So I use a quantile. Which one?
>> Depends on the data. I'll have a look at a series (say, the 2nd to
>> 10th percentile) and decide, trading off noise against whether any new
>> signals appear. I honestly think there's nothing gained if I sit down
>> and try to write a function scoring the similarity to the minimum
>> spectrum and the noise level: the more so as it just shifts the need
>> for a decision (How much noise outweighs what intensity of real signal
>> being subtracted?).
>> It is a decision I need to take. By numbers or by eye. And after
>> all, my professional training was meant to enable me to take this
>> decision, and I'm paid (also) for being able to take this decision
>> efficiently (i.e. making a reasonably good choice without taking too
>> long).
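The quantile series mentioned above might be produced along these lines. This
is a rough sketch in Python under stated assumptions, not code from the
thread: the names quantile_spectrum and subtract_baseline are hypothetical,
the quantile uses plain linear interpolation between order statistics, and
the spectra are made-up numbers. Choosing among the candidates is left to
visual inspection, as described.

```python
def quantile_spectrum(spectra, q):
    """Per-wavelength q-quantile (0 <= q <= 1) over equally long spectra."""
    n = len(spectra)
    result = []
    # zip(*spectra) yields, for each wavelength, the intensities of all spectra.
    for channel in zip(*spectra):
        values = sorted(channel)
        # Linear interpolation between the two nearest order statistics.
        pos = q * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        result.append(values[lo] + frac * (values[hi] - values[lo]))
    return result


def subtract_baseline(spectra, q):
    """Subtract the q-quantile spectrum from every spectrum."""
    base = quantile_spectrum(spectra, q)
    return [[y - b for y, b in zip(s, base)] for s in spectra]


# Made-up data: five spectra with two wavelengths each.
spectra = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]

# One candidate baseline per percentile from the 2nd to the 10th,
# to be plotted and compared by eye.
candidates = {p: quantile_spectrum(spectra, p / 100) for p in range(2, 11)}
```

With q = 0 the function returns the minimum spectrum, so the noisy baseline
the message starts from is simply the first member of the same family.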
> 
>> After all, it may also have to do with a complaint a colleague from a
>> computational data analysis group once had. He said the bad thing with
>> us spectroscopists is that our problems are either so easy that
>> there's no fun in solving them, or they are too hard to solve.
> 
>>> - and what I
>>> found to be significant I would then plot with colors clearly
>>> dividing the data into the relevant groups. From what I've seen, this
>>> is a safer approach than "wandering around" the data (which could
>>> easily lead to data dredging, where the scope of the multiple
>>> comparison needed for correction is not even clear).
>> Sure, yet:
>> - Isn't that what validation was invented for (I mean with a proper, 
>> new, [double] blind test set after you decided your parameters)?
>> - Summarizing a whole data set into a few numbers, without having
>> looked at the data itself, may not be safe, either:
>> - The few comparisons shouldn't come at the cost of risking a bad
>> modelling strategy and fitting parameters because the data
>> was not properly examined.
> 
>> My 2 ct,
> 
>> Claudia (who in practice warns far more frequently of multiple
>> comparisons and validation sets being compromised (not independent)
>> than of too little data exploration ;-) )
> 
> These are very interesting and valid points. But which tools are
> recommended / useful for interactive graphs for data evaluation? I
> somehow have difficulties getting my head around ggobi, and haven't yet
> tried out Mondrian (but I will). Are there any other ones (as we are on
> the R list: ones which integrate with R) which can be recommended?
> 
> Rainer
> 
> 
> 
> 




- -- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Natural Sciences Building
Office Suite 2039
Stellenbosch University
Main Campus, Merriman Avenue
Stellenbosch
South Africa

Tel:        +33 - (0)9 53 10 27 44
Cell:       +27 - (0)8 39 47 90 42
Fax (SA):   +27 - (0)8 65 16 27 82
Fax (D) :   +49 - (0)3 21 21 25 22 44
Fax (FR):   +33 - (0)9 58 10 27 44
email:      Rainer at krugs.de

Skype:      RMkrug
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk1ZAuMACgkQoYgNqgF2egrKCwCfY7kKZ9KcyJn5POn1K09HNkQ8
i0wAn1v/0709FspoF8HmUjWLJv9pdMrm
=dtJD
-----END PGP SIGNATURE-----


