[R] Tools For Preparing Data For Analysis

(Ted Harding) ted.harding at nessie.mcc.ac.uk
Sun Jun 10 10:28:30 CEST 2007


On 10-Jun-07 02:16:46, Gabor Grothendieck wrote:
> That can be elegantly handled in R through R's object
> oriented programming by defining a class for the fancy input.
> See this post:
>   https://stat.ethz.ch/pipermail/r-help/2007-April/130912.html
> for a simple example of that style.
> 
> On 6/9/07, Robert Wilkins <irishhacker at gmail.com> wrote:
>> Here are some examples of the type of data crunching you might
>> have to do.
>>
>> In response to the requests by Christophe Pallier and Martin Stevens.
>>
>> Before I started developing Vilno, some six years ago, I had
>> been working in the pharmaceuticals for eight years ( it's not
>> easy to show you actual data though, because it's all confidential
>> of course).

I hadn't heard of Vilno before (except as a variant of "Vilnius").
And it seems remarkably hard to find info about it from a Google
search. The best I've come up with, searching on

  vilno  data

is at
  http://www.xanga.com/datahelper

This is a blog site, apparently with postings by Robert Wilkins.

At the end of the Sunday, September 17, 2006 posting "Tedious
coding at the Pharmas" is a link:

  "I have created a new data crunching programming language."
   http://www.my.opera.com/datahelper

which appears to be totally empty. In another blog article:

  "go to the www.my.opera.com/datahelper site, go to the August 31
   blog article, and there you will find a tarball-file to download,
   called vilnoAUG2006package.tgz"

so again inaccessible; and a google on "vilnoAUG2006package.tgz"
gives a single hit which is simply the same aricle.

In the Xanga blog there are a few examples of tasks which are
no big deal in any programming language (and, relative to their
simplicity, appear a bit cumbersome in "Vilno"). 

I've not seen in the blog any instance of data transformation
which could not be quite easily done in any straigthforward
language (even awk).

>> Lab data can be especially messy, especially if one clinical
>> trial allows the physicians to use different labs. So let's
>> consider lab data.
>> [...]

That's a fairly daunting description, though indeed not at all
extreme for the sort of data that can arise in practice (and
not just in pharmaceutical investigations). But the complexity
is in the situation, and, whatever language you use, the writing
of the program will involve the writer getting to grips with
the complexity, and the complexity will be present in the code
simply because of the need to accomodate all the special cases,
exceptions and faults that have to be anticipated in "feral" data.

Once these have been anticipated and incorporated in the code,
the actual transformations are again no big deal.

Frankly, I haven't yet seen anything "Vilno" that couldn't be
accomodated in an 'awk' program. Not that I'm advocating awk for
universal use (I'm not that monolithic about it). But I'm using
it as my favourite example of a flexible, capable, transparent
and efficient data filtering language, as far as it goes.


SO: where can one find out more about Vilno, to see what it may
really be capable of that can not be done so easily in other ways?


(As is implicit in many comments in Robert's blog, and indeed also
from many postings to this list over time and undoubtedly well
known to many of us in practice, a lot of the problems with data
files arise at the data gathering and entry stages, where people
can behave as if stuffing unpaired socks and unattributed underwear
randomly into a drawer, and then banging it shut).

Best wishes to all,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 10-Jun-07                                       Time: 09:28:10
------------------------------ XFMail ------------------------------



More information about the R-help mailing list