[R] any book and tutorial about how to manipulate data with R/S+

Michael Grant mwgrant2001 at yahoo.com
Sun Mar 13 03:53:02 CET 2005


Wensui,

Here is an answer from a different perspective.
Reading between the lines, you may be involved in
'remedial' data preparation at times. Depending on
exactly what kind of tasks you are talking about you
MAY be well advised to work in a database--that is why
they exist. It just depends on what you have to do. 

I work with environmental data. And I work with some
really junky data at times. I have to spend lots of
effort grooming and combining data originally
collected for reasons other than the one at hand, from
disparate idiosyncratic sources, having information in
both similar and very dissimilar formats, data of
varying completeness, etc. I have to process data
qualifiers, strip numbers out of strings, put them
in--on and on. And of course it is different from
record to record. This is just the nature of the
beast. 
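
To make the "strip numbers out of strings" chore concrete,
here is a minimal sketch, written in Python purely for
illustration. The qualifier conventions (a leading "<" or
">", a trailing letter code such as "J") and the name
split_result are assumptions made for this example, not a
description of any particular dataset.

```python
import re

# Hypothetical convention: optional leading "<" or ">", a number,
# then an optional trailing letter-code qualifier (e.g. "J").
# Real data will have its own, usually messier, conventions.
_PATTERN = re.compile(r"^\s*([<>]?)\s*(-?\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*$")


def split_result(raw):
    """Return (value, qualifier) recovered from a raw result string.

    value is a float, or None when no number can be recovered;
    qualifier is whatever flag text surrounded the number.
    """
    m = _PATTERN.match(raw)
    if m is None:
        # Nothing numeric to recover: keep the raw text as the
        # qualifier so the record can be reviewed by hand.
        return None, raw.strip()
    prefix, number, suffix = m.groups()
    return float(number), prefix + suffix


for raw in ["12.3", "<0.5", "7.1 J", "ND"]:
    print(repr(raw), "->", split_result(raw))
```

Entries that do not fit the pattern come back with no value
and the original text kept as the qualifier, so nothing is
silently dropped.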

Another element is doing these same tasks over. One
sometimes does not get the data in one shot. I
remediate data, construct datasets, and process it.
Then I get additional and/or corrected data and have
to do it again. This kind of thing is probably easier to
track in DBs or spreadsheets. 

I would never try doing these tasks in R (or SPSS/SAS
for that matter). Excel works up to a point, but I also
go into MS Access, exploiting its visual query building
and VB capabilities. As much as I dislike MS (I've been
bitten too many times), I have to admit that the ability
to easily construct (visual) queries, browse the
results, etc., has been very useful. This kind of
remedial preparation is sometimes easy and sometimes
brutal.

A point here is that as the complexity of your data
preparation increases it may be more efficient to do
it in applications more appropriate to the task. Where
the breakpoint lies is a function of your own
capabilities/inclinations in R (SPSS, SAS), Excel,
Access or whatever. The one thing I know is that the
problems of data prep., in my world at least, have
always been there and will likely remain. I accept it
and move on.

The approach(es) you develop should be influenced by
the frequency of such efforts and the size of the
datasets typically involved. BTW, one truism is that
project managers do not seem capable of understanding
that just because something is in a computer does not
mean it is ready to go and give them what they want
:O(. Gee this stuff takes work...as you seem well
aware. 

I steel myself for the task by reminding myself that
writing and running the R programs is an enjoyable
reward for my toil. R is fun. SPSS never was. I have
not worked much with SAS because--and this is a
consideration--I can't afford a seat at home.

BTW, if some DB appears appropriate, then learn some
SQL--even if you use Access. There is always
RODBC out there and it may be useful down the road.
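
For a feel of what "some SQL" buys you, here is a small
sketch using an in-memory SQLite database from Python. The
table and column names (samples, results, site, analyte,
and so on) and the data values are invented for this
example. The SELECT statement is the part that carries
over to Access or any other database.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # throwaway in-memory database

# Hypothetical schema: one table of samples, one of lab results.
con.executescript("""
    CREATE TABLE samples (sample_id INTEGER PRIMARY KEY, site TEXT);
    CREATE TABLE results (sample_id INTEGER, analyte TEXT,
                          value REAL, qualifier TEXT);
    INSERT INTO samples VALUES (1, 'A'), (2, 'B');
    INSERT INTO results VALUES
        (1, 'lead', 12.3, ''),
        (1, 'zinc', 0.5, '<'),
        (2, 'lead', 7.1, 'J');
""")

# Combine the two tables and keep only one analyte.
query = """
    SELECT s.site, r.analyte, r.value, r.qualifier
    FROM results AS r
    JOIN samples AS s ON s.sample_id = r.sample_id
    WHERE r.analyte = ?
    ORDER BY s.site
"""
lead_rows = con.execute(query, ("lead",)).fetchall()
for row in lead_rows:
    print(row)

con.close()
```

When corrected data arrives, the tables are reloaded and
the same query is rerun, which is much of the appeal over
redoing the steps by hand.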

If you don't want to do all this then get an intern,
graduate student, postdoc, or new career ;O).

Best regards,
Michael Grant
Graduate School of Applied Brute Force in the Sciences



