[R] ref card for data manipulation?

hadley wickham h.wickham at gmail.com
Thu Dec 11 15:19:03 CET 2008


>> You (as many before you) have overlooked the ave() function, which can
>> replace the ordering as well as the do.call(c,tapply(....))
>>
>
> The majority of questions on this list concern data manipulation, and
> many are repetitive. "Overlooking" like that will always happen unless
> some comprehensive data manipulation documentation is written.
> I think many people would benefit if a specialized data manipulation
> reference card were created.

I like the idea, but is a reference card really enough?  To me, what
most people need to tackle data manipulation problems is a broad
strategy, not a list of useful functions.  plyr is a codification of
my most recent ideas on one such strategy: splitting a big data
structure into smaller pieces, applying a function to each piece and
then joining them back together.  Just recognising that your problem
can be solved with this strategy is a big step forward; the functions
in plyr just save you some typing and a bit of thought compared to
doing it in base R.
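As a rough illustration of the split-apply-combine idea in base R (a sketch, not plyr's implementation; the data frame `df` and its columns `g` and `x` are made up for the example), the per-group mean below is computed once with the three steps written out explicitly, and once with ave(), which does all three in one call and returns results aligned with the original rows, so no reordering is needed.

```r
# Hypothetical example data: a grouping column g and a value column x
df <- data.frame(g = c("a", "b", "a", "b", "c"),
                 x = c(1, 2, 3, 4, 5))

# Split: break x into one piece per level of g
pieces <- split(df$x, df$g)

# Apply: compute a summary (here the mean) for each piece
means <- lapply(pieces, mean)

# Combine: join the per-group results back into one named vector
group_means <- do.call(c, means)
print(group_means)

# ave() splits, applies and recombines in a single call, returning a
# vector the same length as x and in the original row order
df$mean_x <- ave(df$x, df$g, FUN = mean)
print(df)
```

The explicit version gives one value per group (a = 2, b = 3, c = 5), while ave() broadcasts each group's mean back onto its rows. In plyr, functions such as ddply() wrap this same pattern for data frames.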

Recognising this strategy has helped me in my own data manipulation
problems - many tasks with which I used to struggle are now easy to
solve, not just because of plyr, but because I have a framework in
which to think about the problem.  But this is just one strategy and
there must be many more common strategies waiting to be identified.  I
think working on this would be time better spent - describing a
strategy gives people the tools to help themselves.  (Of course, this
doesn't help the people who just want canned answers, but I'm less
interested in helping them.)

Hadley

-- 
http://had.co.nz/


