[R] Do you use R for data manipulation?

Greg Snow Greg.Snow at imail.org
Wed May 6 18:11:36 CEST 2009

In my opinion, no statistician's toolbox should contain only one tool (even if it is as amazing a tool as R). Learning the different tools helps you appreciate when each is the most appropriate to use, and teaches you different ways of looking at problems. For some tasks I find it quickest (it could easily differ for others) to do the data extraction in Perl and then load the results into R.

Having said the above, I do admit that the percentage of time I spend using tools other than R for working with data has gone down quite a bit over time. Three possible reasons:

1. My clients are getting better at giving me the data in appropriate forms.
2. My proficiency with R continues to grow, and I can better see how to do something using R.
3. R continues to grow, with more and more tools to help manage data.

And a possible fourth: I am getting too lazy in my old age to switch to other programs.

While I like to think that I am having success at educating my clients, reason 1 contributes only a little to the overall change. Reason 3 is definitely a big contributor, and hopefully reason 2 is part of the explanation as well.


Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Farrel Buchinsky
> Sent: Tuesday, May 05, 2009 10:23 PM
> To: R
> Cc: Ross; gregory_warnes at urmc.rochester.edu; greg at warnes.net
> Subject: [R] Do you use R for data manipulation?
>
> Is R an appropriate tool for data manipulation, data reshaping, and
> data organizing? I think so, but someone who recently joined our group
> thinks not. The new recruit believes that Python or another language is
> a far better tool for developing data manipulation scripts that can
> then be used by several members of our research group. Her assessment
> is that R is useful only when it comes to data analysis and working
> with statistical models.
>
> So what do you think:
>
> 1) R is a phenomenally powerful and flexible tool, and since you are
> going to do analyses in R you might as well use it to read data in,
> merge it, and reshape it to whatever you need.
>
> OR
>
> 2) Are you crazy? Nobody in their right mind uses R to pipe the data
> around their lab and assemble it for analysis.
>
> Your insights would be appreciated.
>
> Details, if you are interested:
>
> Our setup: hundreds of patients recorded as cases, with about 60
> variables, entered and stored in a Sybase relational database.
> High-throughput genotyping platforms saved data output to CSV or Excel
> tables. Previously, not knowing any SQL, I had used Microsoft Access to
> write queries to get the data that I needed and to merge the genotyping
> with the clinical database. It was horrible. I could not even use it on
> anything other than my desktop machine at work.
>
> When I realized that I was going to need to learn R to handle the
> genetic analyses, I decided to keep Sybase as the data repository for
> the clinical information and then do all the data manipulation,
> merging, and piping with R using RODBC. I was and am a very amateur
> coder. Nevertheless, many, many hours later I have scripts that did
> what I needed them to do, and I understand R code and can tinker with
> it as needed.
>
> My scripts work for me, but they are not exactly user-friendly for
> others in the laboratory to just run. For instance, depending on which
> machine the script is being run from, one may need to change the file
> name or file path and tinker under the hood to accomplish that.
>
> My bias is to fulfill all our data manipulation and reshaping with R.
> Since I am the principal investigator, it is me who stays constant, and
> coders or analysts who may come and go. I am even more enamored with R
> for data manipulation since reading a book about it.
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
