[R] Do you use R for data manipulation?
liuwensui at gmail.com
Wed May 6 06:30:12 CEST 2009
Take a look at the sqldf package (http://code.google.com/p/sqldf/); you
will be amazed.
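As a rough illustration of the kind of merge described in the quoted message below, here is a minimal sketch. The data frames, column names, and values are all made up for the example (they are not from the original setup), and it assumes the sqldf package is installed. sqldf lets you run SQL directly against data frames in your R session; the base R merge() call is shown alongside for comparison.

```r
# Hypothetical toy data: names and values are invented for illustration only.
library(sqldf)

clinical <- data.frame(
  patient_id = c(1, 2, 3),
  status     = c("affected", "unaffected", "affected"),
  stringsAsFactors = FALSE
)

genotypes <- data.frame(
  patient_id = c(1, 2, 4),
  snp1       = c("AA", "AG", "GG"),
  stringsAsFactors = FALSE
)

# Inner join on patient_id: only patients present in both tables are kept.
merged <- sqldf("
  SELECT c.patient_id, c.status, g.snp1
  FROM clinical AS c
  JOIN genotypes AS g ON c.patient_id = g.patient_id
  ORDER BY c.patient_id
")

# The same inner join in base R, for comparison.
merged_base <- merge(clinical, genotypes, by = "patient_id")

print(merged)
```

If the clinical table lives in a database and is pulled in through RODBC, the same query style should apply once the result is a data frame in R.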
On Wed, May 6, 2009 at 12:22 AM, Farrel Buchinsky <fjbuch at gmail.com> wrote:
> Is R an appropriate tool for data manipulation and data reshaping and data
> organizing? I think so but someone who recently joined our group thinks not.
> The new recruit believes that Python or another language is a far better
> tool for developing data manipulation scripts that can be then used by
> several members of our research group. Her assessment is that R is useful
> only when it comes to data analysis and working with statistical models.
> So what do you think:
> 1) R is a phenomenally powerful and flexible tool and since you are going to
> do analyses in R you might as well use it to read data in and merge it and
> reshape it to whatever you need.
> 2) Are you crazy? Nobody in their right mind uses R to pipe the data around
> their lab and assemble it for analysis.
> Your insights would be appreciated.
> Details if you are interested:
> Our setup: Hundreds of patients recorded as cases with about 60 variables.
> Entered and stored in a Sybase relational database. High-throughput SNP
> genotyping platforms saved data output to CSV or Excel tables. Previously,
> not knowing any SQL I had used Microsoft Access to write queries to get the
> data that I needed and to merge the genotyping with the clinical database.
> It was horrible. I could not even use it on anything other than my desktop
> machine at work. When I realized that I was going to need to learn R to
> handle the genetic analyses I decided to keep Sybase as the data repository
> for the clinical information and then do all the data manipulation, merging
> and piping with R using RODBC. I was and am a very amateur coder.
> Nevertheless, many many hours later I have scripts that did what I needed
> them to do and I understand R code and can tinker with it as needed. My
> scripts work for me but they are not exactly user-friendly for others in the
> laboratory to just run. For instance, depending on what machine the script
> is being run from, one may need to change the file name or file path and
> tinker under the hood to accomplish that. My bias is to fulfill all our data
> manipulation and reshaping with R. Since I am the principal investigator, I
> am the one who stays constant, while coders and analysts may come and go.
> I am even more enamored with R for data manipulation since reading a book
> about it.
Acquisition Risk, Chase
Blog : statcompute.spaces.live.com
Tough Times Never Last. But Tough People Do. - Robert Schuller