[R] Do you use R for data manipulation?
viktoras at ekoinf.net
Wed May 6 09:22:15 CEST 2009
Well, I am less proficient in R than in other tools/languages.
Therefore my biased opinion is: it is possible in R, but it may be
easier if you use other tools, especially if you have to build a
The most accessible (although limited to MS Windows only) method would
be building a GUI as an HTA (HTML Application) with JavaScript, which is
nearly the same as creating a web page, and calling R from there when
necessary.
Less limited, but with a steeper learning curve: Python, Perl, Tcl/Tk - all
open source tools that can communicate with R, and all have decent GUI
building tools. Then there are the proprietary Adobe Flex, Flash, AIR (the
latter somewhat resembles HTA) and Runtime Revolution (RR), which all make
it easy to build cross-platform eye candy. These are not free, although not
too expensive either if you can allocate some resources for your project. I
usually hide all the command line utilities behind GUIs built with RR.
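As a rough illustration of the "call R from another tool" idea, here is a
minimal Python sketch that assembles an Rscript command line and passes the
data file path as an argument, so that nobody has to edit the R script when
it runs on a different machine. The script name reshape.R, the CSV path and
both function names are hypothetical; the only thing assumed to exist is the
Rscript executable that ships with R.

```python
import shutil
import subprocess


def build_rscript_command(script, *args, rscript="Rscript"):
    """Return the argument list for running an R script non-interactively."""
    # --vanilla skips saved workspaces and user profiles, so every machine
    # starts the script from the same clean state.
    return [rscript, "--vanilla", str(script), *[str(a) for a in args]]


def run_r_script(script, *args):
    """Run the script if Rscript is on PATH; return its stdout, else None."""
    if shutil.which("Rscript") is None:
        return None  # R is not installed or not on PATH
    cmd = build_rscript_command(script, *args)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical file names, for illustration only.
cmd = build_rscript_command("reshape.R", "data/genotypes.csv")
print(" ".join(cmd))
```

On the R side, the script would read the path with
commandArgs(trailingOnly = TRUE) instead of a hard-coded file name. A GUI
front end would then only need to collect the path and call something like
run_r_script().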
All the tools listed above can easily do any kind of data manipulation
and reshaping, but each has its strengths: Python - tidy object-oriented
syntax and tons of 3rd party modules; Perl - powerful regular
expressions and tons of modules; RR - database connectivity, chunk
expressions (item, char, word, line, etc.) and a syntax that makes data
manipulation much easier.
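To make "manipulation and reshaping" a bit more concrete, here is a small
sketch in Python using only the standard library. It reshapes long
genotype rows into one record per patient and merges them with clinical
data. The sample data, column names and function names are all made up for
illustration; in R itself the same job is typically done with merge() and
reshape().

```python
import csv
import io
from collections import defaultdict

# Made-up sample data; the column names are hypothetical.
CLINICAL_CSV = """patient_id,age,status
P1,54,case
P2,61,control
"""

GENOTYPE_CSV = """patient_id,snp,genotype
P1,rs1,AA
P1,rs2,AG
P2,rs1,AG
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def genotypes_wide(rows):
    """Reshape long (patient, snp, genotype) rows into one dict per patient."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row["patient_id"]][row["snp"]] = row["genotype"]
    return wide


def merge(clinical, wide):
    """Left-join clinical rows with the wide genotypes on patient_id."""
    merged = []
    for row in clinical:
        # Patients without genotype data keep only their clinical columns.
        merged.append({**row, **wide.get(row["patient_id"], {})})
    return merged


merged = merge(read_rows(CLINICAL_CSV), genotypes_wide(read_rows(GENOTYPE_CSV)))
for record in merged:
    print(record)
```

All values stay as strings here; a real script would also convert types and
decide how to handle missing genotypes.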
But I may be wrong, so please let me ask the group another related question
here (new thread?): what do you use to build graphical user
interfaces for end-users of your tools in R?
All the best
Farrel Buchinsky wrote:
> Is R an appropriate tool for data manipulation and data reshaping and data
> organizing? I think so but someone who recently joined our group thinks not.
> The new recruit believes that python or another language is a far better
> tool for developing data manipulation scripts that can be then used by
> several members of our research group. Her assessment is that R is useful
> only when it comes to data analysis and working with statistical models.
> So what do you think:
> 1) R is a phenomenally powerful and flexible tool and since you are going to
> do analyses in R you might as well use it to read data in and merge it and
> reshape it to whatever you need.
> 2) Are you crazy? Nobody in their right mind uses R to pipe the data around
> their lab and assemble it for analysis.
> Your insights would be appreciated.
> Details if you are interested:
> Our setup: Hundreds of patients recorded as cases with about 60 variables.
> Inputted and stored in a Sybase relational database. High throughput SNP
> genotyping platforms saved data output to csv or excel tables. Previously,
> not knowing any SQL I had used Microsoft Access to write queries to get the
> data that I needed and to merge the genotyping with the clinical database.
> It was horrible. I could not even use it on anything other than my desktop
> machine at work. When I realized that I was going to need to learn R to
> handle the genetic analyses I decided to keep Sybase as the data repository
> for the clinical information and then do all the data manipulation, merging
> and piping with R using RODBC. I was and am a very amateur coder.
> Nevertheless, many many hours later I have scripts that did what I needed
> them to do and I understand R code and can tinker with it as needed. My
> scripts work for me but they are not exactly user-friendly for others in the
> laboratory to just run. For instance, depending on what machine the script
> is being run from, one may need to change the file name or file path and
> tinker under the hood to accomplish that. My bias is to fulfill all our data
> manipulation and reshaping with R. Since I am the principal investigator it
> is me who stays constant and coders or analysts who may come and go.
> I am even more enamored with R for data manipulation since reading a book
> about it.