[R] Do you use R for data manipulation?
lists at calidasoft.co.uk
Wed May 6 12:44:29 CEST 2009
I also use the approach Philipp describes below. I use Python and shell
scripts for processing thousands of input files and getting all the data
into one tidy csv table. From that point onwards it's R all the way
(often with the reshape package).
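
As a rough illustration of that first step, here is a minimal Python sketch of collecting many raw files into one long-format CSV. It is not my actual script: the input format (one "name = value" pair per line, with "#" comments), the "*.txt" pattern, the column names, and the helper names parse_file and combine_files are all assumptions made up for the example.

```python
import csv
from pathlib import Path


def parse_file(path):
    """Yield (variable, value) pairs from one raw file.

    Assumed (hypothetical) format: one 'name = value' pair per line,
    with blank lines and '#' comments ignored.
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            # Drop any trailing comment, then surrounding whitespace.
            line = line.split("#", 1)[0].strip()
            if not line:
                continue
            name, sep, value = line.partition("=")
            if not sep:
                # Fail loudly rather than silently skipping bad input.
                raise ValueError(f"{path}: cannot parse line {line!r}")
            yield name.strip(), value.strip()


def combine_files(input_dir, output_csv, pattern="*.txt"):
    """Write every parsed record to one long-format CSV; return the row count."""
    rows = 0
    with open(output_csv, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["source_file", "variable", "value"])
        # Sort the paths so the output order is reproducible.
        for path in sorted(Path(input_dir).glob(pattern)):
            for variable, value in parse_file(path):
                writer.writerow([path.name, variable, value])
                rows += 1
    return rows
```

A call such as combine_files("raw", "combined.csv") would leave one row per measurement, and that table can then be read in R with read.csv. The long layout (one row per file and variable) is a design choice for the sketch, since it is easy to recast into other shapes afterwards.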
Philipp Pagel wrote:
> On Wed, May 06, 2009 at 12:22:45AM -0400, Farrel Buchinsky wrote:
>> Is R an appropriate tool for data manipulation and data reshaping and data
>> organizing? I think so but someone who recently joined our group thinks not.
>> The new recruit believes that python or another language is a far better
>> tool for developing data manipulation scripts that can then be used by
>> several members of our research group.
> I happily use both approaches depending on the original format the
> data come in:
> For data that are not in a "well behaved" format and require actual
> parsing, I tend to use Python scripts for transmogrifying the data
> into nice and tidy tables (and maybe some very basic filtering). For
> everything after that I prefer R. I also use Python if the relevant
> data needs to be harvested and assembled from many different sources
> (e.g. data files + web + databases).
> Once the data files are easy to read (csv, tab separated, database,
> ...) and the task is to reshape, filter and clean the data, I usually
> do it in R. R has true advantages here:
> - After reading a table into a data frame I can immediately tell if all
> measurements are what they are supposed to be (integer, numeric,
> factor, boolean), and functions like read.table even do quite a bit of
> error checking for me (equal number of columns etc.)
> - Finding out if factors have the right (or plausible) number of levels is easy
> - Filtering by logical indexing
> - Powerful and reliable reshaping (reshape package)
> - Very convenient diagnostics: str(), dim(), table(), summary(),
> plotting the data in various ways, ...