[R] Do you use R for data manipulation?

Philipp Pagel p.pagel at wzw.tum.de
Wed May 6 11:39:27 CEST 2009

On Wed, May 06, 2009 at 12:22:45AM -0400, Farrel Buchinsky wrote:
> Is R an appropriate tool for data manipulation and data reshaping and data
> organizing? I think so but someone who recently joined our group thinks not.
> The new recruit believes that python or another language is a far better
> tool for developing data manipulation scripts that can be then used by
> several members of our research group.

I happily use both approaches depending on the original format the
data come in:

For data that are not in a "well behaved" format and require actual
parsing, I tend to use Python scripts for transmogrifying the data
into nice and tidy tables (and maybe some very basic filtering). For
everything after that I prefer R. I also use Python if the relevant
data need to be harvested and assembled from many different sources
(e.g. data files + web + databases).
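
To illustrate the kind of Python pre-processing step described above, here
is a minimal sketch. The input format, the sample values and the names
RAW, parse_records and to_tsv are all made up for this example; real data
will need their own parsing rules.

```python
import csv
import io
import re

# Hypothetical "badly behaved" input: key/value blocks separated by blank
# lines, with inconsistent spacing and units glued to the numbers.
RAW = """\
sample: A1
weight = 12.5 g
height=30cm

sample: B2
weight= 9.0g
height = 28 cm
"""

# "key: value" or "key = value", tolerating stray whitespace.
LINE = re.compile(r"^\s*(\w+)\s*[:=]\s*(.*?)\s*$")
# A number optionally followed by a unit such as "g" or "cm".
NUMBER_WITH_UNIT = re.compile(r"^(-?\d+(?:\.\d+)?)\s*[A-Za-z]*$")


def parse_records(text):
    """Turn blank-line-separated key/value blocks into a list of dicts."""
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        record = {}
        for line in block.splitlines():
            match = LINE.match(line)
            if match is None:
                # Fail loudly instead of silently dropping odd lines.
                raise ValueError(f"cannot parse line: {line!r}")
            key, value = match.groups()
            number = NUMBER_WITH_UNIT.match(value)
            # Keep only the numeric part when a unit is attached.
            record[key] = number.group(1) if number else value
        records.append(record)
    return records


def to_tsv(records, columns):
    """Write the records as a tab-separated table with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


TABLE = to_tsv(parse_records(RAW), ["sample", "weight", "height"])
```

The resulting tab-separated text is the sort of tidy table that can then
be read in R with read.table(..., header=TRUE, sep="\t").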

Once the data files are easy to read (csv, tab separated, database,
...) and the task is to reshape, filter and clean the data, I usually
do it in R. R has real advantages here:

 - After reading a table into a data frame I can immediately tell whether
   all measurements are what they are supposed to be (integer, numeric,
   factor, logical), and functions like read.table even do a fair amount
   of error checking for me (equal number of columns etc.)

 - Finding out if factors have the right (or plausible) number of
   levels is easy

 - Filtering by logical indexing

 - Powerful and reliable reshaping (reshape package)

 - Very convenient diagnostics: str(), dim(), table(), summary(),
   plotting the data in various ways, ...


Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
