[R] Beyond reshape: automatically streamlining data

Marshall Feldman marsh at uri.edu
Fri Apr 9 14:59:02 CEST 2010


Hello,

I've been very impressed by the reshape package and how easy it makes 
reorganizing statistical data structures. This makes me wonder if 
there's another package out there that addresses another set of tasks 
that one often does when preparing data for analysis.

For any particular set of analyses, one typically recodes variables and 
deletes cases and variables. It would be really nice to have a package 
that could, for example, inspect the data left after selecting cases 
from a larger data set on the values of certain variables, and drop any 
variables that have the same value for all remaining cases. Similarly, 
if any cases are entirely zero or NA, the package could (under user 
control) drop these cases. Finally, it could take a set of data 
transformations and keep them as an object, so that the same 
selection/reshape/streamlining can easily be applied to similar data sets.
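
To make the first two ideas concrete, here is a rough base-R sketch. The function name streamline(), its arguments, and the toy employment figures are all made up for illustration; this is not taken from any existing package, and "entirely zero or NA" is interpreted as "every value in the case is NA or a numeric zero".

```r
# Hypothetical helper (base R only): drop empty cases, then constant variables.
streamline <- function(df, drop_empty = TRUE, drop_constant = TRUE) {
  if (drop_empty && ncol(df) > 0) {
    # A value is "blank" if it is NA, or zero in a numeric column.
    is_blank <- function(x) if (is.numeric(x)) is.na(x) | x == 0 else is.na(x)
    blank <- do.call(cbind, lapply(df, is_blank))
    # Keep only the cases that have at least one non-blank value.
    df <- df[rowSums(blank) < ncol(df), , drop = FALSE]
  }
  # With fewer than two cases every variable is trivially constant, so skip.
  if (drop_constant && nrow(df) > 1) {
    varies <- vapply(df, function(x) length(unique(x)) > 1, logical(1))
    df <- df[, varies, drop = FALSE]
  }
  df
}

# Made-up data standing in for the result of a few selections.
emp <- data.frame(
  state  = c("RI", "RI", "RI", NA),
  sector = c("retail", "health", "retail", NA),
  jobs   = c(120, 95, 110, 0),
  wages  = c(0, 0, 0, NA),
  stringsAsFactors = FALSE
)

small <- streamline(emp)
names(small)  # "sector" "jobs"
nrow(small)   # 3
```

The fourth case is dropped because it holds only NA and zero; state and wages are then dropped because they no longer vary. The two logical arguments are one possible form of the "under user control" switch.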

My motivation for this came from working with employment data this 
morning. I started out with 11 variables and 35569 cases for Rhode 
Island; a few selections later I had only 420 cases and 3 variables. It 
struck me that the process I went through, which included not only 
making selections but also inspecting the results and deleting 
unnecessary cases/variables, could be automated at least to eliminate 
the inspection step. Also, since I want to do the same thing with data 
for other states, automation would be very nice indeed.
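
The "keep the transformations as an object" idea could be sketched as follows, again in base R. The names make_recipe() and apply_recipe(), the class name, the column names, and the numbers for the two states are assumptions invented for this example; a real package would presumably offer a richer interface.

```r
# Hypothetical "recipe": an ordered list of functions, each taking a data
# frame and returning a data frame.
make_recipe <- function(...) {
  steps <- list(...)
  stopifnot(all(vapply(steps, is.function, logical(1))))
  structure(list(steps = steps), class = "prep_recipe")
}

# Apply the stored steps, in order, to any data frame with the same layout.
apply_recipe <- function(recipe, df) {
  Reduce(function(d, step) step(d), recipe$steps, df)
}

# One selection, one recode, and one variable deletion, stored for reuse.
prep <- make_recipe(
  function(d) d[d$sector == "retail", , drop = FALSE],
  function(d) transform(d, jobs_k = jobs / 1000),
  function(d) d[, c("year", "jobs_k"), drop = FALSE]
)

# Made-up data for two states sharing the same variables.
states <- list(
  RI = data.frame(sector = c("retail", "health", "retail"),
                  year   = c(2008, 2008, 2009),
                  jobs   = c(1200, 800, 1300)),
  MA = data.frame(sector = c("retail", "retail"),
                  year   = c(2008, 2009),
                  jobs   = c(5000, 5200))
)

results <- lapply(states, function(d) apply_recipe(prep, d))
results$RI$jobs_k  # 1.2 1.3
```

Because the recipe is an ordinary R object, it can be saved with saveRDS() and applied later to data for another state.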

I realize that programming this kind of stuff in R is relatively easy, 
but the reshape package makes me wonder if someone has already done it.

Thanks
     Marsh Feldman


