[R] Beyond reshape: automatically streamlining data

Steve Lianoglou mailinglist.honeypot at gmail.com
Fri Apr 9 17:20:29 CEST 2010


Hi Marshall,

On Fri, Apr 9, 2010 at 8:59 AM, Marshall Feldman <marsh at uri.edu> wrote:
> ...
> For any particular set of analyses, one typically recodes variables and
> deletes cases and variables. It would be really nice to have a package that,
> for example, if one selected cases from a larger data set based on the
> values of certain variables would inspect the resulting data and drop any
> variables that have the same value for all cases. Similarly, if any cases
> are entirely zero or NA, the package could (under user control) drop these
> cases. Finally, it could take a set of data transformations and keep them as
> an object, so that the same selection/reshape/streamlining can easily be
> applied to similar data sets.
> ...

Some of the utilities in the caret package might be related to the
things you're after:
http://cran.r-project.org/package=caret

There is a writeup about using caret to build predictive models in R
in the Journal of Statistical Software (it's a PDF):
http://www.jstatsoft.org/v28/i05/paper

I'd recommend reading through the whole paper if you haven't already,
since caret offers many handy wrapper/utility functions. In particular,
check out Section 3 (Data Preparation), where Max talks about
zero-variance predictors and the multicollinearity problem.
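
For the simpler items in the quoted list (dropping constant variables,
dropping cases that are entirely zero or NA, and keeping the steps as a
reusable object), a rough base-R sketch might look like the following.
This is not caret's implementation: the helper names drop_empty_rows,
drop_constant_cols and streamline, and the toy data frame, are made up
for illustration. caret's nearZeroVar() and findCorrelation() handle the
zero-variance and correlated-predictor cases more thoroughly.

```r
# Hypothetical helpers (base R only), not part of caret.

# Drop cases where every variable is NA or (for numeric variables) zero.
drop_empty_rows <- function(df) {
  is_empty <- function(col) is.na(col) | (is.numeric(col) & col == 0)
  empty <- Reduce(`&`, lapply(df, is_empty))
  df[!empty, , drop = FALSE]
}

# Drop variables that take a single value across all remaining cases.
drop_constant_cols <- function(df) {
  keep <- vapply(df, function(col) length(unique(col)) > 1L, logical(1))
  df[, keep, drop = FALSE]
}

# Keep the steps as an object so they can be reapplied to similar data.
steps <- list(drop_empty_rows, drop_constant_cols)
streamline <- function(df, steps) {
  Reduce(function(d, f) f(d), steps, df)
}

# Toy data: row 3 is entirely NA, and 'site' is constant otherwise.
raw <- data.frame(
  site  = c("A", "A", NA, "A"),
  count = c(3, 0, NA, 5),
  score = c(1.5, 0, NA, 2.5)
)
clean <- streamline(raw, steps)
print(clean)
```

The order of the steps matters here: 'site' only becomes constant once
the all-NA case has been removed, so the row filter runs first.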

Hope that helps.

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


