[R] Excluding all the columns from a data frame if the standard deviation of that column is zero (0).

R. Michael Weylandt michael.weylandt at gmail.com
Tue Oct 16 12:24:41 CEST 2012


On Tue, Oct 16, 2012 at 9:08 AM, siddu479 <onlyfordigitalstuff at gmail.com> wrote:
> Hi All,
>
>   I have a data frame with nearly 10K columns of data, most of which
> have a standard deviation (over all rows) of zero.
> I want to exclude all those columns from the data frame before further
> processing.
>
> I tried the following.
> data <- read.csv("data.CSV", header=T)
>
> for(i in 2:ncol(data))
>  if(sd(data[,i])==0){
>  df[,i] <-NULL
> }
>
> where I have the data columns from 2:ncol, but getting the error "Error in
> df[, i] <- NULL : object of type 'closure' is not subsettable"
>
> Can anyone suggest the right method to accomplish this?
>

A perfect example of why "df" is a bad variable name. Your data frame
is called "data", and no variable named "df" was ever created, so R
finds the function ( = closure, more or less) df, the density function
of the F distribution, instead. Since a function can't be subsetted,
you get the error.
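
To see this for yourself, here is a minimal sketch, assuming a fresh
session in which nothing called df has been defined. The names is_fun
and msg are just for illustration:

```r
# In a fresh session "df" resolves to stats::df, the F density function.
is_fun <- is.function(df)

# Subset-assigning into a function fails; capture the message instead of stopping.
msg <- tryCatch({
  df[, 2] <- NULL
  "no error"
}, error = function(e) conditionMessage(e))

print(is_fun)
print(msg)
```

The captured message should be along the lines of the error you quoted.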

In fact, I think you really just want this one liner:

!(apply(data, 2, sd) == 0)

which gives a logical vector (TRUE for each column whose standard
deviation is nonzero) that can be used to subset the columns.
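
For example, a minimal sketch on a made-up toy data frame (the names
dat, sds, keep and dat_clean are hypothetical, as is the data). Since
you said your data columns start at 2, this version computes sd over
those columns only with sapply; apply() first converts the whole data
frame to a matrix, which goes wrong if column 1 is not numeric:

```r
# Toy data: column 1 is an identifier, the remaining columns hold the data.
dat <- data.frame(id = c("a", "b", "c"),
                  x  = c(1, 2, 3),
                  y  = c(5, 5, 5),        # constant, so its sd is 0
                  z  = c(0.1, 0.4, 0.2))

# Standard deviation of each data column (everything except column 1).
sds <- sapply(dat[, -1, drop = FALSE], sd)

# TRUE for columns to keep; an NA sd is treated as "drop" (an assumption).
keep <- !is.na(sds) & sds != 0

# Always keep column 1, then only the data columns with nonzero sd.
dat_clean <- dat[, c(TRUE, keep), drop = FALSE]

print(names(dat_clean))   # "id" "x" "z"
```

If your constant columns hold non-integer values, comparing against a
small tolerance rather than exactly 0 may be the safer choice.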

In the same vein as the df problem, "data" is also a bad variable name
(it's also a pre-defined function used for loading, surprise
surprise!, data), but R is smart enough to keep them straight in this
simple example. In your real script, however, I'd strongly suggest you
change it.

Cheers,
Michael



