[R] drop columns whose rows are all 0

Rolf Turner rolf.turner at xtra.co.nz
Tue Jan 24 22:01:13 CET 2012


On 25/01/12 05:14, Francisco wrote:
> Hello,
> I have a dataset with 40 variables, some of which are always 0 (in every
> row). I would like to make a subset containing only the columns whose
> values are not all 0, but I don't know how to do it.
>
> I tried:
>
> for(cut_column in 1:40) {
>
> if(sum(dataset[,cut_column])!=0) {
>                 columns_useful<-c(columns_useful,dataset[cut_column])
>
> }
> }
>
> sorted_dataset<-subset(dataset, select=columns_useful)
>
> But it doesn't work.
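
For reference, the loop above goes wrong in a few places. In the code shown, columns_useful is never initialised. c(columns_useful, dataset[cut_column]) appends the column's contents (a one-column data frame, which c() turns into a list) rather than its position or name, which is what subset(..., select = ) expects. And sum(...) != 0 is fooled by a column such as c(1, -1) that sums to zero without being all zero. A minimal corrected loop, assuming "dataset" is a data frame of numeric columns (the toy data below is hypothetical), collects positions instead:

```r
# Hypothetical toy data standing in for the real 40-column dataset
dataset <- data.frame(v1 = c(1, 0, 2), v2 = c(0, 0, 0),
                      v3 = c(0, 3, 0), v4 = c(0, 0, 0))

columns_useful <- integer(0)   # start with an empty vector of column positions
for (cut_column in seq_along(dataset)) {
  # record the position (not the contents) of each column that is not all zero;
  # any(x != 0) also catches columns such as c(1, -1), whose sum is 0
  if (any(dataset[[cut_column]] != 0)) {
    columns_useful <- c(columns_useful, cut_column)
  }
}

sorted_dataset <- subset(dataset, select = columns_useful)
print(names(sorted_dataset))   # [1] "v1" "v3"
```

The vectorised sapply() versions below do the same job without the explicit loop.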

Try:

     # drop = FALSE keeps the result a data frame even if only one column survives
     good_dataset <- dataset[, sapply(dataset, function(x) {!all(x == 0)}), drop = FALSE]

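This works, modulo possible gotchas induced by floating-point arithmetic: a
column that "should" be all zero but was produced by a calculation may contain
tiny nonzero values, and x == 0 is FALSE for those.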
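
A small illustration of the gotcha, using a hypothetical column whose first entry is computed as 0.1 + 0.2 - 0.3 (zero in exact arithmetic, but not in double precision):

```r
# Hypothetical column whose values are zero in exact arithmetic but were computed
x <- c(0.1 + 0.2 - 0.3, 0, 0)

print(x[1] == 0)                                  # FALSE: x[1] is about 5.55e-17
print(all(x == 0))                                # FALSE, so the exact test keeps this column
print(all(abs(x) <= sqrt(.Machine$double.eps)))   # TRUE once a tolerance is allowed
```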

Another possibility:

     tol <- sqrt(.Machine$double.eps)
     good_dataset <- dataset[, sapply(dataset, function(x) {!all(abs(x) <= tol)}),
                             drop = FALSE]
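
One thing to watch with the tolerance version: abs() stops with an error on character or factor columns, so if "dataset" mixes column types the test has to be restricted to the numeric ones. A minimal sketch, where drop_zero_cols() is a hypothetical helper name and non-numeric columns are simply kept:

```r
# Hypothetical helper: drop numeric columns whose values are all within `tol` of zero.
# Non-numeric columns (character, factor, ...) are always kept.
drop_zero_cols <- function(df, tol = sqrt(.Machine$double.eps)) {
  keep <- vapply(df, function(x) {
    if (!is.numeric(x)) return(TRUE)   # abs() would fail on these, so keep them
    !all(abs(x) <= tol)                # TRUE if any value is clearly nonzero
  }, logical(1))
  df[, keep, drop = FALSE]             # stay a data frame even if one column survives
}

# Hypothetical mixed-type example data
dataset <- data.frame(id = c("a", "b", "c"),
                      x  = c(1, 0, 2),
                      z  = c(0, 0, 1e-20))
print(names(drop_zero_cols(dataset)))   # [1] "id" "x"
```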

Or:

     good_dataset <- dataset[, sapply(dataset, function(x) {
                                 !isTRUE(all.equal(x, rep(0, length(x))))
                             }), drop = FALSE]

That last version could trip up if some columns of "dataset" have extra
attributes tagging along, because all.equal() checks attributes and class as
well as values.  E.g. a column could actually be a numeric matrix of zeroes,
in which case it wouldn't get dropped.
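
The matrix-column case can be reproduced with a small hypothetical data frame. all.equal() reports a difference because the target is a matrix while rep(0, length(x)) is a plain vector. Stripping the dimensions with as.vector() before comparing is one possible workaround; it is an assumption of this sketch, not one of the suggestions above:

```r
# Hypothetical data frame with a 3 x 2 matrix of zeroes stored as a single column
dataset <- data.frame(a = c(1, 2, 3))
dataset$m <- matrix(0, nrow = 3, ncol = 2)

is_zero_strict <- function(x) isTRUE(all.equal(x, rep(0, length(x))))
is_zero_loose  <- function(x) isTRUE(all.equal(as.vector(x), rep(0, length(x))))

print(sapply(dataset, is_zero_strict))   # m is NOT recognised as all zero
print(sapply(dataset, is_zero_loose))    # m is recognised once its dim attribute is gone

good_dataset <- dataset[, !sapply(dataset, is_zero_loose), drop = FALSE]
print(names(good_dataset))               # [1] "a"
```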

     cheers,

         Rolf Turner


