[R] eliminating constant variables

jim holtman jholtman at gmail.com
Sun Jul 11 01:30:59 CEST 2010


You can remove NAs with:

train <- subset(train, !is.na(TargetVariable))
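Here is a minimal illustration of that `subset()` call on a small made-up data frame. The column values are hypothetical and do not come from the poster's TrainingData.csv:

```r
# Hypothetical toy data standing in for 'train'
train <- data.frame(
  TargetVariable = c(1, NA, 0, 1, NA),
  var1           = c(5, 5, 5, 5, 5)
)

# Keep only the rows where TargetVariable is not missing
train <- subset(train, !is.na(TargetVariable))
nrow(train)
```

This removes rows, not columns. Of the five rows, the three with a non-missing target remain.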

I am not sure what you mean by constant values.  You could use 'table'
to determine which values appear the most and then remove them:

x <- table(train$TargetVariable)
train <- subset(train, !(TargetVariable %in% names(x)[x > someCountAboveWhichToDelete]))
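Here is the same idea on toy data. `someCountAboveWhichToDelete` is the placeholder from the line above, and it is set to 3 here purely for illustration:

```r
# Hypothetical toy data: the value 7 occurs 4 times, 3 twice, 2 once
train <- data.frame(TargetVariable = c(7, 7, 7, 7, 2, 3, 3))
someCountAboveWhichToDelete <- 3   # illustrative threshold only

# Count how often each value occurs
x <- table(train$TargetVariable)

# Drop every row whose value occurs more than the threshold
train <- subset(train,
                !(TargetVariable %in% names(x)[x > someCountAboveWhichToDelete]))
train$TargetVariable
```

`names(x)` is a character vector. `%in%` (via `match()`) coerces the numeric column to character before comparing, so the values still match. Only the value 7 exceeds the threshold, so the rows with 2, 3 and 3 remain.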

But you probably need to look at your data and determine which numbers
are in the set that you need to delete.
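"Constant" may instead refer to whole columns, as the gbm warning "var1 has no variation" in the quoted message suggests. In that case, one possible sketch is to drop every column that has fewer than two distinct non-missing values, which catches both constant and all-NA fields. The helper name `has_variation` and the example columns are hypothetical:

```r
# Hypothetical data: var1 is constant, var2 is entirely missing
train <- data.frame(
  TargetVariable = c(0, 1, 0, 1),
  var1           = c(5, 5, 5, 5),
  var2           = c(NA, NA, NA, NA),
  var3           = c(1.2, 3.4, NA, 0.7)
)

# TRUE if a column has at least two distinct non-missing values
has_variation <- function(col) length(unique(col[!is.na(col)])) > 1

# Apply the check to every column and keep only the informative ones
keep  <- vapply(train, has_variation, logical(1))
train <- train[, keep, drop = FALSE]
names(train)
```

Here only `TargetVariable` and `var3` survive. The filtered data frame could then be passed to the model-fitting call.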

On Sat, Jul 10, 2010 at 6:28 PM, pdb <philb at philbrierley.com> wrote:
>
> Hi all,
>
> I have a large data set and want to immediately build a 'blind' model
> without first examining the data. Now it appears in the data there are a lot
> of fields that are constant or all missing values - which prevents the model
> from being built.
>
> Can someone point me in the right direction as to how I can automatically purge
> my data file of these useless fields?
>
> Thanks in advance,
>
> pdb
>
> train <- read.csv("TrainingData.csv")
> library(gbm)
> i.gbm<-gbm(TargetVariable ~ . ,data=train,distribution="bernoulli.....
>
> 1: In gbm.fit(x, y, offset = offset, distribution = distribution,  ... :
>  variable 5: var1 has no variation.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


