[R] variable selections to avoid multicollinearity

Kristi Glover kristi.glover at hotmail.com
Sun May 17 22:06:25 CEST 2015


Hi R users,
I am trying to reduce the number of independent variables before I fit any models. My dependent variable is presence-only (every value is TRUE or 1; there are no absences or FALSE values), and I have more than 20 independent variables that are highly correlated with one another. I found that PCA is sometimes used for feature selection.
However, the PCA-based feature selection I looked at fits the dependent variable against the independent variables (as a linear model) and selects variables by the variation they explain. In my data the dependent variable is always 1, so I could not run it.

Could you suggest how I can reduce the variables to a smaller number? I have included a sample of the data below. In this data set, the dependent variable is "sp" and the other 20 variables are the independent variables.

dat<-structure(list(sp = c(1L, 1L, 1L, 1L, 1L), var1 = c(32L, 222L, 
134L, 114L, 121L), var2 = c(188L, 175L, 167L, 166L, 167L), var3 = c(123L, 
129L, 136L, 138L, 137L), var4 = c(40L, 35L, 37L, 38L, 37L), var5 = c(6756L, 
8080L, 7856L, 7899L, 7891L), var6 = c(334L, 352L, 341L, 340L, 
341L), var7 = c(29L, -9L, -18L, -22L, -20L), var8 = c(305L, 361L, 
359L, 362L, 361L), var9 = c(108L, 217L, 167L, 166L, 166L), var10 = c(237L, 
67L, 61L, 59L, 60L), var11 = c(270L, 276L, 265L, 264L, 264L), 
    var12 = c(97L, 67L, 61L, 59L, 60L), var13 = c(1491L, 916L, 
    1245L, 1282L, 1250L), var14 = c(168L, 127L, 154L, 155L, 154L
    ), var15 = c(99L, 43L, 67L, 70L, 68L), var16 = c(15L, 32L, 
    22L, 21L, 21L), var17 = c(432L, 313L, 390L, 400L, 392L), 
    var18 = c(308L, 148L, 254L, 269L, 257L), var19 = c(332L, 
    213L, 269L, 277L, 271L), var20 = c(430L, 148L, 254L, 269L, 
    257L)), .Names = c("sp", "var1", "var2", "var3", "var4", 
"var5", "var6", "var7", "var8", "var9", "var10", "var11", "var12", 
"var13", "var14", "var15", "var16", "var17", "var18", "var19", 
"var20"), class = "data.frame", row.names = c(NA, -5L))
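For illustration, here is a minimal sketch of one way to thin out correlated predictors without using the dependent variable at all. It assumes a pairwise-correlation cutoff of 0.9, which is an arbitrary choice, and it uses only a subset of the columns above to stay short. The helper drop_correlated() is hypothetical (written here, not taken from any package). With only 5 rows the correlations are not reliable, so this shows the mechanics only.

```r
# Subset of the predictors posted above; the response "sp" is deliberately left out.
x <- data.frame(
  var1  = c(32, 222, 134, 114, 121),
  var10 = c(237, 67, 61, 59, 60),
  var12 = c(97, 67, 61, 59, 60),
  var18 = c(308, 148, 254, 269, 257),
  var20 = c(430, 148, 254, 269, 257)
)

# Hypothetical helper (not from any package): walk through the columns in
# order and keep a column only if its absolute correlation with every
# column already kept is below the cutoff.
drop_correlated <- function(x, cutoff = 0.9) {
  # Constant columns have no defined correlation, so set them aside first.
  x <- x[, vapply(x, function(col) sd(col) > 0, logical(1)), drop = FALSE]
  keep <- character(0)
  for (v in names(x)) {
    r <- if (length(keep) > 0) abs(cor(x[[v]], x[keep])) else 0
    if (all(r < cutoff)) keep <- c(keep, v)
  }
  keep
}

kept <- drop_correlated(x, cutoff = 0.9)  # 0.9 is an assumed cutoff
print(kept)

# prcomp() from the stats package is also unsupervised: it takes only the
# predictors, so a constant response does not prevent running it.
pca <- prcomp(x, scale. = TRUE)
print(summary(pca))
```

The result of the filter depends on column order, because the first column of each correlated group is the one that survives; reordering the columns by subject-matter importance before calling it is one way to control that.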

Thanks

 		 	   		  