[R] Bestglm subset analysis

D Wolf doug45290 at yahoo.com
Wed Jun 29 20:24:50 CEST 2016


Hello All,
I am working on a linear regression model and trying to find the best subset of variables for my dataset. I have 21 predictors, 1 response variable, and 79 observations, and I need to find the best 5 or 6 predictors for my model. I've used leaps for lm() and I'm now trying bestglm for glm(). My code below is based on this webpage: https://rstudio-pubs-static.s3.amazonaws.com/2897_9220b21cfc0c43a396ff9abf122bb351.html
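My code:

library(bestglm)
# bestglm() expects Xy to hold the predictors followed by the response as the last column
lbw.for.bestglm <- within(df_Chl, {
  y <- Chloro      # response, appended as the last column
  Chloro <- NULL   # drop the original column so it is not also used as a predictor
})
res.bestglm <- bestglm(Xy = lbw.for.bestglm, family = gaussian, IC = "AIC",
                       method = "exhaustive")

# best subsets found by the search, one row per model
res.bestglm$BestModels
# coefficients of the single best model
coef(res.bestglm$BestModel)

Here is a sample of my results (I removed the 5th through 21st predictors for brevity):

> res.bestglm$BestModels
    R21   R31   R32   R41 ... Criterion
1 FALSE FALSE FALSE FALSE ...  326.7327
2 FALSE  TRUE FALSE FALSE ...  326.9525
3 FALSE FALSE FALSE FALSE ...  327.0659
4 FALSE  TRUE FALSE FALSE ...  327.0912
5 FALSE  TRUE FALSE FALSE ...  327.8208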
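To see what an exhaustive search ranked by AIC produces, here is a minimal base-R sketch. It assumes simulated data with hypothetical names (x1 to x4, y) in place of df_Chl, and it is not bestglm's internal code; bestglm's Criterion column may also differ from AIC() by a constant. It does build the same kind of table, though: each row is one candidate subset, TRUE marks an included predictor, and rows are sorted so that row 1 has the lowest criterion.

```r
# Sketch of an exhaustive best-subset search ranked by AIC (base R only).
# Simulated data stand in for df_Chl; x1..x4 and y are hypothetical names.
set.seed(1)
n <- 79
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- 1 + 2 * dat$x2 - 1.5 * dat$x4 + rnorm(n)

predictors <- setdiff(names(dat), "y")

# Every non-empty subset of the predictors (2^4 - 1 = 15 subsets);
# the intercept-only model is left out to keep the sketch short
subsets <- unlist(lapply(seq_along(predictors), function(k)
  combn(predictors, k, simplify = FALSE)), recursive = FALSE)

# Fit a gaussian glm for one subset and return its AIC
fit_aic <- function(vars) {
  fit <- glm(reformulate(vars, response = "y"), family = gaussian, data = dat)
  AIC(fit)
}
aics <- vapply(subsets, fit_aic, numeric(1))

# Logical inclusion matrix: one row per subset, one column per predictor
incl <- t(vapply(subsets, function(v) predictors %in% v,
                 logical(length(predictors))))
colnames(incl) <- predictors

# Sort by criterion and keep the five best subsets
table_all <- data.frame(incl, Criterion = aics)
top5 <- head(table_all[order(table_all$Criterion), ], 5)
rownames(top5) <- NULL
print(top5)
```

Read this way, each row of BestModels is a whole alternative model. Row 1 is the single best subset under the chosen criterion, and rows 2 to 5 are the runners-up, not pieces to be pooled together.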
Is it correct to assume I should keep the variables that are TRUE in rows 1 through 5? What do those five rows represent?
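The five Criterion values in the output above are easiest to compare through their differences, because any constant shared by all models cancels out. The short sketch below assumes the Criterion column equals AIC up to such a constant. It computes each model's difference from the best one and the corresponding Akaike weights, taken within these five models only; weights computed over the full candidate set would be smaller. A common rule of thumb treats models within about 2 AIC units of the best as having comparable support, and all five rows here fall inside that range.

```r
# Criterion values copied from the res.bestglm$BestModels output
crit <- c(326.7327, 326.9525, 327.0659, 327.0912, 327.8208)

# Difference from the best (lowest) model; shared constants cancel here
delta <- crit - min(crit)

# Akaike weights: relative support for each model within this set of five
w <- exp(-delta / 2) / sum(exp(-delta / 2))

print(round(data.frame(model = 1:5, delta = delta, weight = w), 3))
```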
I know the AIC value should be as low as possible. Is there a way to judge whether a result is good for any of the IC options, such as AIC, LOOCV, BICg, etc.? If BIC returns lower Criterion values, does that mean I should use the BIC subset instead of the AIC subset?
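AIC and BIC apply different penalties to the same log-likelihood: AIC = -2 log L + 2k and BIC = -2 log L + k log(n), where k is the number of estimated parameters. For any single model they therefore differ by k(log(n) - 2). Raw values from different criteria are not on a common scale, so a lower number under one criterion says nothing about the other. With n = 79, log(n) is about 4.37, which means BIC charges more per predictor and tends to favor smaller subsets. The sketch below checks this identity with base R's AIC() and BIC() on simulated data (hypothetical names x and y). bestglm's reported Criterion values may omit constants, so they need not match these functions exactly.

```r
# Simulated data with 3 candidate predictors (hypothetical, not df_Chl)
set.seed(2)
n <- 79
x <- matrix(rnorm(n * 3), n, 3)
y <- drop(x %*% c(1, 0.5, 0)) + rnorm(n)
fit <- lm(y ~ x)

# Estimated parameters: intercept + 3 slopes + error variance = 5
k <- attr(logLik(fit), "df")

# For the same model, BIC - AIC is fixed by k and n, not by model quality
gap <- BIC(fit) - AIC(fit)
print(c(AIC = AIC(fit), BIC = BIC(fit), gap = gap, expected = k * (log(n) - 2)))
```

The practical consequence is to pick one criterion first and compare models only within it, rather than choosing whichever criterion happens to print the smaller number.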
Thank You,
Doug



