[R] How to improve, at all, a simple GLM code

Abigail Clifton abigailclifton at me.com
Thu Mar 29 21:48:34 CEST 2012


Hi There,

I am trying to fit a logit model to some data in a CSV file in R.
Here is my code:

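# Read the data and check the first few rows
Prepared_Data <- read.csv("Prepared_Data.csv", header = TRUE)
head(Prepared_Data)
# Fit the logit model with all interactions; data= replaces attach()
lrfit <- glm(C3 ~ A1 * B2 * D4 * E5, family = binomial, data = Prepared_Data)
# Sequential analysis of deviance, computed once and reused
lrfit_anova <- anova(lrfit, test = "Chisq")
lrfit_anova
write.csv(lrfit_anova, file = "CWModelA.csv")
shell.exec("CWModelA.csv")  # opens the file; Windows only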

I am not sure how many methods there are for choosing a suitable model. My plan was to fit the full/saturated model and then keep only the significant terms as my final model.
My first question, therefore: is there a better way to fit a model to this data? Is there a function, or some other way, to get R to report the optimum model?

My CSV file, when opened in Excel, contains approximately 3500 rows x 27 columns. I can only seem to run 'anova()' on the saturated/full model when it includes just the first four columns/factors. If I add any more (e.g. C3~A1*B2*D4*E5*F6*G7), R stops responding and I have to force quit. Why is this? How can I get around it, given that I need to include all 27 columns?
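
In case it is related to the freezing, here is a rough sketch of how quickly a formula of the form y ~ x1*x2*...*xk grows. It only counts the terms R expands the formula into and does not touch my data. The helper count_terms() and the names y, x1, ..., xk are hypothetical placeholders, not columns from my file.

```r
# Count the main-effect and interaction terms in a full factorial formula.
# The variable names y and x1..xk are made up for illustration.
count_terms <- function(k) {
  vars <- paste0("x", seq_len(k))
  f <- as.formula(paste("y ~", paste(vars, collapse = " * ")))
  length(attr(terms(f), "term.labels"))
}

# Number of terms for 4, 6 and 10 crossed predictors
sapply(c(4, 6, 10), count_terms)
# 15, 63, 1023, i.e. 2^k - 1 terms for k predictors
```

If that pattern holds, crossing 26 predictors would mean 2^26 - 1 terms (about 67 million), which is far more than my roughly 3500 rows, so I have not tried running the sketch for k = 26.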

Any advice or constructive criticism is appreciated - even if it means I have to start from scratch.

Many Thanks,

AJC

