[R] How to improve, at all, a simple GLM code
abigailclifton at me.com
Thu Mar 29 21:48:34 CEST 2012
I am trying to fit a logit model to some data in a CSV file in R.
Here is my code:
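# Read the prepared data; the first row holds the column names
Prepared_Data <- read.csv("Prepared_Data.csv", header = TRUE)
# Logit model with all main effects and interactions of the four predictors
lrfit <- glm(C3 ~ A1 * B2 * D4 * E5, family = binomial, data = Prepared_Data)
# Sequential analysis-of-deviance table, saved to CSV
write.csv(anova(lrfit, test = "Chisq"), file = "CWModelA.csv")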
I am not sure how many methods there are for choosing a suitable model. My plan was to fit the full/saturated model and then keep only the significant terms as my final model.
My first question, therefore: is there a better way to choose a model for this data? Is there a function, or some other way, to get R to report the optimum model?
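One common answer to this kind of question is stepwise selection by AIC with step() from base R, rather than picking terms by significance from the full model. The following is only a minimal sketch: the data are simulated because the real file is not shown, the two-level coding of A1, B2, D4 and E5 is an assumption, and limiting the search to two-way interactions is an illustrative choice, not a recommendation for this particular data set.

```r
# Simulated stand-in for Prepared_Data.csv (assumption: binary response C3 and
# two-level factors A1, B2, D4, E5; the real coding is not given in the post).
set.seed(1)
n <- 500
sim <- data.frame(
  A1 = factor(sample(c("a", "b"), n, replace = TRUE)),
  B2 = factor(sample(c("a", "b"), n, replace = TRUE)),
  D4 = factor(sample(c("a", "b"), n, replace = TRUE)),
  E5 = factor(sample(c("a", "b"), n, replace = TRUE))
)
# In this simulation the response depends on A1 and B2 only
eta <- -0.5 + 1.2 * (sim$A1 == "b") - 0.8 * (sim$B2 == "b")
sim$C3 <- rbinom(n, size = 1, prob = plogis(eta))

# Start from the main-effects model and let step() add or drop terms by AIC,
# considering nothing beyond two-way interactions.
main_fit <- glm(C3 ~ A1 + B2 + D4 + E5, family = binomial, data = sim)
sel_fit <- step(main_fit,
                scope = list(lower = ~ 1, upper = ~ (A1 + B2 + D4 + E5)^2),
                direction = "both", trace = 0)

formula(sel_fit)   # the selected model formula
summary(sel_fit)   # coefficient table for the selected model
```

Stepwise selection does not guarantee an "optimum" model; it only finds a model that the search cannot improve by adding or dropping a single term, so the result is best treated as a starting point.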
My CSV file, when opened in Excel, contains approximately 3500 rows x 27 columns. I can only seem to run anova() on the full model when it includes just the first four columns/factors. If I include any more (e.g. C3~A1*B2*D4*E5*F6*G7), R stops responding and I have to force quit. Why is this? How can I get around it, given that I need to include all 27 columns?
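A likely reason for the hang is the size of the formula: crossing k predictors with `*` generates every interaction up to order k, which is 2^k - 1 terms, and anova() on a glm refits the model once per term. The sketch below only counts terms. The names X1 to X26 are hypothetical placeholders, on the assumption that one of the 27 columns is the response and the other 26 are predictors.

```r
# Hypothetical predictor names standing in for the 26 non-response columns
preds <- paste0("X", 1:26)

# Number of terms produced by crossing k predictors with `*` is 2^k - 1
k <- c(4, 6, 10, 26)
data.frame(k = k, terms = 2^k - 1)

# Check the count for k = 4 by letting R expand the formula
f4 <- as.formula(paste("C3 ~", paste(preds[1:4], collapse = " * ")))
length(attr(terms(f4), "term.labels"))   # 15, i.e. 2^4 - 1

# Main effects plus all two-way interactions for the 26 predictors
f2 <- as.formula(paste("C3 ~ (", paste(preds, collapse = " + "), ")^2"))
length(attr(terms(f2), "term.labels"))   # 351, i.e. 26 + choose(26, 2)
```

With about 3500 rows, a model with tens of millions of terms cannot be estimated, and factors with more than two levels would add more coefficients per term still. Restricting the formula to main effects, or to main effects plus selected low-order interactions, keeps the problem at a size R can handle.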
Any advice or constructive criticism is appreciated - even if it means I have to start from scratch.