[R] Data handling/optimum glm method.

Thu Mar 29 15:28:17 CEST 2012

Ben:

On Thu, Mar 29, 2012 at 5:41 AM, Ben Bolker <bbolker at gmail.com> wrote:
>  <abigailclifton <at> me.com> writes:
>
>
>> I am trying to fit a generalised linear model to some loan
>> application and default data. The purpose of this is to eventually
>> work out the probability an applicant will default.
>
>> However, R seems to crash or die when I run "glm" on anything
>>  greater than a 5-way saturated model for my data.
>
>  What does "crash or die" mean?  Are you getting error messages?
> What are they? Is the R application actually quitting?
>
>> My first question: is the best way to fit a generalised linear model
>> in R to fit the saturated model and extract the significant terms
>> only, or to start at the null model and to work up to the optimum
>> one?
>
>  This is more of a statistical practice question than an R question.
> Opinions differ
Well, to clarify: I do not think opinions differ on the first proposal
-- reducing the model to only its significant terms. This should **not**
be done.

I also would say (more tentatively) that modern practice rejects the
notion of an "optimum" model to begin with, preferring shrinkage or
other methodology.
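
To make "shrinkage" concrete, here is a minimal base-R sketch of ridge-penalised logistic regression. Everything in it is an assumption made for illustration: the data are simulated, the helper names (ridge_nll, fit_ridge) are made up, and lambda = 50 is arbitrary. In practice the penalty would be chosen by cross-validation, typically with a dedicated package such as glmnet.

```r
# Simulated data (an assumption, not the poster's loan data)
set.seed(1)
n <- 500; p <- 10
X <- matrix(rnorm(n * p), ncol = p)
y <- rbinom(n, 1, plogis(0.8 * X[, 1] - 0.5 * X[, 2]))

# Penalised negative log-likelihood; the intercept b[1] is not penalised.
# log(1 + exp(eta)) is written in a numerically stable form.
ridge_nll <- function(b, X, y, lambda) {
  eta <- b[1] + drop(X %*% b[-1])
  sum(pmax(eta, 0) + log1p(exp(-abs(eta))) - y * eta) +
    lambda * sum(b[-1]^2) / 2
}

# Minimise the penalised criterion numerically (hypothetical helper)
fit_ridge <- function(X, y, lambda) {
  optim(rep(0, ncol(X) + 1), ridge_nll, X = X, y = y, lambda = lambda,
        method = "BFGS")$par
}

b_mle   <- coef(glm(y ~ X, family = binomial))  # unpenalised fit
b_ridge <- fit_ridge(X, y, lambda = 50)          # shrunken fit

# Side-by-side comparison: the ridge slopes are pulled towards zero
round(cbind(mle = b_mle, ridge = b_ridge), 3)
```

No term is dropped here; every coefficient stays in the model but is shrunk, which is the alternative to selecting "significant" terms.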

Cheers,
Bert

> but in general I would say if it is computationally
> feasible that you should start (and maybe finish) with the
> full model.
>
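
As a rough illustration of starting with the full model, the sketch below fits a null and a full logistic model and compares them as a whole with a likelihood-ratio test, rather than pruning term by term. The data are simulated and the variable names (X1..X5, null_fit, full_fit) are assumptions for the example only.

```r
# Simulated data: five numeric predictors, only X1 affects y (by construction)
set.seed(1)
n <- 3500
d <- data.frame(matrix(rnorm(n * 5), ncol = 5))   # columns X1..X5
d$y <- rbinom(n, 1, plogis(-1 + 0.7 * d$X1))

null_fit <- glm(y ~ 1, data = d, family = binomial)  # intercept only
full_fit <- glm(y ~ ., data = d, family = binomial)  # all main effects

# Likelihood-ratio test of the full model against the null model
lrt <- anova(null_fit, full_fit, test = "Chisq")
lrt

# Information criteria for the same two fits
AIC(null_fit, full_fit)
```

The full model's coefficients and standard errors can then be reported as they are, including the terms that are not individually significant.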
>> I am importing a csv file with 3500 rows and 27 columns (3500x27 matrix).
>
>> My second question: is there any way to increase the memory
>> I have so R can cope with more analysis?
>
>   help("Memory-limits")
>>
>> I can send my code if it would help to answer the question.
>
>  Please read the posting guide (link at the bottom of every R-help
> posting) and follow its advice.  We don't know enough about your
> situation to help.  You could also try reading
> http://tinyurl.com/reproducible-000 ...
>
>  This works for me:
>
> z <- matrix(rnorm(3500*27),ncol=27)
> y <- sample(0:1,replace=TRUE,size=3500)
> colnames(z) <- c(letters,"A")
> d <- data.frame(y=y,z)
> gg <- glm(y~.,data=d,family="binomial")
> gg <- glm(y~a*b*c*d*e*f*g*h,data=d,family="binomial")
>
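
One possible reason the example above runs while a fit on real data does not: with numeric predictors a saturated k-way model has 2^k coefficients, but with categorical predictors the count is the product of the numbers of levels. We do not know that the original data are categorical; the sketch below simply assumes eight hypothetical factors with four levels each to show how quickly the design matrix grows.

```r
set.seed(1)
n <- 3500

# Hypothetical data: eight categorical predictors with four levels each
f <- as.data.frame(setNames(
  lapply(1:8, function(i) factor(sample(letters[1:4], n, replace = TRUE))),
  paste0("x", 1:8)
))

# Check the count on a small case by actually building the design matrix
k3 <- ncol(model.matrix(~ x1 * x2 * x3, data = f))

# Columns in a saturated k-way model with 'levels' levels per factor
n_cols <- function(k, levels = 4) levels^k
stopifnot(k3 == n_cols(3))

# Approximate size in GiB of a dense double-precision design matrix
design_gib <- function(k, rows = n, levels = 4) {
  rows * n_cols(k, levels) * 8 / 2^30
}

data.frame(k = 3:8, columns = n_cols(3:8), GiB = round(design_gib(3:8), 3))
```

Under these assumptions the 8-way model needs well over a gigabyte for the design matrix alone, before glm makes any working copies, so running out of memory would not be surprising.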
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm