[R] for help about logistic regression model

Douglas Bates bates at stat.wisc.edu
Tue Nov 21 21:45:40 CET 2006


On 11/21/06, Aimin Yan <aiminy at iastate.edu> wrote:
> thanks for your reply, it is very helpful.
> I have one more question.
> Now I am trying to fit a full model using 13 predictors, but I get this error
> message. Does this problem come from too many predictors or too large a data set?
>   thanks,
>
> Aimin Yan
>
>
>  > p5.lgm.9 <- lmer(Y ~ p*aa*index*x*y*z*sdx*sdy*sdz*delta*as*ms*cur + (1|p/aa),
>  +                  data=p5, family=binomial, control=list(usePQL=FALSE,msV=1))
> Error: cannot allocate vector of size 1565600 Kb
> In addition: Warning messages:
> 1: Reached total allocation of 494Mb: see help(memory.size)
> 2: Reached total allocation of 494Mb: see help(memory.size)

Well, considering that the model you specified would have a 13-factor
interaction and 13 12-factor interactions and 78 11-factor
interactions and ... I think your problem is that you are trying to
estimate far too many fixed-effects parameters.  There would be a
total of 2^13 = 8192 terms in the model, counting the intercept.  I
didn't bother to calculate the total number of coefficients because
2^13 is already greater than the number of observations (1030).

[Can anyone provide code to calculate the total number of
fixed-effects coefficients?  The structure of the data is
> str(p5)
'data.frame':	1030 obs. of  15 variables:
 $ p    : Factor w/ 5 levels "821p","8ABP",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ aa   : Factor w/ 19 levels "ALA","ARG","ASN",..: 12 16 7 18 11 10 19 19 19 1 ...
 $ index: int  1 2 3 4 5 6 7 8 9 11 ...
 $ x    : num  -5.10 -4.07 -5.87 -1.35 -4.27 ...
 $ y    : num  32.9 28.7 30.5 26.9 27.8 ...
 $ z    : num  -5.858 -4.838 -0.687 -0.492  6.273 ...
 $ sdx  : num  1.478 0.598 1.313 1.038 1.206 ...
 $ sdy  : num  1.74 1.38 2.00 1.37 1.20 ...
 $ sdz  : num  0.826 1.166 0.896 2.285 1.634 ...
 $ delta: num  13.8 13.7 22.8 44.7 53.3 ...
 $ as   : num  126.9  64.1  82.7   7.6  42.0 ...
 $ ms   : num  92.4 50.7 75.3 17.2 57.7 ...
 $ cur  : num  -0.1320 -0.0977 -0.0182  0.2368  0.1306 ...
 $ sc   : num  111.1  98.5  65.1  75.4  91.1 ...
 $ Y    : logi   TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE ...
]
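
One possible way to do that count, as a rough sketch rather than a
definitive answer: it assumes the default treatment contrasts, takes the
level counts from the str() output above, and ignores any columns that
would be dropped for rank deficiency.  The names cols, n_terms, n_labels,
n_coef and size_kb are made up for the illustration.

```r
# Columns contributed by each variable on its own: a factor with k levels
# gives k - 1 columns (treatment contrasts), a numeric covariate gives 1.
cols <- c(p = 5 - 1, aa = 19 - 1,
          index = 1, x = 1, y = 1, z = 1, sdx = 1, sdy = 1, sdz = 1,
          delta = 1, as = 1, ms = 1, cur = 1)

# Terms in the fully crossed formula, counted two ways.
n_terms  <- 2^length(cols)                           # includes the intercept
f        <- ~ p*aa*index*x*y*z*sdx*sdy*sdz*delta*as*ms*cur
n_labels <- length(attr(terms(f), "term.labels"))    # excludes the intercept

# Each term is a product over a subset of the variables, so the number of
# columns in the full model matrix factorises as prod(1 + cols).
n_coef <- prod(1 + cols)

# Size of a dense double-precision model matrix with 1030 rows, in Kb.
n_obs   <- 1030
size_kb <- n_obs * n_coef * 8 / 1024

print(c(terms = n_terms, coefficients = n_coef, size_kb = size_kb))
```

Under those assumptions this gives 8192 terms and 5 * 19 * 2^11 = 194560
coefficients, and a dense 1030 x 194560 matrix of doubles comes to
1565600 Kb, which is the size reported in the error message.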

You may want to start with an additive model instead of a model with
all possible interactions.  Even better would be to plot the data in
various ways to try to see which of these covariates seems to have a
substantial effect on the probability of p5$Y being TRUE or FALSE.
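
As a minimal sketch of that starting point: the code below uses simulated
data as a stand-in for p5 (the values are hypothetical; only the shape
imitates the real data), keeps just three of the predictors, and uses
glm() without the random effects so that it runs in base R.  It is meant
to show the additive formula and one crude tabular look at a covariate,
not the analysis itself.

```r
set.seed(1)
n <- 1030

# Simulated stand-in for p5 (hypothetical values, not the real data)
d <- data.frame(
  p     = factor(sample(c("a", "b", "c", "d", "e"), n, replace = TRUE)),
  x     = rnorm(n),
  delta = runif(n, 0, 60)
)
d$Y <- runif(n) < plogis(-1 + 0.04 * d$delta)

# Additive fixed effects only: 1 intercept + 4 for p + 1 each for x and delta
fit <- glm(Y ~ p + x + delta, family = binomial, data = d)
n_coef_additive <- length(coef(fit))

# Crude look at one covariate: proportion of TRUE within quartiles of delta
delta_bin <- cut(d$delta, quantile(d$delta, 0:4 / 4), include.lowest = TRUE)
prop_true <- tapply(d$Y, delta_bin, mean)
print(prop_true)
```

With all 13 predictors entered additively the count would be
1 + 4 + 18 + 11 = 34 fixed-effects coefficients, which is far more
manageable for 1030 observations.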

Remember that when you are working with a binary response you get
exactly 1 bit of information from each observation of the response.
Because that isn't a whole lot of information per observation you need
to have a large number of observations relative to the number of
coefficients that you hope to estimate.
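
To put a number on that, here is a small sketch using the Shannon entropy
of a binary outcome, which is the most information a single TRUE/FALSE
response can carry.  The function name binary_entropy is made up for the
illustration, and it assumes 0 < prob < 1.

```r
# Entropy, in bits, of a binary outcome that is TRUE with probability prob.
# Assumes 0 < prob < 1 (the limits at 0 and 1 are 0 bits).
binary_entropy <- function(prob) {
  -(prob * log2(prob) + (1 - prob) * log2(1 - prob))
}

h_even   <- binary_entropy(0.5)   # balanced response: the maximum
h_uneven <- binary_entropy(0.9)   # unbalanced response: less than that

# Upper bound on the information in 1030 binary responses
max_bits <- 1030 * h_even

print(c(h_even = h_even, h_uneven = h_uneven, max_bits = max_bits))
```

A balanced response gives 1 bit per observation and an unbalanced one
gives less, so 1030 observations supply at most 1030 bits with which to
estimate the coefficients.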


