[R] Linear modelling confusion.

Rolf Turner r.turner at auckland.ac.nz
Thu Aug 30 02:44:43 CEST 2007


I'm puzzled by a phenomenon that I have just encountered, and was  
hoping that someone
out there might be able to give me some insight.  It's not an R  
problem as such, but .....
well, there are R aspects.

I have a rather messy data set, with a response, a fixed effect, and  
3 random effects:
school, class, and student.  The classes are nested within schools;  
the students are
nested within schools; students are *not* nested within classes.  The  
fixed effect
is ``time'', with 6 levels.  There are 1428 observations.

The ``design'' (the data are from an observational study) is vastly  
imbalanced; there
are brazillions of empty cells.

I tried fitting two models:

	(1) y ~ time + school + class%in%school + student%in%school

	(2) y ~ time + cls.in.scl + std.in.scl

where I formed the factors ``cls.in.scl'' and ``std.in.scl'' by using  
the interaction()
function:

	cls.in.scl <- interaction(class,school,sep=".in.")
	std.in.scl <- interaction(student,school,sep=".in.")

This was one way that I could think of to fit the model with the  
school factor dropped out
(given that the levels of  ``class'' and ``student'' were nested  
within the levels of ``school'').

Now my puzzlement:  The model matrix for (2) is singular; that for  
(1) is not.

I can believe that the imbalance/existence of empty cells could  
easily make the model
singular --- but since students and classes are nested within  
schools, I don't see how
including a school factor could eliminate the singularity.  I have  
been trying to concoct
a low dimensional toy example of this phenomenon, and am not able to  
do so.

Can anyone provide me with some enlightenment?

I might remark that I have tried fitting the same two models, to the  
same data, in
Minitab, and got the same sort of result --- model (1) ran to  
completion; model (2) came
to a shuddering halt with a complaint about ``rank deficiency ... no  
further analysis will
be done''.

I mention this because it convinces me that the problem is not with  
my syntax for expressing
the models --- I have never really managed to grok R's linear  
modelling syntax, but I feel
comfortable/confident with Minitab's syntax.  Minitab would write  
model (1) as

	(1) y ~ time + school + class(school) + student(school)

While I'm on this topic --- a totally side-issue:  I fitted the two  
models with lm(), i.e. completely
ignoring the fact that school, class, and student are random effects  
(whence any correct
inference from the fits is impossible).  That's by-the-by, since my  
real concern at the moment
is the singularity issue.

But can any kind soul please tell me how I might *properly* fit the  
model in R?  What package
and function should I use, and what (very explicitly, in  
monosyllables, for I am a bear of very
little brain and long words bother me) is the correct syntax for  
expressing the models for the
aforesaid function?

Yes, I have looked at lme4 and lmer(), and I have (tried to) read  
Pinheiro and Bates, but I
couldn't get it.  As I said, I am a bear of very little brain ....

Thanks for taking the time to read this long-winded cri de coeur.  If  
anyone is actually
*interested* I would be very happy to make the data available to her  
or him.

			cheers,

				Rolf Turner


######################################################################
Attention:\ This e-mail message is privileged and confidenti...{{dropped}}



More information about the R-help mailing list