[R] Are least-squares means useful or appropriate?

John Fox jfox at mcmaster.ca
Sat Sep 24 15:04:14 CEST 2005


Dear Peter, Doug, and Felipe,

My effects package (on CRAN; see also the article at
http://www.jstatsoft.org/counter.php?id=75&url=v08/i15/effect-displays-revised.pdf)
will compute and graph adjusted effects of various kinds for linear and
generalized linear models -- generalizing so-called "least-squares means"
(or "population marginal means" or "adjusted means").

A couple of comments: 

By default, the all.effects() function in the effects package computes
effects for high-order terms in the model, absorbing terms marginal to them.
You can ask the effect() function to compute an effect for a term that's
marginal to a higher-order term, and it will do so with a warning, but this
is rarely sensible.
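
For example (a rough sketch -- the data and the model formula are just for
illustration):

  library(car)      # Prestige data, used only as an example
  library(effects)

  mod <- lm(prestige ~ income*type + education, data = Prestige)

  # By default, effects are computed for the high-order terms in the
  # model (here income:type and education), absorbing the terms
  # marginal to them:
  eff <- all.effects(mod)
  plot(eff)

  # Asking for a term that is marginal to a higher-order term (income
  # is marginal to income:type) works, but with a warning, and is
  # rarely sensible:
  effect("income", mod)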

Peter's mention of floating variances (or quasi-variances) in this context
is interesting, but what I would most like to see, I think, are the
quasi-variances for the adjusted effects, that is, for terms merged with
their lower-order relatives. These, for example, are unaffected by contrast
coding. How to define reasonable quasi-variances in this context has been
puzzling me for a while.
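
For an ordinary factor term the machinery already exists -- something along
these lines with float() (I'm quoting the interface from memory, and the
lung-function data d are made up); it's the analogue for merged terms that
eludes me:

  library(Epi)

  fit <- glm(fev ~ smoking + age + sex, family = gaussian, data = d)
  # floating (quasi-) variances for the smoking term on its own
  float(fit, factor = "smoking")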

Regards,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
-------------------------------- 

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Peter Dalgaard
> Sent: Friday, September 23, 2005 10:23 AM
> To: Douglas Bates
> Cc: Felipe; R-help at stat.math.ethz.ch
> Subject: Re: [R] Are least-squares means useful or appropriate?
> 
> Douglas Bates <dmbates at gmail.com> writes:
> 
> > On 9/20/05, Felipe <felipe at unileon.es> wrote:
> > > -----BEGIN PGP SIGNED MESSAGE-----
> > > Hash: SHA1
> > >
> > > Hi.
> > > My question was just theoretical. I was wondering if someone who
> > > uses both SAS and R could give me their opinion on the topic. I was
> > > trying to use least-squares means for comparisons in R, but then I
> > > found some advice against them, and I wanted to know whether it had
> > > a good basis (as I said earlier, it was not very detailed).
> > > Greetings.
> > >
> > > Felipe
> > 
> > As Deepayan said in his reply, the concept of least squares means is
> > associated with SAS and is not generally part of the theory of linear
> > models in statistics.  My vague understanding of these (I too am not a
> > SAS user) is that they are an attempt to estimate the "mean" response
> > for a particular level of a factor in a model in which that factor has
> > a non-ignorable interaction with another factor.  There is no clearly
> > acceptable definition of such a thing.
> 
> (PD goes and fetches the SAS manual....)
> 
> Well, yes, it'll do that too, although only if you ask for 
> the lsmeans of A when an interaction like A*B is present in 
> the model. This is related to the tests of main effects when 
> an interaction is present using type III sums of squares, 
> which has been beaten to death repeatedly on the list. In 
> both cases, there seems to be an implicit assumption that 
> categorical variables by nature come from an underlying 
> fully balanced design.
> 
> If the interaction is absent from the model, the lsmeans are 
> somewhat more sensible, in that they at least reproduce the 
> parameter estimates as contrasts between different groups. 
> All continuous variables in the design will be set to their 
> means, and each level of a categorical design variable gets 
> equal weight (one over the number of levels). So if you 
> compute lsmeans of lung function by smoking, adjusted for age 
> and sex, you get estimates for the mean of a population in 
> which everyone has the same age and half are male and half 
> are female. This makes some sense, but if you do it for sex, 
> adjusting for smoking and age, you are not only forcing the 
> sexes to smoke equally much, but actually adjusting to 
> smoking rates of 50%, which could be quite far from reality. 
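> 
> For concreteness, the lsmean for one smoking level in that model
> amounts to something like this (rough sketch; the data frame d and
> the variable names are made up):
> 
>   fit  <- lm(fev ~ smoking + age + sex, data = d)
>   # reference grid: smoking fixed at the level of interest, age at
>   # its mean, one row per sex
>   grid <- expand.grid(smoking = "never",
>                       age     = mean(d$age),
>                       sex     = levels(d$sex))
>   # average the predictions, weighting the sexes equally (50/50)
>   mean(predict(fit, newdata = grid))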
> 
> The whole operation really seems to revolve around 2 things: 
> 
> (1) pairwise comparisons between factor levels. This can alternatively
>     be done fairly easily using parameter estimates for the relevant
>     variable and associated covariances. You don't really need all the
>     mumbo-jumbo of adjusting to particular values of other variables.
> 
> (2) plotting effects of a factor with error bars as if they were
>     simple group means. This has some merit since the standard
>     parametrizations are misleading at times (e.g. if you choose the
>     group with the least data as the reference level, std. err. for
>     the other groups will seem high). However, it seems to me that
>     concepts like floating variances (see float() in the Epi package)
>     are more to the point.
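> 
> For (1), e.g., the comparison between two smoking levels and its
> standard error come straight out of coef() and vcov() (continuing
> the made-up fit above; the coefficient names assume treatment
> contrasts with "never" as the reference level):
> 
>   b <- coef(fit)
>   V <- vcov(fit)
>   est <- unname(b["smokingex"] - b["smokingcurrent"])
>   se  <- sqrt(V["smokingex", "smokingex"] +
>               V["smokingcurrent", "smokingcurrent"] -
>               2 * V["smokingex", "smokingcurrent"])
>   c(estimate = est, std.error = se)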
> 
> > R is an interactive language where it is a simple matter to fit a
> > series of models and base your analysis on a model that is
> > appropriate.  An approach of "give me the answer to any possible
> > question about this model, whether or not it makes sense" is
> > unnecessary.
> > 
> > In many ways statistical theory and practice have not caught up with
> > statistical computing.  There are concepts that are regarded as part
> > of established statistical theory when they are, in fact,
> > approximations or compromises motivated by the fact that you can't
> > compute the answer you want - except now you can compute it.
> > However, that won't stop people who were trained in the old system
> > from assuming that things *must* be done in that way.
> > 
> > In short, I agree with Deepayan - the best thing to do is to ask
> > someone who uses SAS and least squares means to explain to you what
> > they are.
> > 
> 
> -- 
>    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen   Denmark        Ph: (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html



