[R] Are least-squares means useful or appropriate?

John Fox jfox at mcmaster.ca
Sat Sep 24 15:04:14 CEST 2005

Dear Peter, Doug, and Felipe,

My effects package (on CRAN, also see the article at
ed.pdf) will compute and graph adjusted effects of various kinds for linear
and generalized linear models -- generalizing so-called "least-squares
means" (or "population marginal means" or "adjusted means").

A couple of comments: 

By default, the all.effects() function in the effects package computes
effects for high-order terms in the model, absorbing terms marginal to them.
You can ask the effect() function to compute an effect for a term that's
marginal to a higher-order term, and it will do so with a warning, but this
is rarely sensible.
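
A rough illustration of why an effect for a term marginal to an interaction
is rarely sensible. This is a plain Python sketch of the arithmetic only; it
does not call the effects package, and the factor names, coefficient values,
and helper functions are all made up for the example. The cell values for A:B
are determined by the model, but the "effect of A" depends on how one chooses
to weight the levels of B.

```python
# Hypothetical 2x2 example: factors A (a1, a2) and B (b1, b2) with an A:B
# interaction, in treatment (dummy) coding. All numbers are made up.
coef = {"intercept": 10.0, "A_a2": 2.0, "B_b2": 4.0, "A_a2:B_b2": 3.0}


def cell_mean(a, b):
    """Fitted value for cell (a, b): the high-order effect A:B with the
    intercept and both main effects absorbed into it."""
    is_a2 = 1.0 if a == "a2" else 0.0
    is_b2 = 1.0 if b == "b2" else 0.0
    return (coef["intercept"] + coef["A_a2"] * is_a2 + coef["B_b2"] * is_b2
            + coef["A_a2:B_b2"] * is_a2 * is_b2)


def marginal_a_difference(weight_b2):
    """a2 minus a1, averaged over B with weight `weight_b2` on level b2."""
    def avg(a):
        return ((1.0 - weight_b2) * cell_mean(a, "b1")
                + weight_b2 * cell_mean(a, "b2"))
    return avg("a2") - avg("a1")


cells = {(a, b): cell_mean(a, b) for a in ("a1", "a2") for b in ("b1", "b2")}
equal_weights = marginal_a_difference(0.5)   # balanced-design assumption
mostly_b1 = marginal_a_difference(0.1)       # e.g. 10% of cases in b2

print(cells)
print(equal_weights, mostly_b1)
```

The two printed differences disagree because the a2 - a1 contrast is not the
same at b1 as at b2, so any single number for A reflects the weighting chosen
for B as much as it reflects the data.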

Peter's mention of floating variances (or quasi-variances) in this context
is interesting, but what I would most like to see, I think, are the
quasi-variances for the adjusted effects, that is, for terms merged with
their lower-order relatives. The adjusted effects, for example, are
unaffected by contrast coding. How to define reasonable quasi-variances in
this context has been puzzling me for a while.
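
To make the point about contrast coding concrete, here is a small Python
sketch under simplifying assumptions: a one-way layout with a single
three-level factor, made-up group means, and hypothetical helper names. The
coefficients differ between treatment and sum-to-zero coding, but once the
factor is merged with the intercept the values are the same.

```python
# Made-up group means for a three-level factor (one-way layout). In a
# one-way least-squares fit the fitted value for each level is its group mean.
group_means = {"low": 4.0, "mid": 6.0, "high": 11.0}
levels = list(group_means)

# Treatment coding: intercept = mean of the reference level ("low"),
# each coefficient = difference from the reference level.
ref = levels[0]
treat = {"intercept": group_means[ref]}
for lev in levels[1:]:
    treat[lev] = group_means[lev] - group_means[ref]

# Sum-to-zero coding: intercept = unweighted mean of the level means,
# each coefficient = deviation from it; the last level's deviation is
# minus the sum of the others, so it gets no coefficient of its own.
grand = sum(group_means.values()) / len(levels)
sum_coef = {"intercept": grand}
for lev in levels[:-1]:
    sum_coef[lev] = group_means[lev] - grand


def fitted_treatment(lev):
    """Intercept merged with the factor term, treatment coding."""
    return treat["intercept"] + (treat[lev] if lev != ref else 0.0)


def fitted_sum(lev):
    """Intercept merged with the factor term, sum-to-zero coding."""
    if lev != levels[-1]:
        return sum_coef["intercept"] + sum_coef[lev]
    return sum_coef["intercept"] - sum(sum_coef[l] for l in levels[:-1])


adjusted_treatment = {lev: fitted_treatment(lev) for lev in levels}
adjusted_sum = {lev: fitted_sum(lev) for lev in levels}

print(treat, sum_coef)                   # coefficients depend on the coding
print(adjusted_treatment, adjusted_sum)  # merged values do not
```

The sketch covers only the point estimates; it says nothing about how
quasi-variances for the merged terms should be defined.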


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Peter Dalgaard
> Sent: Friday, September 23, 2005 10:23 AM
> To: Douglas Bates
> Cc: Felipe; R-help at stat.math.ethz.ch
> Subject: Re: [R] Are least-squares means useful or appropriate?
> Douglas Bates <dmbates at gmail.com> writes:
> > On 9/20/05, Felipe <felipe at unileon.es> wrote:
> > >
> > > Hi.
> > > My question was just theoretical. I was wondering if someone who was
> > > using SAS and R could give me their opinion on the topic. I was
> > > trying to use least-squares means for comparison in R, but then I
> > > found some indications against them, and I wanted to know if they
> > > had a good basis (as I said earlier, they were not very detailed).
> > > Greetings.
> > >
> > > Felipe
> > 
> > As Deepayan said in his reply, the concept of least squares means is
> > associated with SAS and is not generally part of the theory of linear
> > models in statistics.  My vague understanding of these (I too am not a
> > SAS user) is that they are an attempt to estimate the "mean" response
> > for a particular level of a factor in a model in which that factor has
> > a non-ignorable interaction with another factor.  There is no clearly
> > acceptable definition of such a thing.
>
> (PD goes and fetches the SAS manual....)
>
> Well, yes, it'll do that too, although only if you ask for
> the lsmeans of A when an interaction like A*B is present in
> the model. This is related to the tests of main effects when
> an interaction is present using type III sums of squares,
> which has been beaten to death repeatedly on the list. In
> both cases, there seems to be an implicit assumption that
> categorical variables by nature come from an underlying
> fully balanced design.
>
> If the interaction is absent from the model, the lsmeans are
> somewhat more sensible in that they at least reproduce the
> parameter estimates as contrasts between different groups.
> All continuous variables in the design will be set to their
> mean, but the levels of categorical design variables are
> weighted equally, i.e. each level gets weight one over the
> number of levels. So if you're doing an lsmeans of lung
> function by smoking adjusted for age and sex, you get
> estimates for the mean of a population in which everyone has
> the same age and half are male and half are female. This
> makes some sense, but if you do it for sex adjusting for
> smoking and age, you are not only forcing the sexes to smoke
> equally much, but actually adjusting to smoking rates of 50%,
> which could be quite far from reality.
>
> The whole operation really seems to revolve around 2 things:
>
> (1) pairwise comparisons between factor levels. This can alternatively
>     be done fairly easily using parameter estimates for the relevant
>     variable and associated covariances. You don't really need all the
>     mumbo-jumbo of adjusting to particular values of other variables.
>
> (2) plotting effects of a factor with error bars as if they were
>     simple group means. This has some merit since the standard
>     parametrizations are misleading at times (e.g. if you choose the
>     group with the least data as the reference level, std. err. for
>     the other groups will seem high). However, it seems to me that
>     concepts like floating variances (see float() in the Epi package)
>     are more to the point.
>
> > R is an interactive language where it is a simple matter to fit a 
> > series of models and base your analysis on a model that is 
> > appropriate.  An approach of "give me the answer to any possible 
> > question about this model, whether or not it makes sense" is
> > unnecessary.
> > 
> > In many ways statistical theory and practice have not caught up with
> > statistical computing.  There are concepts that are regarded as part
> > of established statistical theory when they are, in fact,
> > approximations or compromises motivated by the fact that you can't
> > compute the answer you want - except now you can compute it.  However,
> > that won't stop people who were trained in the old system from
> > assuming that things *must* be done in that way.
> > 
> > In short, I agree with Deepayan - the best thing to do is to ask 
> > someone who uses SAS and least squares means to explain to you what 
> > they are.
> > 
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! 
> > http://www.R-project.org/posting-guide.html
> > 
> -- 
>    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
