[R] how to get "lsmeans"?

Thu Mar 22 15:24:09 CET 2007

Dear Bob,

I think that you make a valid point -- and I've included "Type-III" tests in
the Anova() function in the car package, for example, though not entirely
for consistency with SAS -- but my object in writing the effects package was
different. I wanted to provide a general approach to visualizing terms in
complex models with linear predictors. If I can with reasonable effort
provide a solution that's the same as "least-squares means" for models for
which "least-squares means" are defined then I'll do so at some point, but
duplicating what SAS does was not my goal.

Regards,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
-------------------------------- 

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of 
> Muenchen, Robert A (Bob)
> Sent: Thursday, March 22, 2007 9:36 AM
> To: R-help at stat.math.ethz.ch
> Subject: Re: [R] how to get "lsmeans"?
> 
> Hi All,
> 
> Perhaps I'm stating the obvious, but to increase the use of R 
> in places where SAS & SPSS dominate, it's important to make 
> getting the same answers as easy as possible. That includes 
> things like lsmeans and type III sums of squares. I've read 
> lots of discussions here on sums of squares & I'm not 
> advocating type III use, just looking at it from a marketing 
> perspective. Too many people look for excuses to not change.
> The fewer excuses, the better.
> 
> Of course this is easy for me to say, as I'm not the one who 
> does the work! Much thanks to those who do.
> 
> Cheers,
> Bob
> 
> =========================================================
>   Bob Muenchen (pronounced Min'-chen), Manager
>   Statistical Consulting Center
>   U of TN Office of Information Technology
>   200 Stokely Management Center, Knoxville, TN 37996-0520
>   Voice: (865) 974-5230  
>   FAX:   (865) 974-4810
>   Email: muenchen at utk.edu
>   Web:   http://oit.utk.edu/scc, 
>   News:  http://listserv.utk.edu/archives/statnews.html
> =========================================================
> 
> > -----Original Message-----
> > From: r-help-bounces at stat.math.ethz.ch [mailto:r-help- 
> > bounces at stat.math.ethz.ch] On Behalf Of John Fox
> > Sent: Wednesday, March 21, 2007 8:59 PM
> > To: 'Prof Brian Ripley'
> > Cc: 'r-help'; 'Chuck Cleland'
> > Subject: Re: [R] how to get "lsmeans"?
> > 
> > Dear Brian et al.,
> > 
> > My apologies for chiming in late: It's been a busy day.
> > 
> > First some general comments on "least-squares means" and "effect 
> > displays."
> > The general idea behind the two is similar -- to examine 
> fitted values 
> > corresponding to a term in a model while holding other terms to
> typical
> > values -- but the implementation is not identical. There are also
> other
> > similar ideas floating around as well. My formulation is 
> more general 
> > in the sense that it applies to a wider variety of models, 
> both linear 
> > and otherwise.
> > 
> > "Least-squares means" (a horrible term, by the way: in a 
> 1980 paper in 
> > the American Statistician, Searle, Speed, and Milliken 
> suggested the 
> > more descriptive term "population marginal means") apply to factors 
> > and combinations of factors; covariates are set to mean 
> values and the 
> > levels of other factors are averaged over, in effect applying equal 
> > weight to each level. (This is from memory, so it's 
> possible that I'm 
> > not getting it quite right, but I believe that I am.) In my effect 
> > displays, each level of
> a
> > factor is weighted by its proportion in the data. In models 
> in which 
> > least-squares means can be computed, they should differ from the 
> > corresponding effect display by a constant (if there are different 
> > numbers of observations in the different levels of the factors that 
> > are held constant).
> > 
> > The obstacle to computing either least-squares means or effect
> displays
> > in R
> > via predict() is that predict() wants factors in the "new 
> data" to be 
> > set to particular levels. The effect() function in the 
> effects package 
> > bypasses
> > predict() and works directly with the model matrix, 
> averaging over the 
> > columns that pertain to a factor (and reconstructing 
> interactions as 
> > necessary). As mentioned, this has the effect of setting 
> the factor to 
> > its proportional distribution in the data. This approach 
> also has the 
> > advantage of being invariant with respect to the choice of 
> contrasts 
> > for a factor.
> > 
> > The only convenient way that I can think of to implement 
> least-squares 
> > means in R would be to use deviation-coded regressors for a factor 
> > (that is,
> > contr.sum) and then to set the columns of the model matrix for the
> > factor(s)
> > to be averaged over to 0. It may just be that I'm having a 
> failure of 
> > imagination and that there's a better way to proceed. I've not 
> > implemented this solution because it is dependent upon the 
> choice of 
> > contrasts and because I don't see a general advantage to 
> it, but since 
> > the issue has come up several times now, maybe I should 
> take a crack 
> > at it. Remember that I want this to work more generally, 
> not just for 
> > levels of factors, and not just for linear models.
> > 
> > Brian is quite right in mentioning that he suggested some time ago
> that
> > I
> > use critical values of t rather than of the standard normal 
> > distribution for producing confidence intervals, and I 
> agree that it 
> > makes sense to do so in models in which the dispersion is 
> estimated. 
> > My only excuse for not
> yet
> > doing this is that I want to undertake a more general 
> revision of the 
> > effects package, and haven't had time to do it. There are several 
> > changes that I'd like to make to the package. For example, I have 
> > results for multinomial and proportional odds logit models 
> (described 
> > in a paper
> by
> > me
> > and Bob Andersen in the 2006 issue of Sociological 
> Methodology) that I 
> > want to incorporate, and I'd like to improve the appearance of the 
> > default graphs. But Brian's suggestion is very 
> straightforward, and I 
> > guess that I shouldn't wait to implement it; I'll do so very soon.
> > 
> > Regards,
> >  John
> > 
> > --------------------------------
> > John Fox
> > Department of Sociology
> > McMaster University
> > Hamilton, Ontario
> > Canada L8S 4M4
> > 905-525-9140x23604
> > http://socserv.mcmaster.ca/jfox
> > --------------------------------
> > 
> > > -----Original Message-----
> > > From: r-help-bounces at stat.math.ethz.ch 
> > > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Prof Brian 
> > > Ripley
> > > Sent: Wednesday, March 21, 2007 12:03 PM
> > > To: Chuck Cleland
> > > Cc: r-help
> > > Subject: Re: [R] how to get "lsmeans"?
> > >
> > > On Wed, 21 Mar 2007, Chuck Cleland wrote:
> > >
> > > > Liaw, Andy wrote:
> > > >> I verified the result from the following with output from JMP 6
> on
> > > >> the same data (don't have SAS: don't need it):
> > > >>
> > > >> set.seed(631)
> > > >> n <- 100
> > > >> dat <- data.frame(y=rnorm(n), A=factor(sample(1:2, n,
> > > replace=TRUE)),
> > > >>                   B=factor(sample(1:2, n, replace=TRUE)),
> > > >>                   C=factor(sample(1:2, n, replace=TRUE)),
> > > >>                   d=rnorm(n))
> > > >> fm <- lm(y ~ A + B + C + d, dat)
> > > >> ## Form a data frame of points to predict: all
> > > combinations of the ##
> > > >> three factors and the mean of the covariate.
> > > >> p <- data.frame(expand.grid(A=1:2, B=1:2, C=1:2)) p[] <-
> lapply(p,
> > > >> factor) p <- cbind(p, d=mean(dat$d)) p <-
> > > cbind(yhat=predict(fm, p),
> > > >> p) ## lsmeans for the three factors:
> > > >> with(p, tapply(yhat, A, mean))
> > > >> with(p, tapply(yhat, B, mean))
> > > >> with(p, tapply(yhat, C, mean))
> > > >
> > > >  Using Andy's example data, these are the LSMEANS and
> > > intervals I get
> > > > from SAS:
> > > >
> > > > A        y LSMEAN      95% Confidence Limits
> > > > 1       -0.071847       -0.387507     0.243813
> > > > 2       -0.029621       -0.342358     0.283117
> > > >
> > > > B        y LSMEAN      95% Confidence Limits
> > > > 1       -0.104859       -0.397935     0.188216
> > > > 2        0.003391       -0.333476     0.340258
> > > >
> > > > C        y LSMEAN      95% Confidence Limits
> > > > 1       -0.084679       -0.392343     0.222986
> > > > 2       -0.016789       -0.336374     0.302795
> > > >
> > > >  One way of reproducing the LSMEANS and intervals from SAS using
> > > > predict() seems to be the following:
> > > >
> > > >> dat.lm <- lm(y ~ A + as.numeric(B) + as.numeric(C) + d,
> > > data = dat)
> > > >> newdat <- 
> expand.grid(A=factor(c(1,2)),B=1.5,C=1.5,d=mean(dat$d))
> > > >> cbind(newdat, predict(dat.lm, newdat, interval="confidence"))
> > > >  A   B   C          d         fit        lwr       upr
> > > > 1 1 1.5 1.5 0.09838595 -0.07184709 -0.3875070 0.2438128
> > > > 2 2 1.5 1.5 0.09838595 -0.02962086 -0.3423582 0.2831165
> > > >
> > > >  However, another possibility seems to be:
> > > >
> > > >> dat.lm <- lm(y ~ A + as.numeric(B) + as.numeric(C) + d,
> > > data = dat)
> > > >> newdat <-
> > > >
> > >
> > 
> expand.grid(A=factor(c(1,2)),B=mean(as.numeric(dat$B)),C=mean(as.numer
> > > > ic(dat$C)),d=mean(dat$d))
> > > >> cbind(newdat, predict(dat.lm, newdat, interval="confidence"))
> > > >  A    B    C          d         fit        lwr       upr
> > > > 1 1 1.43 1.48 0.09838595 -0.08078243 -0.3964661 0.2349012
> > > > 2 2 1.43 1.48 0.09838595 -0.03855619 -0.3479589 0.2708465
> > > >
> > > >  The predictions directly above match what effect() in 
> the effects 
> > > > package by John Fox returns:
> > > >
> > > > library(effects)
> > > >
> > > >> effect("A", fm, xlevels=list(d = mean(dat$D)))
> > > >
> > > > A effect
> > > > A
> > > >          1           2
> > > > -0.08078243 -0.03855619
> > > >
> > > >  But for some reason the predict() and effect() intervals
> > > are a little
> > > > different:
> > > >
> > > >> effect("A", fm, xlevels=list(d = mean(dat$D)))$lower
> > > >          [,1]
> > > > 101 -0.3924451
> > > > 102 -0.3440179
> > > >
> > > >> effect("A", fm, xlevels=list(d = mean(dat$D)))$upper
> > > >         [,1]
> > > > 101 0.2308802
> > > > 102 0.2669055
> > > >
> > > >  I would be interested in any comments on these different
> > > approaches
> > > > and on the difference in intervals returned by predict()
> > > and effect().
> > >
> > > AFAIR, the effects packages uses normal-based confidence 
> intervals 
> > > and predict.lm uses t-based ones, and I have suggested to 
> John Fox 
> > > that t-based intervals would be preferable, at least as an option.
> > >
> > >
> > > --
> > > Brian D. Ripley,                  ripley at stats.ox.ac.uk
> > > Professor of Applied Statistics,  
> http://www.stats.ox.ac.uk/~ripley/
> > > University of Oxford,             Tel:  +44 1865 272861 (self)
> > > 1 South Parks Road,                     +44 1865 272866 (PA)
> > > Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> > >
> > > ______________________________________________
> > > R-help at stat.math.ethz.ch mailing list 
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > 
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting- 
> > guide.html and provide commented, minimal, self-contained, 
> > reproducible code.
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.