[R] how to get "lsmeans"?

Thu Mar 22 15:17:58 CET 2007

I believe there is an easier way and you don't have to do a thing.

A student of mine just purchased a brand new laptop loaded with Vista. 
Guess what?  SPSS v14 or v15 does not work with vista.  In a couple of
days we will install ubuntu on it and R.

Enjoy,

Ioannis

============
On Πεμ, Μάρτιος 22, 2007 15:35, Muenchen, Robert A (Bob) wrote:
> Hi All,
>
> Perhaps I'm stating the obvious, but to increase the use of R in places
> where SAS & SPSS dominate, it's important to make getting the same
> answers as easy as possible. That includes things like lsmeans and type
> III sums of squares. I've read lots of discussions here on sums of
> squares & I'm not advocating type III use, just looking at it from a
> marketing perspective. Too many people look for excuses to not change.
> The fewer excuses, the better.
>
> Of course this is easy for me to say, as I'm not the one who does the
> work! Much thanks to those who do.
>
> Cheers,
> Bob
>
> =========================================================
>   Bob Muenchen (pronounced Min'-chen), Manager
>   Statistical Consulting Center
>   U of TN Office of Information Technology
>   200 Stokely Management Center, Knoxville, TN 37996-0520
>   Voice: (865) 974-5230
>   FAX:   (865) 974-4810
>   Email: muenchen at utk.edu
>   Web:   http://oit.utk.edu/scc,
>   News:  http://listserv.utk.edu/archives/statnews.html
> =========================================================
>
>> -----Original Message-----
>> From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-
>> bounces at stat.math.ethz.ch] On Behalf Of John Fox
>> Sent: Wednesday, March 21, 2007 8:59 PM
>> To: 'Prof Brian Ripley'
>> Cc: 'r-help'; 'Chuck Cleland'
>> Subject: Re: [R] how to get "lsmeans"?
>>
>> Dear Brian et al.,
>>
>> My apologies for chiming in late: It's been a busy day.
>>
>> First some general comments on "least-squares means" and "effect
>> displays."
>> The general idea behind the two is similar -- to examine fitted values
>> corresponding to a term in a model while holding other terms to
> typical
>> values -- but the implementation is not identical. There are also
> other
>> similar ideas floating around as well. My formulation is more general
>> in the
>> sense that it applies to a wider variety of models, both linear and
>> otherwise.
>>
>> "Least-squares means" (a horrible term, by the way: in a 1980 paper in
>> the
>> American Statistician, Searle, Speed, and Milliken suggested the more
>> descriptive term "population marginal means") apply to factors and
>> combinations of factors; covariates are set to mean values and the
>> levels of
>> other factors are averaged over, in effect applying equal weight to
>> each
>> level. (This is from memory, so it's possible that I'm not getting it
>> quite
>> right, but I believe that I am.) In my effect displays, each level of
> a
>> factor is weighted by its proportion in the data. In models in which
>> least-squares means can be computed, they should differ from the
>> corresponding effect display by a constant (if there are different
>> numbers
>> of observations in the different levels of the factors that are held
>> constant).
>>
>> The obstacle to computing either least-squares means or effect
> displays
>> in R
>> via predict() is that predict() wants factors in the "new data" to be
>> set to
>> particular levels. The effect() function in the effects package
>> bypasses
>> predict() and works directly with the model matrix, averaging over the
>> columns that pertain to a factor (and reconstructing interactions as
>> necessary). As mentioned, this has the effect of setting the factor to
>> its
>> proportional distribution in the data. This approach also has the
>> advantage
>> of being invariant with respect to the choice of contrasts for a
>> factor.
>>
>> The only convenient way that I can think of to implement least-squares
>> means
>> in R would be to use deviation-coded regressors for a factor (that is,
>> contr.sum) and then to set the columns of the model matrix for the
>> factor(s)
>> to be averaged over to 0. It may just be that I'm having a failure of
>> imagination and that there's a better way to proceed. I've not
>> implemented
>> this solution because it is dependent upon the choice of contrasts and
>> because I don't see a general advantage to it, but since the issue has
>> come
>> up several times now, maybe I should take a crack at it. Remember that
>> I
>> want this to work more generally, not just for levels of factors, and
>> not
>> just for linear models.
>>
>> Brian is quite right in mentioning that he suggested some time ago
> that
>> I
>> use critical values of t rather than of the standard normal
>> distribution for
>> producing confidence intervals, and I agree that it makes sense to do
>> so in
>> models in which the dispersion is estimated. My only excuse for not
> yet
>> doing this is that I want to undertake a more general revision of the
>> effects package, and haven't had time to do it. There are several
>> changes
>> that I'd like to make to the package. For example, I have results for
>> multinomial and proportional odds logit models (described in a paper
> by
>> me
>> and Bob Andersen in the 2006 issue of Sociological Methodology) that I
>> want
>> to incorporate, and I'd like to improve the appearance of the default
>> graphs. But Brian's suggestion is very straightforward, and I guess
>> that I
>> shouldn't wait to implement it; I'll do so very soon.
>>
>> Regards,
>>  John
>>
>> --------------------------------
>> John Fox
>> Department of Sociology
>> McMaster University
>> Hamilton, Ontario
>> Canada L8S 4M4
>> 905-525-9140x23604
>> http://socserv.mcmaster.ca/jfox
>> --------------------------------
>>
>> > -----Original Message-----
>> > From: r-help-bounces at stat.math.ethz.ch
>> > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Prof
>> > Brian Ripley
>> > Sent: Wednesday, March 21, 2007 12:03 PM
>> > To: Chuck Cleland
>> > Cc: r-help
>> > Subject: Re: [R] how to get "lsmeans"?
>> >
>> > On Wed, 21 Mar 2007, Chuck Cleland wrote:
>> >
>> > > Liaw, Andy wrote:
>> > >> I verified the result from the following with output from JMP 6
> on
>> > >> the same data (don't have SAS: don't need it):
>> > >>
>> > >> set.seed(631)
>> > >> n <- 100
>> > >> dat <- data.frame(y=rnorm(n), A=factor(sample(1:2, n,
>> > replace=TRUE)),
>> > >>                   B=factor(sample(1:2, n, replace=TRUE)),
>> > >>                   C=factor(sample(1:2, n, replace=TRUE)),
>> > >>                   d=rnorm(n))
>> > >> fm <- lm(y ~ A + B + C + d, dat)
>> > >> ## Form a data frame of points to predict: all
>> > combinations of the ##
>> > >> three factors and the mean of the covariate.
>> > >> p <- data.frame(expand.grid(A=1:2, B=1:2, C=1:2)) p[] <-
> lapply(p,
>> > >> factor) p <- cbind(p, d=mean(dat$d)) p <-
>> > cbind(yhat=predict(fm, p),
>> > >> p) ## lsmeans for the three factors:
>> > >> with(p, tapply(yhat, A, mean))
>> > >> with(p, tapply(yhat, B, mean))
>> > >> with(p, tapply(yhat, C, mean))
>> > >
>> > >  Using Andy's example data, these are the LSMEANS and
>> > intervals I get
>> > > from SAS:
>> > >
>> > > A        y LSMEAN      95% Confidence Limits
>> > > 1       -0.071847       -0.387507     0.243813
>> > > 2       -0.029621       -0.342358     0.283117
>> > >
>> > > B        y LSMEAN      95% Confidence Limits
>> > > 1       -0.104859       -0.397935     0.188216
>> > > 2        0.003391       -0.333476     0.340258
>> > >
>> > > C        y LSMEAN      95% Confidence Limits
>> > > 1       -0.084679       -0.392343     0.222986
>> > > 2       -0.016789       -0.336374     0.302795
>> > >
>> > >  One way of reproducing the LSMEANS and intervals from SAS using
>> > > predict() seems to be the following:
>> > >
>> > >> dat.lm <- lm(y ~ A + as.numeric(B) + as.numeric(C) + d,
>> > data = dat)
>> > >> newdat <- expand.grid(A=factor(c(1,2)),B=1.5,C=1.5,d=mean(dat$d))
>> > >> cbind(newdat, predict(dat.lm, newdat, interval="confidence"))
>> > >  A   B   C          d         fit        lwr       upr
>> > > 1 1 1.5 1.5 0.09838595 -0.07184709 -0.3875070 0.2438128
>> > > 2 2 1.5 1.5 0.09838595 -0.02962086 -0.3423582 0.2831165
>> > >
>> > >  However, another possibility seems to be:
>> > >
>> > >> dat.lm <- lm(y ~ A + as.numeric(B) + as.numeric(C) + d,
>> > data = dat)
>> > >> newdat <-
>> > >
>> >
>> expand.grid(A=factor(c(1,2)),B=mean(as.numeric(dat$B)),C=mean(as.numer
>> > > ic(dat$C)),d=mean(dat$d))
>> > >> cbind(newdat, predict(dat.lm, newdat, interval="confidence"))
>> > >  A    B    C          d         fit        lwr       upr
>> > > 1 1 1.43 1.48 0.09838595 -0.08078243 -0.3964661 0.2349012
>> > > 2 2 1.43 1.48 0.09838595 -0.03855619 -0.3479589 0.2708465
>> > >
>> > >  The predictions directly above match what effect() in the effects
>> > > package by John Fox returns:
>> > >
>> > > library(effects)
>> > >
>> > >> effect("A", fm, xlevels=list(d = mean(dat$D)))
>> > >
>> > > A effect
>> > > A
>> > >          1           2
>> > > -0.08078243 -0.03855619
>> > >
>> > >  But for some reason the predict() and effect() intervals
>> > are a little
>> > > different:
>> > >
>> > >> effect("A", fm, xlevels=list(d = mean(dat$D)))$lower
>> > >          [,1]
>> > > 101 -0.3924451
>> > > 102 -0.3440179
>> > >
>> > >> effect("A", fm, xlevels=list(d = mean(dat$D)))$upper
>> > >         [,1]
>> > > 101 0.2308802
>> > > 102 0.2669055
>> > >
>> > >  I would be interested in any comments on these different
>> > approaches
>> > > and on the difference in intervals returned by predict()
>> > and effect().
>> >
>> > AFAIR, the effects packages uses normal-based confidence
>> > intervals and predict.lm uses t-based ones, and I have
>> > suggested to John Fox that t-based intervals would be
>> > preferable, at least as an option.
>> >
>> >
>> > --
>> > Brian D. Ripley,                  ripley at stats.ox.ac.uk
>> > Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>> > University of Oxford,             Tel:  +44 1865 272861 (self)
>> > 1 South Parks Road,                     +44 1865 272866 (PA)
>> > Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>> >
>> > ______________________________________________
>> > R-help at stat.math.ethz.ch mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-
>> guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Ioannis C. Dimakos, Ph.D.
University of Patras
Department of Elementary Education
Patras, GR-26500 GREECE
http://www.elemedu.upatras.gr/dimakos/
http://thedailyblahblah.blogspot.com/