[R] how to get "lsmeans"?

Xingwang Ye xwye at sibs.ac.cn
Thu Mar 22 16:24:50 CET 2007


Dear John,     
     
As a newcomer to R, I feel that if the results are identical to those of software the user
already knows, such as SAS, SPSS, or Stata, the user will feel confident about the results of R. Why? Because many
R beginners have no strong statistical background; just like me, they chose to trust SAS or SPSS first.
 
The best outcome would be for R not only to be able to reproduce the results of SAS or SPSS,
but also to offer many more powerful, more professional functions (packages).
Then more and more people who are not specialists in statistics will enjoy R, which makes R
popular; and more and more statisticians will like it as well, which keeps R professional. 

Obviously, R is on the road to success.


Best wishes. 
yours, sincerely,  
Xingwang Ye    
PhD candidate     
Research Group of Nutrition Related Cancers and Other Chronic Diseases  
Institute for Nutritional Sciences,  
Shanghai Institutes of Biological Sciences,     
Chinese Academy of Sciences     

>Dear Bob, 
> 
>I think that you make a valid point -- and I've included "Type-III" tests in 
>the Anova() function in the car package, for example, though not entirely 
>for consistency with SAS -- but my object in writing the effects package was 
>different. I wanted to provide a general approach to visualizing terms in 
>complex models with linear predictors. If I can with reasonable effort 
>provide a solution that's the same as "least-squares means" for models for 
>which "least-squares means" are defined then I'll do so at some point, but 
>duplicating what SAS does was not my goal. 
> 
>Regards, 
> John 
> 
>-------------------------------- 
>John Fox 
>Department of Sociology 
>McMaster University 
>Hamilton, Ontario 
>Canada L8S 4M4 
>905-525-9140x23604 
>http://socserv.mcmaster.ca/jfox  
>--------------------------------  
> 
>> -----Original Message----- 
>> From: r-help-bounces at stat.math.ethz.ch  
>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of  
>> Muenchen, Robert A (Bob) 
>> Sent: Thursday, March 22, 2007 9:36 AM 
>> To: R-help at stat.math.ethz.ch 
>> Subject: Re: [R] how to get "lsmeans"? 
>>  
>> Hi All, 
>>  
>> Perhaps I'm stating the obvious, but to increase the use of R  
>> in places where SAS & SPSS dominate, it's important to make  
>> getting the same answers as easy as possible. That includes  
>> things like lsmeans and type III sums of squares. I've read  
>> lots of discussions here on sums of squares & I'm not  
>> advocating type III use, just looking at it from a marketing  
>> perspective. Too many people look for excuses to not change. 
>> The fewer excuses, the better. 
>>  
>> Of course this is easy for me to say, as I'm not the one who  
>> does the work! Much thanks to those who do. 
>>  
>> Cheers, 
>> Bob 
>>  
>> ========================================================= 
>>   Bob Muenchen (pronounced Min'-chen), Manager 
>>   Statistical Consulting Center 
>>   U of TN Office of Information Technology 
>>   200 Stokely Management Center, Knoxville, TN 37996-0520 
>>   Voice: (865) 974-5230   
>>   FAX:   (865) 974-4810 
>>   Email: muenchen at utk.edu 
>>   Web:   http://oit.utk.edu/scc,  
>>   News:  http://listserv.utk.edu/archives/statnews.html 
>> ========================================================= 
>>  
>> > -----Original Message----- 
>> > From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-  
>> > bounces at stat.math.ethz.ch] On Behalf Of John Fox 
>> > Sent: Wednesday, March 21, 2007 8:59 PM 
>> > To: 'Prof Brian Ripley' 
>> > Cc: 'r-help'; 'Chuck Cleland' 
>> > Subject: Re: [R] how to get "lsmeans"? 
>> >  
>> > Dear Brian et al., 
>> >  
>> > My apologies for chiming in late: It's been a busy day. 
>> >  
>> > First some general comments on "least-squares means" and "effect
>> > displays." The general idea behind the two is similar -- to examine
>> > fitted values corresponding to a term in a model while holding other
>> > terms at typical values -- but the implementation is not identical.
>> > There are other similar ideas floating around as well. My formulation
>> > is more general in the sense that it applies to a wider variety of
>> > models, both linear and otherwise.
>> >  
>> > "Least-squares means" (a horrible term, by the way: in a 1980 paper in
>> > the American Statistician, Searle, Speed, and Milliken suggested the
>> > more descriptive term "population marginal means") apply to factors
>> > and combinations of factors; covariates are set to their mean values
>> > and the levels of other factors are averaged over, in effect applying
>> > equal weight to each level. (This is from memory, so it's possible
>> > that I'm not getting it quite right, but I believe that I am.) In my
>> > effect displays, each level of a factor is weighted by its proportion
>> > in the data. In models in which least-squares means can be computed,
>> > they should differ from the corresponding effect display by a constant
>> > (if there are different numbers of observations in the different
>> > levels of the factors that are held constant).
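>> >  
>> > To make the difference in weighting concrete, here is a small
>> > numerical sketch (an illustration only, with made-up fitted values
>> > and counts for a hypothetical two-level factor B):
>> >  
>> > ## made-up fitted values at the two levels of B, and made-up counts
>> > yhat <- c(B1 = 1, B2 = 3)
>> > n    <- c(B1 = 30, B2 = 70)
>> > sum(yhat * c(0.5, 0.5))  # equal weights ("least-squares mean"): 2.0
>> > sum(yhat * n / sum(n))   # proportional weights (effect display): 2.4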
>> >  
>> > The obstacle to computing either least-squares means or effect
>> > displays in R via predict() is that predict() wants factors in the
>> > "new data" to be set to particular levels. The effect() function in
>> > the effects package bypasses predict() and works directly with the
>> > model matrix, averaging over the columns that pertain to a factor
>> > (and reconstructing interactions as necessary). As mentioned, this
>> > has the effect of setting the factor to its proportional distribution
>> > in the data. This approach also has the advantage of being invariant
>> > with respect to the choice of contrasts for a factor.
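>> >  
>> > Roughly, the idea is something like the following sketch (a
>> > paraphrase of the idea, not the actual effects code), using fm and
>> > dat from Andy Liaw's example quoted below:
>> >  
>> > X <- model.matrix(fm)
>> > bcols <- grep("^B", colnames(X))  # the column(s) generated by factor B
>> > ## replace B's columns by their column means, which averages over B
>> > ## proportionally to its distribution in the data
>> > X[, bcols] <- rep(colMeans(X)[bcols], each = nrow(X))
>> > head(drop(X %*% coef(fm)))        # fitted values with B averaged over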
>> >  
>> > The only convenient way that I can think of to implement
>> > least-squares means in R would be to use deviation-coded regressors
>> > for a factor (that is, contr.sum) and then to set the columns of the
>> > model matrix for the factor(s) to be averaged over to 0. It may just
>> > be that I'm having a failure of imagination and that there's a better
>> > way to proceed. I've not implemented this solution because it is
>> > dependent upon the choice of contrasts and because I don't see a
>> > general advantage to it, but since the issue has come up several
>> > times now, maybe I should take a crack at it. Remember that I want
>> > this to work more generally, not just for levels of factors, and not
>> > just for linear models.
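>> >  
>> > For what it's worth, a quick sketch of that idea for Andy Liaw's
>> > example quoted below (an illustration only, untested against SAS,
>> > and specific to contr.sum coding of the two-level factors):
>> >  
>> > fm2 <- lm(y ~ A + B + C + d, dat,
>> >           contrasts = list(A = "contr.sum", B = "contr.sum",
>> >                            C = "contr.sum"))
>> > X <- model.matrix(fm2)[1:2, ]    # two rows to overwrite
>> > X[, c("B1", "C1")] <- 0          # zero => equal weight over B and C
>> > X[, "A1"] <- c(1, -1)            # A at level 1, then at level 2
>> > X[, "d"] <- mean(dat$d)          # covariate at its mean
>> > drop(X %*% coef(fm2))            # candidate "least-squares means" for A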
>> >  
>> > Brian is quite right in mentioning that he suggested some time ago
>> > that I use critical values of t rather than of the standard normal
>> > distribution for producing confidence intervals, and I agree that it
>> > makes sense to do so in models in which the dispersion is estimated.
>> > My only excuse for not yet doing this is that I want to undertake a
>> > more general revision of the effects package, and haven't had time to
>> > do it. There are several changes that I'd like to make to the
>> > package. For example, I have results for multinomial and
>> > proportional-odds logit models (described in a paper by me and Bob
>> > Andersen in the 2006 issue of Sociological Methodology) that I want
>> > to incorporate, and I'd like to improve the appearance of the default
>> > graphs. But Brian's suggestion is very straightforward, and I guess
>> > that I shouldn't wait to implement it; I'll do so very soon.
>> >  
>> > Regards, 
>> >  John 
>> >  
>> > -------------------------------- 
>> > John Fox 
>> > Department of Sociology 
>> > McMaster University 
>> > Hamilton, Ontario 
>> > Canada L8S 4M4 
>> > 905-525-9140x23604 
>> > http://socserv.mcmaster.ca/jfox 
>> > -------------------------------- 
>> >  
>> > > -----Original Message----- 
>> > > From: r-help-bounces at stat.math.ethz.ch  
>> > > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Prof Brian  
>> > > Ripley 
>> > > Sent: Wednesday, March 21, 2007 12:03 PM 
>> > > To: Chuck Cleland 
>> > > Cc: r-help 
>> > > Subject: Re: [R] how to get "lsmeans"? 
>> > > 
>> > > On Wed, 21 Mar 2007, Chuck Cleland wrote: 
>> > > 
>> > > > Liaw, Andy wrote: 
>> > > >> I verified the result from the following with output from JMP 6
>> > > >> on the same data (don't have SAS: don't need it):
>> > > >> 
>> > > >> set.seed(631)
>> > > >> n <- 100
>> > > >> dat <- data.frame(y=rnorm(n),
>> > > >>                   A=factor(sample(1:2, n, replace=TRUE)),
>> > > >>                   B=factor(sample(1:2, n, replace=TRUE)),
>> > > >>                   C=factor(sample(1:2, n, replace=TRUE)),
>> > > >>                   d=rnorm(n))
>> > > >> fm <- lm(y ~ A + B + C + d, dat)
>> > > >> ## Form a data frame of points to predict: all combinations of
>> > > >> ## the three factors and the mean of the covariate.
>> > > >> p <- data.frame(expand.grid(A=1:2, B=1:2, C=1:2))
>> > > >> p[] <- lapply(p, factor)
>> > > >> p <- cbind(p, d=mean(dat$d))
>> > > >> p <- cbind(yhat=predict(fm, p), p)
>> > > >> ## lsmeans for the three factors:
>> > > >> with(p, tapply(yhat, A, mean))
>> > > >> with(p, tapply(yhat, B, mean))
>> > > >> with(p, tapply(yhat, C, mean))
>> > > > 
>> > > >  Using Andy's example data, these are the LSMEANS and intervals
>> > > > I get from SAS:
>> > > > 
>> > > > A        y LSMEAN      95% Confidence Limits 
>> > > > 1       -0.071847       -0.387507     0.243813 
>> > > > 2       -0.029621       -0.342358     0.283117 
>> > > > 
>> > > > B        y LSMEAN      95% Confidence Limits 
>> > > > 1       -0.104859       -0.397935     0.188216 
>> > > > 2        0.003391       -0.333476     0.340258 
>> > > > 
>> > > > C        y LSMEAN      95% Confidence Limits 
>> > > > 1       -0.084679       -0.392343     0.222986 
>> > > > 2       -0.016789       -0.336374     0.302795 
>> > > > 
>> > > >  One way of reproducing the LSMEANS and intervals from SAS using 
>> > > > predict() seems to be the following: 
>> > > > 
>> > > >> dat.lm <- lm(y ~ A + as.numeric(B) + as.numeric(C) + d, data = dat)
>> > > >> newdat <- expand.grid(A=factor(c(1,2)), B=1.5, C=1.5, d=mean(dat$d))
>> > > >> cbind(newdat, predict(dat.lm, newdat, interval="confidence"))
>> > > >  A   B   C          d         fit        lwr       upr 
>> > > > 1 1 1.5 1.5 0.09838595 -0.07184709 -0.3875070 0.2438128 
>> > > > 2 2 1.5 1.5 0.09838595 -0.02962086 -0.3423582 0.2831165 
>> > > > 
>> > > >  However, another possibility seems to be: 
>> > > > 
>> > > >> dat.lm <- lm(y ~ A + as.numeric(B) + as.numeric(C) + d, data = dat)
>> > > >> newdat <- expand.grid(A=factor(c(1,2)), B=mean(as.numeric(dat$B)),
>> > > >>                       C=mean(as.numeric(dat$C)), d=mean(dat$d))
>> > > >> cbind(newdat, predict(dat.lm, newdat, interval="confidence"))
>> > > >  A    B    C          d         fit        lwr       upr 
>> > > > 1 1 1.43 1.48 0.09838595 -0.08078243 -0.3964661 0.2349012 
>> > > > 2 2 1.43 1.48 0.09838595 -0.03855619 -0.3479589 0.2708465 
>> > > > 
>> > > >  The predictions directly above match what effect() in the
>> > > > effects package by John Fox returns:
>> > > > 
>> > > > library(effects) 
>> > > > 
>> > > >> effect("A", fm, xlevels=list(d = mean(dat$D))) 
>> > > > 
>> > > > A effect 
>> > > > A 
>> > > >          1           2 
>> > > > -0.08078243 -0.03855619 
>> > > > 
>> > > >  But for some reason the predict() and effect() intervals are a
>> > > > little different:
>> > > > 
>> > > >> effect("A", fm, xlevels=list(d = mean(dat$D)))$lower 
>> > > >          [,1] 
>> > > > 101 -0.3924451 
>> > > > 102 -0.3440179 
>> > > > 
>> > > >> effect("A", fm, xlevels=list(d = mean(dat$D)))$upper 
>> > > >         [,1] 
>> > > > 101 0.2308802 
>> > > > 102 0.2669055 
>> > > > 
>> > > >  I would be interested in any comments on these different
>> > > > approaches and on the difference in intervals returned by
>> > > > predict() and effect().
>> > > 
>> > > AFAIR, the effects package uses normal-based confidence intervals
>> > > and predict.lm uses t-based ones, and I have suggested to John Fox
>> > > that t-based intervals would be preferable, at least as an option.
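>> > > 
>> > > The difference is small here but easy to see. For Andy's fit fm,
>> > > with n = 100 observations and 5 estimated coefficients, a sketch
>> > > of the two multipliers (my illustration, not output from either
>> > > package):
>> > > 
>> > > qnorm(0.975)                # about 1.960, normal-based
>> > > qt(0.975, df.residual(fm))  # about 1.985, t-based, 95 df
>> > > 
>> > > so the t-based intervals are slightly wider.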
>> > > 
>> > > 
>> > > -- 
>> > > Brian D. Ripley,                  ripley at stats.ox.ac.uk 
>> > > Professor of Applied Statistics,   
>> http://www.stats.ox.ac.uk/~ripley/ 
>> > > University of Oxford,             Tel:  +44 1865 272861 (self) 
>> > > 1 South Parks Road,                     +44 1865 272866 (PA) 
>> > > Oxford OX1 3TG, UK                Fax:  +44 1865 272595 
>> > > 