[R] Difference between gam() and loess().

Sat Mar 21 01:31:27 CET 2009

Ravi Varadhan wrote:
> Good try, Kevin.  But that doesn't seem to do it. 
> 
> library(gam)  # gam() and lo() come from the gam package
> 
> set.seed(123)
> 
> x <- sort(runif(100))
> 
> y <- sin(4*pi*x) + rnorm(100, sd=0.2)
> 
> ans.lo2 <- loess(y ~ x, degree=2, span=0.75)
> 
> ans.gam2 <- gam(y ~ lo(x, degree=2, span=0.75))
> 
> summary(ans.lo2$fitted - ans.gam2$fitted) # larger differences, about 10%
> 
> ans.lo1 <- loess(y ~ x, degree=1, span=0.75)
> 
> ans.gam1 <- gam(y ~ lo(x, degree=1, span=0.75))
> 
> summary(ans.lo1$fitted - ans.gam1$fitted) # smaller differences, about 2-5 percent
> 
> I also tried a number of other things, including changing the "family" and the parameters in "loess.control", but to no avail.  I looked at the Fortran code behind both loess and gam.  It is daunting, to say the least: dense, with no comments whatsoever.  But one thing is clear: the two functions use different Fortran code.
> 
> So, the best bet might be to get Trevor Hastie or Bill Cleveland to help you out.  
> 
> But, before that:  why is this an issue, Rolf?  Is it important that these two results be identical?
> 
> Best,
> Ravi.
> 

There was one other thing I found that I shared with Rolf off-list.
In loess.control() there is an iterations argument, which sets the
number of iterations used in robust fitting (family = "symmetric").
I would think a difference there could also account for departures,
especially in the tails.

I don't have the gam package installed, so I can't test these myself
at the moment.
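
A minimal sketch of how the iterations idea might be checked using only
loess(), so no gam package is needed. It reuses Ravi's simulated data.
The family = "symmetric" setting and the iteration count of 4 are
assumptions chosen for illustration; as far as I know, iterations only
matters for the robust (symmetric) family.

```r
# Sketch only: same simulated data as in Ravi's example.
set.seed(123)
x <- sort(runif(100))
y <- sin(4 * pi * x) + rnorm(100, sd = 0.2)

# Default (gaussian) family: a single least-squares pass.
fit.gauss <- loess(y ~ x, degree = 2, span = 0.75)

# Assumed settings for illustration: robust family with 4 iterations
# (4 is the loess.control() default).
fit.robust <- loess(y ~ x, degree = 2, span = 0.75, family = "symmetric",
                    control = loess.control(iterations = 4))

# Difference in fitted values, overall and at the ends of the x range.
d <- fit.robust$fitted - fit.gauss$fitted
tails <- c(1:5, 96:100)
print(summary(d))
print(summary(abs(d[tails])))
```

If the tail differences here turn out to be of the same order as the
loess/gam discrepancy, that would make the robustness iterations a
plausible contributor; if not, it can probably be ruled out.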

> 
> 
> ----- Original Message -----
> From: "Kevin E. Thorpe" <kevin.thorpe at utoronto.ca>
> Date: Thursday, March 19, 2009 8:23 pm
> Subject: Re: [R] Difference between gam() and loess().
> To: Rolf Turner <r.turner at auckland.ac.nz>
> Cc: R-help Forum <r-help at r-project.org>
> 
> 
>> Rolf Turner wrote:
>>  > 
>>  > It seems that in general
>>  > 
>>  >     gam(y~lo(x)) # gam() from the gam package.
>>  > 
>>  > and
>>  >     loess(y~x)
>>  > 
>>  > give slightly different results (in respect of the predicted/fitted
>>  > values).
>>  > Most noticeable at the endpoints of the range of x.
>>  > 
>>  > Can anyone enlighten me about the reason for this difference?
>>  > 
>>  > Is it possible to twiddle the control parameters, for either or both
>>  > functions, so as to obtain identical results?
>>  
>>  There are two obvious differences in the defaults.  In lo() from the gam
>>  package, span=0.5 and degree=1 while for loess(), span=0.75 and degree=2.
>>  
>>  Try gam(y~lo(x,span=0.75,degree=2)) and see if that helps.
>>  
>>  Kevin
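
The defaults quoted above can be read directly off the function
definitions. This is just a sketch: the lo() lookup is guarded because
the gam package may not be installed, and the variable names are mine.

```r
# Defaults of loess() from the stats package.
loess.defaults <- formals(stats::loess)[c("span", "degree")]
print(loess.defaults)

# Defaults of lo(), looked up only if the gam package is available.
if (requireNamespace("gam", quietly = TRUE)) {
  lo.defaults <- formals(gam::lo)[c("span", "degree")]
  print(lo.defaults)
}
```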

-- 
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.thorpe at utoronto.ca  Tel: 416.864.5776  Fax: 416.864.6057