[Rd] bug (PR#13570)

Ryan Hafen rhafen at stat.purdue.edu
Fri Mar 6 02:29:36 CET 2009


Excellent, Ben!  Thanks!!


On Mar 5, 2009, at 8:24 PM, Benjamin Tyner wrote:

> Hi
>
> Nice to hear from you Ryan. I also do not have the capability to  
> debug on windows; however, there is a chance that the behavior you  
> are seeing is caused by the following bug noted in my thesis  
> (available on ProQuest; email me if you don't have access):
>
> "When lambda = 0 there are no local slopes to aid the blending  
> algorithm, yet the
> interpolator would still assume they were available, and thus use  
> arbitrary values
> from memory. This had implications for both fit and tr[L]  
> computation. In the
> updated code these are set equal to zero which seems the best  
> automatic rule when
> lambda = 0." [lambda refers to degree]
>
> I submitted a bug fix to Eric Grosse, the maintainer of the netlib  
> routines; the fixed lines of fortran are identified in the comments  
> at (just search for my email address):
>
> http://www.netlib.org/a/loess
>
> These fixes would be relatively simple to incorporate into R's  
> version of loessf.f
>
> Alternatively, a quick check would be for someone to compile the  
> source package at https://centauri.stat.purdue.edu:98/loess/loess_0.4-1.tar.gz 
>  and test it on windows. Though this package incorporates this and a  
> few other fixes, please be aware that it the routines are converted  
> to C and thus there is a slight performance hit compared to the  
> fortran.
>
> Hope this helps,
> Ben
>
> Ryan Hafen wrote:
>> That is true - good point.
>>
>> lp1 <- predict(loess(y ~ x, degree=0))
>> lp2 <- predict(loess(y ~ x, degree=0,  
>> control=loess.control(surface="direct")))
>> sort(abs(lp1-lp2))
>>
>> It appears that the interpolating fit is correct at the vertices.   
>> I know when degree>=1, the interpolation uses the slopes of the  
>> local fits to get a better approximation.  Perhaps it's still  
>> trying to do this with degree=0 but the slopes aren't available.   
>> And we have just been lucky in the past with uninitialized values?   
>> If this is the problem it would probably be very simple to fix and  
>> I'd love to see degree=0 stay.  I will see if I can figure it out.
>>
>>
>> On Mar 5, 2009, at 6:01 PM, Greg Snow wrote:
>>
>>> I see the same problem on Windows XP.
>>>
>>> But if I run loess with surface='direct' then the results are  
>>> correct.  So it looks like the problem comes from the smoothing/ 
>>> interpolating, not the main loess algorithm.
>>>
>>> -- 
>>> Gregory (Greg) L. Snow Ph.D.
>>> Statistical Data Center
>>> Intermountain Healthcare
>>> greg.snow at imail.org
>>> 801.408.8111
>>>
>>>
>>>> -----Original Message-----
>>>> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-
>>>> project.org] On Behalf Of Ryan Hafen
>>>> Sent: Thursday, March 05, 2009 7:43 AM
>>>> To: Prof Brian Ripley
>>>> Cc: Uwe Ligges; Berwin A Turlach; r-devel at stat.math.ethz.ch; Peter
>>>> Dalgaard
>>>> Subject: Re: [Rd] bug (PR#13570)
>>>>
>>>>
>>>> On Mar 5, 2009, at 7:59 AM, Prof Brian Ripley wrote:
>>>>
>>>>> On Thu, 5 Mar 2009, Peter Dalgaard wrote:
>>>>>
>>>>>> Prof Brian Ripley wrote:
>>>>>>> Undortunately the example is random, so not really reproducible
>>>>>>> (and I
>>>>>>> see nothing wrong on my Mac). However, Linux valgrind on R- 
>>>>>>> devel is
>>>>>>> showing a problem:
>>>>>>>
>>>>>>> ==3973== Conditional jump or move depends on uninitialised  
>>>>>>> value(s)
>>>>>>> ==3973==    at 0xD76017B: ehg141_ (loessf.f:532)
>>>>>>> ==3973==    by 0xD761600: lowesa_ (loessf.f:769)
>>>>>>> ==3973==    by 0xD736E47: loess_raw (loessc.c:117)
>>>>>>>
>>>>>>> (The uninitiialized value is in someone else's code and I  
>>>>>>> suspect
>>>>>>> it was
>>>>>>> either never intended to work or never tested.)  No essential
>>>>>>> change has
>>>>>>> been made to the loess code for many years.
>>>>>>>
>>>>>>> I would not have read the documentation to say that degree = 0  
>>>>>>> was
>>>> a
>>>>>>> reasonable value. It is not to my mind 'a polynomial surface',  
>>>>>>> and
>>>>>>> loess() is described as a 'local regression' for degree 1 or 2  
>>>>>>> in
>>>>>>> the
>>>>>>> reference.  So unless anyone wants to bury their heads in that
>>>>>>> code I
>>>>>>> think a perfectly adequate fix would be to disallow degree = 0.
>>>>>>> (I vaguely recall debating allowing in the code ca 10 years  
>>>>>>> ago.)
>>>>>>
>>>>>> The code itself has
>>>>>>
>>>>>> if (!match(degree, 0:2, 0))
>>>>>>     stop("'degree' must be 0, 1 or 2")
>>>>>>
>>>>>> though. "Local fitting of a constant" essentially becomes kernel
>>>>>> smoothing, right?
>>>>>
>>>>> I do know the R code allows it: the question is whether it is  
>>>>> worth
>>>>> the effort of finding the problem(s) in the underlying c/dloess
>>>>> code, whose manual (and our reference) is entirely about 1 or  
>>>>> 2.  I
>>>>> am concerned that there may be other things lurking in the  
>>>>> degree=0
>>>>> case if it was never tested (in the netlib version: I am sure it  
>>>>> was
>>>>> only minmally tested through my R interface).
>>>>>
>>>>> I checked the original documentation on netlib and that says
>>>>>
>>>>> 29      DIM     dimension of local regression
>>>>>              1               constant
>>>>>              d+1             linear   (default)
>>>>>              (d+2)(d+1)/2    quadratic
>>>>>              Modified by ehg127 if cdeg<tdeg.
>>>>>
>>>>> which seems to confirm that degree = 0 was intended to be allowed,
>>>>> and what I dimly recall from ca 1998 is debating whether the R  
>>>>> code
>>>>> should allow that or not.
>>>>>
>>>>> If left to me I would say I did not wish to continue to support
>>>>> degree = 0.
>>>>
>>>> True.  There are plenty of reasons why one wouldn't want to use
>>>> degree=0 anyway.  And I'm sure there are plenty of other simple  
>>>> ways
>>>> to achieve the same effect.
>>>>
>>>> I ran into the problem because some code I'm planning on  
>>>> distributing
>>>> as part of a paper submission "blends" partway down to degree 0
>>>> smoothing at the endpoints to reduce the variance.  The only bad
>>>> effect of disallowing degree 0 is for anyone with code depending on
>>>> it, although there are probably few that use it and better to  
>>>> disallow
>>>> than to give an incorrect computation.  I got around the problem by
>>>> installing a modified loess by one of Cleveland's former students:
>>>> https://centauri.stat.purdue.edu:98/loess/
>>>> (but don't want to require others who use my code to do so as  
>>>> well).
>>>>
>>>> What is very strange to me is that it has been working fine in
>>>> previous R versions (tested on 2.7.1 and 2.6.1) and nothing has
>>>> changed in the loess source but yet it is having problems on 2.8.1.
>>>> Would this suggest it not being a problem with the netlib code?
>>>>
>>>> Also strange that it reportedly works on Linux but not on Mac or
>>>> Windows.  On the mac, the effect was much smaller. With windows, it
>>>> was predicting values like 2e215 whereas on the mac, you would  
>>>> almost
>>>> believe the results were legitimate if you didn't think about the  
>>>> fact
>>>> that a weighted moving average involving half the data shouldn't
>>>> oscillate so much.
>>>>
>>>> If the consensus is to keep degree=0, I'd be happy to help try to  
>>>> find
>>>> the problem or provide a test case or something.  Thanks for  
>>>> looking
>>>> into this.
>>>>
>>>> Ryan
>>>>
>>>>
>>>>
>>>>>>
>>>>>>
>>>>>>> On Thu, 5 Mar 2009, Uwe Ligges wrote:
>>>>>>>
>>>>>>>> Berwin A Turlach wrote:
>>>>>>>>> G'day Peter,
>>>>>>>>>
>>>>>>>>> On Thu, 05 Mar 2009 09:09:27 +0100
>>>>>>>>> Peter Dalgaard <p.dalgaard at biostat.ku.dk> wrote:
>>>>>>>>>
>>>>>>>>>> rhafen at stat.purdue.edu wrote:
>>>>>>>>>>> <<insert bug report here>>
>>>>>>>>>>>
>>>>>>>>>>> This is a CRITICAL bug!!!  I have verified it in R 2.8.1 for
>>>> mac
>>>>>>>>>>> and for windows.  The problem is with loess degree=0  
>>>>>>>>>>> smoothing.
>>>>>>>>>>> For example, try the following:
>>>>>>>>>>>
>>>>>>>>>>> x <- 1:100
>>>>>>>>>>> y <- rnorm(100)
>>>>>>>>>>> plot(x, y)
>>>>>>>>>>> lines(predict(loess(y ~ x, degree=0, span=0.5)))
>>>>>>>>>>>
>>>>>>>>>>> This is obviously wrong.
>>>>>>>>>> Obvious? How? I don't see anything particularly odd (on  
>>>>>>>>>> Linux).
>>>>>>>>>
>>>>>>>>> Neither did I on linux; but the OP mentioned mac and  
>>>>>>>>> windows. On
>>>>>>>>> windows, on running that code, the lines() command added a  
>>>>>>>>> lot of
>>>>>>>>> vertical lines; most spanning the complete window but some  
>>>>>>>>> only
>>>>>>>>> part.
>>>>>>>>> Executing the code a second time (or in steps) gave sensible
>>>>>>>>> results. My guess would be that some memory is not correctly
>>>>>>>>> allocated or
>>>>>>>>> initialised.  Or is it something like an object with storage  
>>>>>>>>> mode
>>>>>>>>> "integer" being passed to a double?  But then, why doesn't it
>>>>>>>>> show on
>>>>>>>>> linux?
>>>>>>>>>
>>>>>>>>> Happy bug hunting.  If my guess is correct, then I have no  
>>>>>>>>> idea
>>>>>>>>> how to
>>>>>>>>> track down such things under windows.....
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>>  Berwin
>>>>>>>>>
>>>>>>>>> ______________________________________________
>>>>>>>>> R-devel at r-project.org mailing list
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>>>>
>>>>>>>>
>>>>>>>> Please can you folks try under R-devel (to be R-2.9.0 in a  
>>>>>>>> couple
>>>>>>>> of
>>>>>>>> weeks) and report if you still see it. I do not under R-devel
>>>>>>>> (but do
>>>>>>>> under R-release), so my guess is that something called by  
>>>>>>>> loess()
>>>>>>>> has
>>>>>>>> been fixed in the meantime.
>>>>>>>>
>>>>>>>> Moreover it is not the plot stuff that was wrong under R-2.8.1
>>>>>>>> (release) but the loess computations.
>>>>>>>>
>>>>>>>> Uwe Ligges
>>>>>>>>
>>>>>>>> ______________________________________________
>>>>>>>> R-devel at r-project.org mailing list
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> -- 
>>>>>> O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>>>>>> c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>>>>>> (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45)
>>>>>> 35327918
>>>>>> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45)
>>>>>> 35327907
>>>>>>
>>>>>>
>>>>>
>>>>> -- 
>>>>> Brian D. Ripley,                  ripley at stats.ox.ac.uk
>>>>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>>>>> University of Oxford,             Tel:  +44 1865 272861 (self)
>>>>> 1 South Parks Road,                     +44 1865 272866 (PA)
>>>>> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>>>>
>>>> ______________________________________________
>>>> R-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>



More information about the R-devel mailing list