[Rd] loess returns different standard errors for identical models (PR#7956)

Sat Jun 18 10:57:27 CEST 2005

I've seen many similar things in reports from valgrind.  But they went 
away when compiled without optimization: it seems optimized code often 
fetches one element off the end of an array when attempting to keep the 
pipelines full.

I'd start by re-running the valgrind tests without optimization.
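
A minimal sketch of such a re-run, assuming an R source tree in the 
current directory and valgrind on the PATH (the flag values and paths 
are illustrative assumptions, not the exact build behind the report 
quoted below):

```shell
# Sketch only: assumes the current directory is the top of an R source tree
# and that valgrind is installed and on the PATH.

# -O0 turns compiler optimization off; -g keeps debugging symbols so that
# valgrind can report source file names and line numbers.
CFLAGS="-g -O0"
FFLAGS="-g -O0"
export CFLAGS FFLAGS

# Rebuild R with those flags for the C and Fortran sources.
./configure && make

# Start the freshly built R under valgrind's memcheck tool.
bin/R -d "valgrind --tool=memcheck" --vanilla
```

Calling my.test() at the resulting R prompt should then show whether 
the "uninitialised value" report persists in unoptimized code.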

On Sat, 18 Jun 2005, Peter Dalgaard wrote:

> btyner at stat.purdue.edu writes:
>
>> Full_Name: Benjamin Tyner
>> Version: 2.1.0, 4/18/2005
>> OS: i686-redhat-linux-gnu
>> Submission from: (NULL) (4.64.8.220)
>>
>>
>> # Just run my.test() below in a newly opened R session. Once too many models
>> have been fit (~20 on my system), the computed standard error jumps to a
>> different value. This is (superficially) due to a different residual sum of
>> squares, not a different one.delta. No other aspect of the fit is affected, just
>> the computed value of s (I've run extensive testing with all.equal() to make
>> sure). Issuing a garbage collection before doing a loess fit appears to "solve"
>> the problem, which makes me think this is not a problem in loessc.c or loessf.f.
>> My point is that a few loess fits in one session should not cause the estimated
>> standard error computation to go awry with no warning.
>
> Right. Valgrind has this to say:
>
>> my.test()
> ==22986== Use of uninitialised value of size 8
> ==22986==    at 0x1C97051B: lowesb_ (loessf.f:1542)
> ==22986==    by 0x1C95B399: loess_raw (loessc.c:98)
> ==22986==    by 0x809C9AE: do_dotCode (dotcode.c:1709)
> ==22986==    by 0x80B368F: Rf_eval (eval.c:405)
> [1] "s =  0.857141235910414"
> [1] "s =  0.857141235910414"
>
> and that certainly fits the pattern.
>
> Unfortunately this seems to be in the call to ehg131() in this passage
>
>      end if
>      setlf=(iv(27).ne.iv(25))
>      call ehg131(xx,yy,ww,trl,diagl,iv(20),iv(29),iv(3),iv(2),iv(5),
>     +     iv(17),iv(4),iv(6),iv(14),iv(19),wv(1),iv(iv(7)),iv(iv(8)),
>     +     iv(iv(9)),iv(iv(10)),iv(iv(22)),iv(iv(27)),wv(iv(11)),
>     +     iv(iv(23)),wv(iv(13)),wv(iv(12)),wv(iv(15)),wv(iv(16)),
>     +     wv(iv(18)),ifloor(iv(3)*wv(2)),wv(3),wv(iv(26)),wv(iv(24)),
>     +     wv(4),iv(30),iv(33),iv(32),iv(41),iv(iv(25)),wv(iv(34)),
>     +     setlf)
>      if(iv(14).lt.iv(6)+DBLE(iv(4))/2.D0)then
>         call ehg183('k-d tree limited by memory; nvmax=',
>     +        iv(14),1,1)
>
> (line numbers in optimized code are somewhat unreliable), so there are
> quite a few items to check. Dumping out the iv and wv arrays at that
> point is probably a good start if you want to chip in with a bit
> of debugging. Do yourself a favour and use set.seed() with a value
> that gives you a minimal repeat count when you start R in a clean state.
>
> -- 
>   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
> (*) \(*) -- University of Copenhagen   Denmark          Ph: (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595