[Rd] loess returns different standard errors for identical models (PR#7956)

Sat Jun 18 10:27:31 CEST 2005

btyner at stat.purdue.edu writes:

> Full_Name: Benjamin Tyner
> Version: 2.1.0, 4/18/2005
> OS: i686-redhat-linux-gnu
> Submission from: (NULL) (4.64.8.220)
> 
> 
> # Just run my.test() below in a newly opened R session. Once too many models
> have been fit (~20 on my system), the computed standard error jumps to a
> different value. This is (superficially) due to a different residual sum of
> squares, not a different one.delta. No other aspect of the fit is affected, just
> the computed value of s (I've run extensive testing with all.equal() to make
> sure). Issuing a garbage collection before doing a loess fit appears to "solve"
> the problem, which makes me think this is not a problem in loessc.c or loessf.f.
> My point is that a few loess fits in one session should not cause the estimated
> standard error computation go awry with no warning.

Right. Valgrind has this to say:

> my.test()
==22986== Use of uninitialised value of size 8
==22986==    at 0x1C97051B: lowesb_ (loessf.f:1542)
==22986==    by 0x1C95B399: loess_raw (loessc.c:98)
==22986==    by 0x809C9AE: do_dotCode (dotcode.c:1709)
==22986==    by 0x80B368F: Rf_eval (eval.c:405)
[1] "s =  0.857141235910414"
[1] "s =  0.857141235910414"

and that certainly fits the pattern.

Unfortunately this seems to be in the call to ehg31() in this passage

      end if
      setlf=(iv(27).ne.iv(25))
      call ehg131(xx,yy,ww,trl,diagl,iv(20),iv(29),iv(3),iv(2),iv(5),
     +     iv(17),iv(4),iv(6),iv(14),iv(19),wv(1),iv(iv(7)),iv(iv(8)),
     +     iv(iv(9)),iv(iv(10)),iv(iv(22)),iv(iv(27)),wv(iv(11)),
     +     iv(iv(23)),wv(iv(13)),wv(iv(12)),wv(iv(15)),wv(iv(16)),
     +     wv(iv(18)),ifloor(iv(3)*wv(2)),wv(3),wv(iv(26)),wv(iv(24)),
     +     wv(4),iv(30),iv(33),iv(32),iv(41),iv(iv(25)),wv(iv(34)),
     +     setlf)
      if(iv(14).lt.iv(6)+DBLE(iv(4))/2.D0)then
         call ehg183('k-d tree limited by memory; nvmax=',
     +        iv(14),1,1)

(line numbers in optimized code are somewhat unreliable), so there are
quite a few items to check. Dumping out the iv and wv arrays at that
point is probably a good start if you want to chip in with a bit
of debugging. Do yourself a favour and use set.seed() with a value
that gives you a minimal repeat count when you start R in a clean state.

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907