[Rd] predict.loess() segfaults for large n?
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed Mar 6 16:20:23 CET 2013
Thanks.
This is in the netlib loess code: the size is passed to Fortran (as an
INTEGER), so we cannot increase it. I've added a test that throws an
error if the dimension is too large.
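A minimal sketch of the kind of guard described above follows. It is not the actual R patch. It assumes only the liv formula quoted from loess_workspace() further down, and the helper name loess_liv_checked is hypothetical. The idea is to do the size arithmetic in a 64-bit type and refuse to go on when the result exceeds INT_MAX. That is the largest value a default Fortran INTEGER can hold on common platforms, where it is typically 32-bit. The real workspace computation in R may include further terms that are not quoted in this thread.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper (illustration only): evaluate the quoted formula
     liv = 50 + (2^D + 4) * nvmax + 2 * N
   in 64-bit arithmetic so it cannot wrap around, then check that the
   result still fits in a 32-bit int / default Fortran INTEGER.
   Returns 0 and stores the size in *liv_out on success, or -1 if the
   size is too large (the caller should raise an error, not allocate). */
int loess_liv_checked(int D, int nvmax, int N, int *liv_out)
{
    /* Assumption: D is small (loess uses only a few predictors), so
       2^D can be formed exactly with a shift. */
    assert(D >= 0 && D <= 30);
    assert(nvmax >= 0 && N >= 0);

    int64_t pow2D = (int64_t)1 << D;
    int64_t liv = 50 + (pow2D + 4) * (int64_t)nvmax + 2 * (int64_t)N;

    if (liv > INT_MAX)
        return -1;            /* would overflow the int work-array size */

    *liv_out = (int)liv;
    return 0;
}
```

Take illustrative values D = 1, nvmax = 5000 and N = 5000. The formula gives 50 + 6 * 5000 + 2 * 5000 = 40050, which fits, so the helper returns 0. With nvmax = INT_MAX / 4, the term 6 * nvmax alone exceeds INT_MAX, so the helper returns -1 instead of yielding a wrapped, negative size.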
On 01/03/2013 11:27, Hiroyuki Kawakatsu wrote:
> Hi,
>
> I am segfaulting when using predict.loess() (checked with r62092).
> I've traced the source with the help of valgrind (output pasted
> below) and it appears that this is due to int overflow when
> allocating an int work array in loess_workspace():
>
> liv = 50 + ((int)pow((double)2, (double)D) + 4) * nvmax + 2 * N;
>
> where liv is a (global) int. For D=1 (one x variable), this
> overflows at approx N = 4089 where N is the fitted sample size (not
> prediction sample size).
>
> I am aware that you are in the process of introducing long vectors
> but a quick fix would be to error when predict.loess(..., se=TRUE)
> and N is too large. (Ideally, one would use long int, but does
> Fortran portably support long int?) The threshold N value may depend
> on surface type (above is for surface=="interpolate").
>
> The following sample code does not result in a segfault, but when run
> with valgrind it produces the "large range" warning. (In the
> code that segfaults, N is about 77,000.)
>
> set.seed(1)
> n = 5000 # n=4000 seems ok
> x = rnorm(n)
> y = x + rnorm(n)
> yf = loess(y~x, span=0.75, control=loess.control(trace.hat="approximate"))
> print( predict(yf, data.frame(x=1), se=TRUE) )
>
> ##---valgrind output with segfault (abridged):
>
>> test4()
> ==30841== Warning: set address range perms: large range [0x3962a040, 0x5fb42608) (defined)
> ==30841== Warning: set address range perms: large range [0x5fb43040, 0xf8c8e130) (defined)
> ==30841== Invalid write of size 4
> ==30841== at 0xCD719F0: ehg139_ (loessf.f:1444)
> ==30841== by 0xCD72E0C: ehg131_ (loessf.f:467)
> ==30841== by 0xCD73A5A: lowesb_ (loessf.f:1530)
> ==30841== by 0xCD2C774: loess_ise (loessc.c:219)
> ==30841== by 0x486C7F: do_dotCode (dotcode.c:1744)
> ==30841== by 0x4AB040: bcEval (eval.c:4544)
> ==30841== by 0x4B6B3F: Rf_eval (eval.c:498)
> ==30841== by 0x4BAD87: Rf_applyClosure (eval.c:960)
> ==30841== by 0x4B6D5E: Rf_eval (eval.c:611)
> ==30841== by 0x4B7A1E: do_eval (eval.c:2193)
> ==30841== by 0x4AB040: bcEval (eval.c:4544)
> ==30841== by 0x4B6B3F: Rf_eval (eval.c:498)
> ==30841== Address 0xf8cd4144 is not stack'd, malloc'd or (recently) free'd
> ==30841==
>
> *** caught segfault ***
> address 0xf8cd4144, cause 'memory not mapped'
>
> Traceback:
> 1: predLoess(y, x, newx, s, weights, pars$robust, pars$span,
> pars$degree, pars$normalize, pars$parametric, pars$drop.square,
> pars$surface, pars$cell, pars$family, kd, divisor, se = se)
> 2: eval(expr, envir, enclos)
> 3: eval(substitute(expr), data, enclos = parent.frame())
> 4: with.default(object, predLoess(y, x, newx, s, weights,
> pars$robust, pars$span, pars$degree, pars$normalize,
> pars$parametric, pars$drop.square, pars$surface, pars$cell,
> pars$family, kd, divisor, se = se))
> 5: with(object, predLoess(y, x, newx, s, weights, pars$robust,
> pars$span, pars$degree, pars$normalize, pars$parametric,
> pars$drop.square, pars$surface, pars$cell, pars$family, kd,
> divisor, se = se))
> 6: predict.loess(y2, data.frame(hours = xmin), se = TRUE)
> 7: predict(y2, data.frame(hours = xmin), se = TRUE)
> 8: test4()
> aborting ...
> ==30841==
>
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595