[R] approxfun-problems (yleft and yright ignored)
murdoch.duncan at gmail.com
Sat Sep 11 16:32:38 CEST 2010
On 11/09/2010 10:04 AM, Martin Maechler wrote:
>>>>>> "SW" == Samuel Wuest <wuests at tcd.ie>
>>>>>> on Thu, 26 Aug 2010 14:34:26 +0100 writes:
> SW> Hi Greg,
> SW> thanks for the suggestion:
> SW> I have attached some small dataset that can be used to reproduce the
> SW> odd behavior of the approxfun-function.
> SW> If it gets stripped off my email, it can also be downloaded at:
> SW> http://bioinf.gen.tcd.ie/approx.data.Rdata
> SW> Strangely, the problem seems specific to the data structure in my
> SW> expression set, when I use simulated data, everything worked fine.
> SW> Here is some code that I run and resulted in the strange output that I
> SW> have described in my initial post:
> >> ### load the data: a list called approx.data
> >> load(file="approx.data.Rdata")
> >> ### contains the slots "x", "y", "input"
> >> names(approx.data)
> SW> [1] "x" "y" "input"
> >> ### with y ranging between 0 and 1
> >> range(approx.data$y)
> SW> [1] 0 1
> >> ### compare ranges of x and input-x values (the latter is a small subset of 500 data points):
> >> range(approx.data$x)
> SW> [1] 3.098444 7.268812
> >> range(approx.data$input)
> SW> [1] 3.329408 13.026700
> >> ### generate the interpolation function (warning message benign)
> >> interp <- approxfun(approx.data$x, approx.data$y, yleft=1, yright=0, rule=2)
> SW> Warning message:
> SW> In approxfun(approx.data$x, approx.data$y, yleft = 1, yright = 0, :
> SW> collapsing to unique 'x' values
> >> ### apply to input-values
> >> y.out <- sapply(approx.data$input, interp)
> >> ### still I find output values >1, even though yleft=1:
> >> range(y.out)
> SW> [1] 0.000000 7.207233
> I get completely different (and correct) results,
> by the way the *same* you have in the bug report you've submitted,
> and which do *not* show any bug:
> [1] 0.0000000 0.9816907
> Of course, I do believe that you've seen the above problems,
> -- on 64-bit Mac ? as you report in sessionInfo() ? --
> but I cannot reproduce them.
> And also, you seem yourself to be able to get different results
> for the same data... what are the circumstances?
I think Greg Snow's analysis (linked from the bug report) is right.
length(unique(x)) can be different than length(levels(as.factor(x)))
(which is what tapply uses to determine the number of levels).
I've fixed it by replacing
y <- as.vector(tapply(y,x,ties))
with
y <- as.vector(tapply(y,match(x,x),ties))
since match() uses the same rules as unique.
An argument could be made that tapply should be changed instead, but it
is behaving as documented, so I didn't want to do that.
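To illustrate the mismatch, here is a minimal sketch. The vectors x and y and the use of mean as the ties function are made up for illustration and are not taken from the reported data. It assumes that as.factor() on doubles goes through as.character(), which keeps 15 significant digits, so two values that unique() treats as distinct can end up in the same factor level.

```r
# Two doubles that differ only beyond the 15th significant digit,
# plus one clearly distinct value (illustrative data only).
x <- c(0.3, 0.1 + 0.2, 0.5)
y <- c(10, 20, 30)

# unique() compares the doubles exactly, so it sees three values.
n_unique <- length(unique(x))

# as.factor() converts via as.character(), so the first two values
# share a single level.
n_levels <- nlevels(as.factor(x))

# Grouping by x directly gives one result per factor level ...
y_by_factor <- as.vector(tapply(y, x, mean))

# ... while grouping by match(x, x) gives one result per unique() value,
# because match() and unique() use the same notion of equality.
y_by_match <- as.vector(tapply(y, match(x, x), mean))

cat("unique values:", n_unique, " factor levels:", n_levels, "\n")
cat("length via tapply(y, x, ...):          ", length(y_by_factor), "\n")
cat("length via tapply(y, match(x, x), ...):", length(y_by_match), "\n")
```

With the first form the collapsed y can come out shorter than the vector of unique x values, which is the kind of length mismatch that could produce the out-of-range interpolation results reported above.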