[R] jitter-bug? problematic behaviour of the jitter function

Bert Gunter bgunter@4567 @end|ng |rom gm@||@com
Thu Sep 24 19:03:41 CEST 2020


Folks: Please note:

There is *no* way to "jitter" the 3 values 1,2, and 1e5 so that:

a) the jittered values differ from the original ones by a fraction of their
original value;
b) the plotting symbols for the jittered values will be distinguishable on
a linear scale holding all 3 values.
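Here is a rough calculation that makes Bert's point concrete. It assumes, purely for illustration, that two plotting symbols can be told apart only if their centres are at least 1% of the axis range apart. That threshold is an assumption and does not come from the thread.

```r
# Illustrative assumption (not from the thread): symbols are
# distinguishable only if at least 1% of the axis range apart.
x <- c(1, 2, 1e5)
axis_range <- diff(range(x))   # 99999 data units on a linear axis
min_sep <- 0.01 * axis_range   # about 1000 data units
# 1 and 2 are currently 1 unit apart. To pull them min_sep apart, at
# least one of them must move by at least half the missing distance.
needed_shift <- (min_sep - abs(diff(x[1:2]))) / 2
# Relative to the values themselves, that shift is hundreds of times
# larger than the values, so requirement (a) cannot hold.
needed_shift / x[1:2]
```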

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Thu, Sep 24, 2020 at 8:39 AM Martin Keller-Ressel <
martin.keller-ressel using tu-dresden.de> wrote:

> Dear Duncan, Dear Rui,
>
> thanks for the responses and for pointing out that it is the ‚fuzz‘ part
> that is causing the problem. I agree that this is not a bug, but it could be
> undesirable/surprising behaviour, since it causes a large ‚discontinuity‘
> in the jitter function's output depending on the input data.
>
> I was (ab?)using the jitter function to break ties, where the desired
> behaviour would be to add noise just small enough to make all values
> unique. (Such a function can easily be hand-coded, of course.)
>
> best regards,
> Martin
>
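Martin mentions that a tie-breaker can easily be hand-coded. Below is a minimal sketch, assuming the goal is to perturb only tied values, with noise smaller than half the smallest gap between distinct values so that no two distinct values can swap order. The function name `break_ties` and its `frac` argument are hypothetical, not part of base R.

```r
# Hypothetical helper (not part of base R): add uniform noise to tied
# values only, scaled to the smallest gap between distinct values so
# that distinct values can never swap order.
break_ties <- function(x, frac = 0.25) {
  stopifnot(is.numeric(x), frac > 0, frac < 0.5)
  u <- sort(unique(x[is.finite(x)]))
  # Smallest gap between distinct finite values; fall back to 1 when
  # there is at most one distinct value.
  gap <- if (length(u) > 1L) min(diff(u)) else 1
  a <- frac * gap
  # Flag every member of a tie group, not just the later copies.
  tied <- duplicated(x) | duplicated(x, fromLast = TRUE)
  x[tied] <- x[tied] + runif(sum(tied), -a, a)
  x
}
```

For example, `break_ties(c(1, 2, 2, 1e5))` moves only the two 2s, each by less than 0.25, however large the outlier is. Scaling by the gap between distinct values, rather than by the overall range, is what keeps the noise independent of outliers.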
> On 23.09.2020 at 22:25, Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
>
> On 23/09/2020 4:03 p.m., Rui Barradas wrote:
> Hello,
> I believe that, although Duncan's explanation is right, it does not
> explain the value of the digits argument. round makes the first 2
> numbers 0, but why?
>
> If there had been rounding error in their computation, you might see a
> difference like 1e-15.  You wouldn't want to use that as the scale for
> jittering, so some rounding is needed.
>
> I think the documentation for the function is poor, but the intention was
> probably to use the function in graphics (as the references did), and in
> that case, any values too close together should be treated as equal and
> jittering should separate them.  The particular computation used says that
> if the range is in [1, 10), values equal to 3 decimal places will be too
> close and need separation.
>
> So I don't think this is a bug, but it might be a valid wishlist item:
> document what "apart from fuzz" means, and perhaps allow it to be
> controlled by the user.
>
> Duncan Murdoch
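Duncan's wishlist item might look roughly like the sketch below. `jitter_fuzz` and its `fuzz_digits` argument are hypothetical: the function is a simplified re-implementation modelled on the rounding line quoted in this thread, not the base R source, and it omits jitter()'s special handling of `amount = 0` and of zero-range data. With `fuzz_digits = NULL` it applies the default rule 3 - floor(log10(z)); an explicit value overrides it.

```r
# Hypothetical variant of jitter() (not part of base R) in which the
# "fuzz" rounding can be controlled by the user.
jitter_fuzz <- function(x, factor = 1, amount = NULL, fuzz_digits = NULL) {
  stopifnot(is.numeric(x))
  if (length(x) == 0L) return(x)
  z <- diff(range(x, finite = TRUE))
  if (z == 0) z <- 1   # simplification: avoid log10(0)
  if (is.null(amount)) {
    # Default rule discussed in the thread, unless overridden.
    digits <- if (is.null(fuzz_digits)) 3 - floor(log10(z)) else fuzz_digits
    xx <- unique(sort.int(round(x, digits)))
    d <- diff(xx)
    # Smallest gap between values that are distinct after rounding.
    d <- if (length(d)) min(d) else z / 10
    amount <- factor / 5 * abs(d)
  }
  x + runif(length(x), -amount, amount)
}
```

For `x <- c(1, 2, 1e5)`, the default rule rounds to the nearest ten and allows noise of up to 20000, whereas `jitter_fuzz(x, fuzz_digits = 0)` keeps 1 and 2 apart and bounds the noise by 0.2.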
>
>
>
> The function below prints the digits argument and
> then outputs d. The code is taken from jitter.
> f <- function(x){
>    z <- diff(r <- range(x[is.finite(x)]))
>    cat("digits:", 3 - floor(log10(z)), "\n")
>    diff(xx <- unique(sort.int(round(x, 3 - floor(log10(z))))))
> }
> Now see what cat outputs for 'digits'.
> f(c(1,2,10^4))  # desired behaviour
> #digits: 0
> #[1]    1 9998
> f(c(0,1,10^4))  # bad behaviour
> #digits: -1
> #[1] 10000
> f(c(-1,0,10^4))  # bad behaviour
> #digits: -1
> #[1] 10000
> f(c(1,2,10^5))  # bad behaviour
> #digits: -1
> #[1] 1e+05
> And according to the documentation of ?round, negative digits are allowed:
> Rounding to a negative number of digits means rounding to a power of
> ten, so for example round(x, digits = -2) rounds to the nearest hundred.
> But in this case two of the numbers are closer to 0 than to 10,
> so unique keeps only 0 and the largest value, and diff is large.
> round(c(1,2,10^4),0)  # desired behaviour
> #[1]     1     2 10000
> round(c(0,1,10^4),-1)  # bad behaviour
> #[1]     0     0 10000
> round(c(-1,0,10^4),-1)  # bad behaviour
> #[1]     0     0 10000
> round(c(1,2,10^5),-1)  # bad behaviour
> #[1] 0e+00 0e+00 1e+05
> Isn't it still a bug?
> Rui Barradas
> On 23/09/20 at 15:57, Duncan Murdoch wrote:
> On 23/09/2020 6:32 a.m., Martin Keller-Ressel wrote:
> Dear all,
>
> I have noticed some strange behaviour in the „jitter“ function in R.
> On the help page for jitter it is stated that
>
> "The result, say r, is r <- x + runif(n, -a, a) where n <- length(x)
> and a is the amount argument (if specified)."
>
> and
>
> "If amount is NULL (default), we set a <- factor * d/5 where d is the
> smallest difference between adjacent unique (apart from fuzz) x values."
>
> This works fine as long as there is no (very) large outlier:
>
> jitter(c(1,2,10^4))  # desired behaviour
> [1]    1.083243    1.851571 9999.942716
>
> But for very large outliers the added noise suddenly ‚jumps‘ to a much
> larger scale:
>
> jitter(c(1,2,10^5)) # bad behaviour
> [1] -19535.649   9578.702 115693.854
> # Noise should be of order (2-1)/5  = 0.2 but is of much larger order.
>
> This probably does not matter much when jitter is used for plotting,
> but it can cause problems when jitter is used to break ties.
>
> I think this is kind of documented:  "apart from fuzz" is what counts.
> If you look at the code for jitter, you'll see this important line:
>
>   d <- diff(xx <- unique(sort.int(round(x, 3 - floor(log10(z))))))
>
> By the time you get here, z is the length of the range of the data, so
> it's 99999 in your example.  The rounding changes your values to
> 0,0,1e5, so the smallest difference is 1e5.
>
> Duncan Murdoch
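Duncan's explanation can be checked step by step. This sketch recomputes the intermediate quantities for Martin's example with jitter()'s defaults (factor = 1, amount = NULL), following the rounding line and the help-page formula a <- factor * d/5 quoted above, and then checks that base R's jitter() stays within the resulting bound.

```r
# Recompute the quantities Duncan describes for x = c(1, 2, 1e5).
x <- c(1, 2, 1e5)
z <- diff(range(x))                       # 99999, the range of the data
digits <- 3 - floor(log10(z))             # 3 - 4 = -1: round to the nearest ten
xx <- unique(sort.int(round(x, digits)))  # 1 and 2 both become 0
d <- min(diff(xx))                        # smallest gap "apart from fuzz": 1e5
a <- 1 * d / 5                            # factor * d / 5 with factor = 1
# The noise is uniform on (-a, a), so every jittered value lies within
# 20000 of the original, matching the magnitudes Martin observed.
set.seed(1)
y <- jitter(x)
all(abs(y - x) <= a)
```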
>
> ______________________________________________
> R-help using r-project.org mailing list -- To
> UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
