[Rd] Bug report with patch: `stats:::regularize.values()` always creates full copies of `x` and `y`

Evgeni Chasnovski evgeni@ch@@nov@ki @ending from gm@il@com
Fri Jan 4 10:59:54 CET 2019


This is intended to be a bug report with proposed patch. I am posting to
this mailing list as described in NOTE in "Bug Reporting in R".

Function `stats:::regularize.values()` is meant to preprocess `x` and `y`
arguments to have "proper" values for later use during interpolation. If
input is already "proper", I would expect it to reuse the same objects
without creating new ones. However, this isn't the case and is the source
of unneccessary extra memory usage in `approx()` and others.

The root cause of this seems to be a forceful reordering in lines 37-39 of
'approx.R' file. If reordering is done only if `x` is unsorted then no
copies are created. Also this doesn't seem like breaking any existing code.

There is a patch attached.

Reproducable code:
x <- seq(1, 100, 1)
y <- seq(1, 100, 1)

reg_xy <- stats:::regularize.values(x, y, mean)

# Regularized versions of `x` and `y` are identical to input but are stored
at
# different places
identical(x, reg_xy[["x"]])
#> [1] TRUE
.Internal(inspect(x))
#> @15719b0 14 REALSXP g0c7 [NAM(3)] (len=100, tl=0) 1,2,3,4,5,...
.Internal(inspect(reg_xy[["x"]]))
#> @2b84130 14 REALSXP g0c7 [NAM(3)] (len=100, tl=0) 1,2,3,4,5,...

identical(y, reg_xy[["y"]])
#> [1] TRUE
.Internal(inspect(y))
#> @2c91be0 14 REALSXP g0c7 [NAM(3)] (len=100, tl=0) 1,2,3,4,5,...
.Internal(inspect(reg_xy[["y"]]))
#> @2bb4880 14 REALSXP g0c7 [NAM(3)] (len=100, tl=0) 1,2,3,4,5,...

# Differs from original only by using `if (is.unsorted(x))`
new_regularize.values <- function (x, y, ties) {
  x <- xy.coords(x, y, setLab = FALSE)
  y <- x$y
  x <- x$x
  if (any(na <- is.na(x) | is.na(y))) {
    ok <- !na
    x <- x[ok]
    y <- y[ok]
  }
  nx <- length(x)
  if (!identical(ties, "ordered")) {
    if (is.unsorted(x)) {
      o <- order(x)
      x <- x[o]
      y <- y[o]
    }
    if (length(ux <- unique(x)) < nx) {
      if (missing(ties))
        warning("collapsing to unique 'x' values")
      y <- as.vector(tapply(y, match(x, x), ties))
      x <- ux
      stopifnot(length(y) == length(x))
    }
  }
  list(x = x, y = y)
}

new_reg_xy <- new_regularize.values(x, y, mean)

# Output is still identical to input and also references to the same objects
identical(x, new_reg_xy[["x"]])
#> [1] TRUE
.Internal(inspect(x))
#> @15719b0 14 REALSXP g1c7 [MARK,NAM(3)] (len=100, tl=0) 1,2,3,4,5,...
.Internal(inspect(new_reg_xy[["x"]]))
#> @15719b0 14 REALSXP g1c7 [MARK,NAM(3)] (len=100, tl=0) 1,2,3,4,5,...

identical(y, new_reg_xy[["y"]])
#> [1] TRUE
.Internal(inspect(y))
#> @2c91be0 14 REALSXP g1c7 [MARK,NAM(3)] (len=100, tl=0) 1,2,3,4,5,...
.Internal(inspect(new_reg_xy[["y"]]))
#> @2c91be0 14 REALSXP g1c7 [MARK,NAM(3)] (len=100, tl=0) 1,2,3,4,5,...

# Current R version
R.version
#>                _
#> platform       x86_64-pc-linux-gnu
#> arch           x86_64
#> os             linux-gnu
#> system         x86_64, linux-gnu
#> status
#> major          3
#> minor          5.2
#> year           2018
#> month          12
#> day            20
#> svn rev        75870
#> language       R
#> version.string R version 3.5.2 (2018-12-20)
#> nickname       Eggshell Igloo

-- 
Best regards,
Evgeni Chasnovski

-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch.diff
Type: text/x-patch
Size: 522 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20190104/87d5382c/attachment.bin>


More information about the R-devel mailing list