[Rd] Suggestions for 'diff.default'

Suharto Anggono Suharto Anggono suharto_anggono at yahoo.com
Mon Feb 4 06:28:44 CET 2013


Inspired by discussion in "Need very fast application of 'diff' - ideas?" (around https://stat.ethz.ch/pipermail/r-help/2012-January/301873.html), I have another suggestion.

Suggestion 3: Make 'diff.default' run faster.

For the vector case (if suggestion 2 is not applied, or if unclassed input is treated specially), without resorting to C, I found that a speedup may be gained by replacing
r[-length(r):-(length(r)-lag+1L)]
with
`length<-`(r, length(r)-lag)
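
As a quick check that the two expressions give the same value, here is a minimal sketch. The values of 'r' and 'lag' are made up for illustration and are not taken from 'diff.default'.

```r
# Hypothetical example values, chosen only for illustration.
r <- c(2, 3, 5, 7, 11, 13)
lag <- 2L

# Current form: drop the last 'lag' elements by negative indexing.
old <- r[-length(r):-(length(r) - lag + 1L)]

# Suggested form: truncate by setting the length.
new <- `length<-`(r, length(r) - lag)

identical(old, new)
```

Both forms keep the first length(r) - lag elements of a plain vector; the second one avoids building an index vector.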

Another way, based on a similar idea but one that triggers a warning, is as follows.

    {
        for (i in seq_len(differences)) r <- r[i1] - r
        length(r) <- xlen - lag * differences
    }

Variables 'i1' and 'xlen' are as defined in function 'diff.default' in R.
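
To make the loop above concrete, here is a minimal sketch for the vector case only. The function name 'diff_vec_sketch' is hypothetical and the body is a simplified stand-in, not the actual 'diff.default'. The warning comes from subtracting operands of different lengths, so it is suppressed here on the assumption that the recycling is intended.

```r
# Hypothetical helper, vector case only; not the real diff.default.
diff_vec_sketch <- function(x, lag = 1L, differences = 1L) {
    xlen <- length(x)
    if (lag * differences >= xlen) return(x[0L])
    r <- unclass(x)
    i1 <- -seq_len(lag)
    for (i in seq_len(differences)) {
        # r[i1] is shorter than r and gets recycled; the tail of the
        # result is meaningless and is cut off after the loop.
        r <- suppressWarnings(r[i1] - r)
    }
    # Keep only the leading elements that hold valid differences.
    length(r) <- xlen - lag * differences
    r
}

x <- c(1, 4, 9, 16, 25, 36, 49)
all.equal(diff_vec_sketch(x, lag = 2L, differences = 2L),
          diff(x, lag = 2L, differences = 2L))
```

Each pass leaves lag more invalid elements at the end of 'r', so truncating once after the loop is enough.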


--- On Tue, 29/1/13, Suharto Anggono Suharto Anggono <suharto_anggono at yahoo.com> wrote:

> From: Suharto Anggono Suharto Anggono <suharto_anggono at yahoo.com>
> Subject: Re: Suggestions for 'diff.default'
> To: R-devel at lists.R-project.org
> Date: Tuesday, 29 January, 2013, 10:32 AM
> 
> 
> --- On Mon, 28/1/13, Suharto Anggono Suharto Anggono <suharto_anggono at yahoo.com> wrote:
> 
> > From: Suharto Anggono Suharto Anggono <suharto_anggono at yahoo.com>
> > Subject: Suggestions for 'diff.default'
> > To: R-devel at lists.R-project.org
> > Date: Monday, 28 January, 2013, 5:31 PM
> > I have suggestions for function 'diff.default' in R.
> > 
> > 
> > Suggestion 1: If the input is a matrix, always return a matrix,
> > even if empty.
> > 
> > What happens in R 2.15.2:
> > 
> > > rbind(1:2)    # matrix
> >      [,1] [,2]
> > [1,]    1    2
> > > diff(rbind(1:2))   # not matrix
> > integer(0)
> > > sessionInfo()
> > R version 2.15.2 (2012-10-26)
> > Platform: i386-w64-mingw32/i386 (32-bit)
> > 
> > locale:
> > [1] LC_COLLATE=English_United States.1252
> > [2] LC_CTYPE=English_United States.1252
> > [3] LC_MONETARY=English_United States.1252
> > [4] LC_NUMERIC=C
> > [5] LC_TIME=English_United States.1252
> > 
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> > 
> > 
> > The documentation for 'diff' says, "If 'x' is a matrix then the
> > difference operations are carried out on each column separately."
> > If the result is empty, I expect that the result still has as many
> > columns as the input.
> > 
> > 
> > Suggestion 2: Make 'diff.default' applicable more generally by
> > (a) not performing 'unclass';
> > (b) generalizing (changing)
> > ismat <- is.matrix(x)
> > to become
> > ismat <- length(dim(x)) == 2L
> > 
> > 
> > If suggestion 1 is to be applied and 'unclass' is not wanted
> > (that is, point (a) in suggestion 2 is also to be applied),
> > 
> >     if (lag * differences >= xlen)
> >         return(x[0L])
> > 
> > can be changed to
> > 
> >     if (lag * differences >= xlen)
> >         return(
> >             if (ismat) x[0L, , drop = FALSE] - x[0L, , drop = FALSE] else
> >             x[0L] - x[0L])
> > 
> > It will handle classes for which the subtraction (minus) operation
> > changes the class.
> Sorry, I wasn't careful enough. To obtain the correct class
> for the result, differencing should be done as many times as
> specified by argument 'differences'.
> 
> I consider the case of
> diff(as.POSIXct(c("2012-01-01", "2012-02-01"), tz="UTC"),
> d=2)
> versus
> diff(diff(as.POSIXct(c("2012-01-01", "2012-02-01"),
> tz="UTC")))
> To be safe, maybe just compute as usual, even when it is
> known that the end result will be empty. It can be done like
> this.
> 
>     empty <- integer()
>     if (ismat)
>         for (i in seq_len(differences))
>             r <- if (lag >= nrow(r))
>                 r[empty, , drop = FALSE] - r[empty, , drop = FALSE] else
>                 ...
>     else
>         for (i in seq_len(differences))
>             r <- if (lag >= length(r))
>                 r[empty] - r[empty] else
>                 ...
> 
> If that way is used, 'xlen' is no longer needed.
> > 
> > Otherwise, if 'unclass' is wanted, maybe the handling of an empty
> > result can be moved to after 'unclass', to be consistent with the
> > non-empty case.
> > 
> > 
> > If point (a) in suggestion 2 is applied, 'diff.default' can handle
> > input of class "Date" and "POSIXt". If, in addition, point (b) in
> > suggestion 2 is also applied, 'diff.default' can handle a data frame
> > as input.
> >
>
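
As a small illustration of suggestion 1 and point (b) of suggestion 2 in the quoted message, here is a minimal sketch of the indexing involved. The variable names and the example data frame are made up for illustration; the sketch checks only subsetting and 'dim', not 'diff' itself.

```r
x <- rbind(1:2)                     # 1 x 2 integer matrix

flat  <- x[0L]                      # as in 'return(x[0L])': dimensions lost
empty <- x[0L, , drop = FALSE]      # zero rows, both columns kept

dims_flat  <- dim(flat)             # NULL
dims_empty <- dim(empty - empty)    # 0 rows, 2 columns

# Point (b): a data frame has two dimensions but is not a matrix.
df <- data.frame(a = 1:3, b = 4:6)
c(is_matrix = is.matrix(df), two_dims = length(dim(df)) == 2L)
```

Subtracting the empty matrix from itself keeps the dimensions, which is why the suggested empty-result branch still has as many columns as the input.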
