[R] Pointer to covariates?

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Feb 22 12:14:29 CET 2002


On Fri, 22 Feb 2002, Göran Broström wrote:

> On Thu, 21 Feb 2002, Anne York wrote:
>
> > Here is another idea, but the overhead might be just as great.
> >
> > dat <- data.frame(y=1:3,x1=c(1,0,1),x2=c(0,1,0))
> > dat.unique <- unique(paste(as.character(dat$x1),as.character(dat$x2)))
> > dat.keys <- match(paste(as.character(dat$x1),as.character(dat$x2)),dat.unique)
>
> This is very good! I made this function of it:
>
> cro.ay.orig <- function(dat){
>   covar <- unique(dat[, -1])
>   dat.keys <-
>     match(paste(dat$x1, dat$x2, sep = ""),
>           paste(covar$x1, covar$x2, sep = ""))
>
>   return(list(y = dat[, 1],
>               covar = covar,
>               keys = dat.keys))
> }
>
> and this is fast; with 'dat' containing 100000 observations, I get:
>
> > unix.time(sor.ay.orig <- cro.ay.orig(dat[1:100000, c(1, 2, 5)]))
>
> [1] 1.00 0.02 1.08 0.00 0.00
>
> However, this function needs to be generalized, so I wrote:
>
> cro.ay <- function(dat, response = 1){
>   covar <- unique(dat[, -response, drop = FALSE])
>   dat.keys <-
>     match(apply(dat[, -response, drop = FALSE], 1, paste, collapse = ""),
>           apply(covar, 1, paste, collapse = ""))
>   return(list(y = dat[, response],
>               covar = covar,
>               keys = dat.keys))
> }
>
> but this was much slower (though still acceptable) on the same data:
>
> [1] 11.63  0.32 12.34  0.00  0.00
>
> It is apparently the pasting row by row of the data frame,
>
>  apply(covar, 1, paste, collapse = "")
>
> that takes the time. Is there a better way of doing this?

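Very probably. Note that the original did not paste row-by-row: it pasted
whole columns in a single vectorized call.  You could get the same effect
for any number of columns with do.call.  Here's an untested variant: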

match(do.call("paste", c(dat[, -response, drop = FALSE], sep="\001")),
      do.call("paste", c(covar,  sep="\001")))

Note also that I used a different separator ("\r" is also possible), as
that is much more likely to make a unique string.  See
duplicated.data.frame for the use of this.
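
To make the two points above concrete, here is a minimal sketch of the
generalized function built on do.call, followed by a small case where an
empty separator produces colliding keys.  The name cro.fast and the toy
data frames are my own, purely for illustration; treat this as a sketch
rather than a tested replacement.

```r
# Sketch only: cro.fast is a hypothetical name, not taken from the thread.
cro.fast <- function(dat, response = 1) {
  x <- dat[, -response, drop = FALSE]
  covar <- unique(x)
  # do.call() hands each column to paste() as a separate argument, so the
  # keys are built column-wise in one vectorized call, not row by row.
  key <- function(d) do.call("paste", c(d, sep = "\001"))
  list(y = dat[, response],
       covar = covar,
       keys = match(key(x), key(covar)))
}

# Toy data (assumed): rows 1 and 3 share the same covariate pattern.
dat <- data.frame(y = 1:4, x1 = c(1, 0, 1, 11), x2 = c(0, 1, 0, 1))
res <- cro.fast(dat)
res$keys   # index into res$covar for each row of dat

# Why the separator matters: with sep = "" distinct rows can collide.
d <- data.frame(a = c(1, 11), b = c(11, 1))
k.empty <- do.call("paste", c(d, sep = ""))      # both rows give "111"
k.ctrl  <- do.call("paste", c(d, sep = "\001"))  # rows stay distinct
```

The control character is chosen only because it is unlikely to occur in
real data; if it did occur in a character column, collisions would still
be possible.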

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



