[R] A More efficient method?

Wed Jul 4 18:49:53 CEST 2007

The post quoted below was in error since s3 was never assigned there.  The
as.numeric in the calculation of s3 can be omitted if it's OK to have an
integer rather than a numeric result, and in that case it's faster still.
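
To make the integer-versus-numeric point concrete, here is a small sketch.
The three-element vector x is made up for illustration and is not part of
the timings; the point is only that arithmetic on two logical vectors gives
an integer result, which identical() does not treat as equal to a double
vector holding the same values.

```r
# Hypothetical small input, for illustration only
x <- c("a", "b", "a")

tmp <- x == "a"              # logical vector
s_int <- tmp - !tmp          # logical minus logical gives an integer vector
s_num <- as.numeric(s_int)   # explicit conversion to double

typeof(s_int)                # "integer"
typeof(s_num)                # "double"

# identical() compares storage type as well as values
identical(s_int, c(1, -1, 1))   # FALSE: integer vs double
identical(s_num, c(1, -1, 1))   # TRUE
all(s_int == c(1, -1, 1))       # TRUE: the values themselves agree
```

So without as.numeric the values are the same, but a check with identical()
against a double vector such as s0 would be expected to fail.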

> set.seed(1)
> C <- sample(c("a", "b"), 1000000, replace = TRUE)
> system.time({
+ s0 <- vector(length = length(C))
+ for(i in seq_along(C)) s0[i] <- if (C[i] == "a") 1 else -1
+ s0
+ })
   user  system elapsed
  21.32    0.02   26.10
> system.time(s1 <- ifelse(C == "a", 1, -1))
   user  system elapsed
   2.37    0.26    2.64
> system.time(s2 <- 2 * (C == "a") - 1)
   user  system elapsed
   0.32    0.02    0.35
> system.time({tmp <- C == "a"; s3 <- as.numeric(tmp - !tmp)})
   user  system elapsed
   0.28    0.02    0.31
> identical(s0, s1)
[1] TRUE
> identical(s0, s2)
[1] TRUE
> identical(s0, s3)
[1] TRUE
>
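
The original question also asked about an arbitrarily large number of
categories, which the two-category arithmetic tricks above do not cover.
One possible approach, offered only as a sketch, is a lookup table indexed
by category name. The names codes, C1 and C1m are made up here, and the
mapping follows the original question (a -> -1, b -> 1), which is the
reverse of the sign convention used in the timings.

```r
# Data from the original question
Cat <- c("a", "a", "a", "b", "b", "b", "a", "a", "b")

# Hypothetical lookup table: one named entry per category;
# further categories would be added as further named entries
codes <- c(a = -1, b = 1)

# Index the table by name; unname() drops the names carried over
C1 <- unname(codes[Cat])

# The same lookup written with match(); in either form a category
# missing from the table comes back as NA
C1m <- c(-1, 1)[match(Cat, c("a", "b"))]

identical(C1, C1m)   # TRUE
```

Both forms are vectorized, so no explicit loop is needed. I have not timed
them against s1, s2 or s3.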

On 7/4/07, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> In thinking about this a bit more I have found a slightly faster one still.
> See s3.  Also I have added s0, the original solution, to the timings.
>
> > set.seed(1)
> > C <- sample(c("a", "b"), 1000000, replace = TRUE)
> > system.time({
> + s0 <- vector(length = length(C))
> + for(i in seq_along(C)) s0[i] <- if (C[i] == "a") 1 else -1
> + s0
> + })
>   user  system elapsed
>  21.75    0.02   25.99
> > system.time(s1 <- ifelse(C == "a", 1, -1))
>   user  system elapsed
>   2.32    0.17    2.54
> > system.time(s2 <- 2 * (C == "a") - 1)
>   user  system elapsed
>   0.29    0.02    0.32
> > system.time({tmp <- C == "a"; tmp - !tmp})
>   user  system elapsed
>   0.21    0.00    0.21
> > identical(s0, s1)
> [1] TRUE
> > identical(s0, s2)
> [1] TRUE
> > identical(s0, s3)
> [1] TRUE
>
> On 7/4/07, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> > Here are two ways.  The second way is more than 10x faster.
> >
> > > set.seed(1)
> > > C <- sample(c("a", "b"), 100000, replace = TRUE)
> > > system.time(s1 <- ifelse(C == "a", 1, -1))
> >   user  system elapsed
> >   0.37    0.01    0.38
> > > system.time(s2 <- 2 * (C == "a") - 1)
> >   user  system elapsed
> >   0.02    0.00    0.02
> > > identical(s1, s2)
> > [1] TRUE
> >
> > On 7/4/07, Keith Alan Chamberlain <Keith.Chamberlain at colorado.edu> wrote:
> > > Dear Rhelpers,
> > >
> > > Is there a faster way than below to set a vector based on values from
> > > another vector? I'd like to call a pre-existing function for this, but one
> > > which can also handle an arbitrarily large number of categories. Any ideas?
> > >
> > > Cat=c('a','a','a','b','b','b','a','a','b')      # Categorical variable
> > > C1=vector(length=length(Cat))   # New vector for numeric values
> > >
> > > # Loop over each element of Cat and set C1 to the corresponding value.
> > > for(i in 1:length(C1)){
> > >        if(Cat[i]=='a') C1[i]=-1 else C1[i]=1
> > > }
> > >
> > > C1
> > > [1] -1 -1 -1  1  1  1 -1 -1  1
> > > Cat
> > > [1] "a" "a" "a" "b" "b" "b" "a" "a" "b"
> > >
> > > Sincerely,
> > > KeithC.
> > > Psych Undergrad, CU Boulder (US)
> > > RE McNair Scholar