[R] vectorization condition counting

arun smartpink111 at yahoo.com
Sat Aug 11 03:44:30 CEST 2012



Hi,

This may also help:
someTags <- data.frame(tag_id = c(1, 2, 2, 3, 4, 5, 6, 6), lgth = 50*(1:8),
                       stage = factor(rep(".", 8), levels = c(".", "J")))
f2 <- function(x) {
  # rows whose tag_id is not among the duplicated values and whose lgth < 300
  needsChanging <- with(x, is.na(match(tag_id, tag_id[duplicated(tag_id)])) & lgth < 300)
  x$stage[needsChanging] <- "J"
  x
}
f2(someTags)
#  tag_id lgth stage
#1      1   50     J
#2      2  100     .
#3      2  150     .
#4      3  200     J
#5      4  250     J
#6      5  300     .
#7      6  350     .
#8      6  400     .
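
The is.na(match(...)) test above says "this tag_id is not one of the values that occur more than once". A negated %in% expresses the same thing. A small check, assuming the same tag_id values as in someTags (the names dups, viaMatch and viaIn are only for illustration):

```r
tag_id <- c(1, 2, 2, 3, 4, 5, 6, 6)

# values that occur more than once (here 2 and 6)
dups <- tag_id[duplicated(tag_id)]

# TRUE where tag_id is not among the repeated values
viaMatch <- is.na(match(tag_id, dups))
viaIn    <- !(tag_id %in% dups)

stopifnot(identical(viaMatch, viaIn))
```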
A.K.


----- Original Message -----
From: William Dunlap <wdunlap at tibco.com>
To: Guillaume2883 <guillaume.bal.pro at gmail.com>; "r-help at r-project.org" <r-help at r-project.org>
Cc: 
Sent: Friday, August 10, 2012 8:02 PM
Subject: Re: [R] vectorization condition counting

Your sum(tag_id==tag_id[i])==1, meaning tag_id[i] is the only entry with its
value, may be vectorized by the sneaky idiom
   !(duplicated(tag_id,fromLast=FALSE) | duplicated(tag_id,fromLast=TRUE))
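
To see why the idiom works: duplicated() scanning forward flags every repeat except the first, and scanning backward flags every repeat except the last, so a value is unflagged in both directions only if it occurs once. A minimal sketch comparing it with the element-by-element count, assuming the tag_id values from the example data (slow and fast are illustrative names):

```r
tag_id <- c(1, 2, 2, 3, 4, 5, 6, 6)

# element-by-element version of sum(tag_id == tag_id[i]) == 1
slow <- vapply(seq_along(tag_id),
               function(i) sum(tag_id == tag_id[i]) == 1,
               logical(1))

# vectorized: not a duplicate when scanning from either end
fast <- !(duplicated(tag_id, fromLast = FALSE) |
          duplicated(tag_id, fromLast = TRUE))

stopifnot(identical(slow, fast))
```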

Hence f0() (with your code in a loop) and f1() are equivalent:
f0 <- function (tags) {
    for (i in seq_len(nrow(tags))) {
        if (sum(tags$tag_id == tags$tag_id[i]) == 1 && tags$lgth[i] < 300) {
            tags$stage[i] <- "J"
        }
    }
    tags
}
f1 <- function (tags) {
    needsChanging <- with(tags, !(duplicated(tag_id, fromLast = FALSE) |
        duplicated(tag_id, fromLast = TRUE)) & lgth < 300)
    tags$stage[needsChanging] <- "J"
    tags
}

E.g.,
> someTags <- data.frame(tag_id = c(1, 2, 2, 3, 4, 5, 6, 6), lgth = 50*(1:8), stage=factor(rep(".",8), levels=c(".","J")))
> all.equal(f0(someTags), f1(someTags))
[1] TRUE
> f1(someTags)
  tag_id lgth stage
1      1   50     J
2      2  100     .
3      2  150     .
4      3  200     J
5      4  250     J
6      5  300     .
7      6  350     .
8      6  400     .
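
As a cross-check on the result above, the "occurs exactly once" condition can also be written by counting group sizes with ave(). This is only a sketch: f3 is a hypothetical name that does not appear elsewhere in this thread, and the data frame is rebuilt so the snippet runs on its own.

```r
someTags <- data.frame(tag_id = c(1, 2, 2, 3, 4, 5, 6, 6), lgth = 50 * (1:8),
                       stage = factor(rep(".", 8), levels = c(".", "J")))

# f3 (hypothetical): count how often each tag_id occurs, then apply both conditions
f3 <- function(tags) {
  n <- ave(seq_along(tags$tag_id), tags$tag_id, FUN = length)
  needsChanging <- n == 1 & tags$lgth < 300
  tags$stage[needsChanging] <- "J"
  tags
}

result <- f3(someTags)
```

The duplicated() idiom avoids the grouping step, so it may well be the cheaper choice on a large data set, but no timings were taken here.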

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Guillaume2883
> Sent: Friday, August 10, 2012 3:47 PM
> To: r-help at r-project.org
> Subject: [R] vectorization condition counting
> 
> Hi all,
> 
> I am working on a really big dataset and I would like to vectorize a
> condition in an if statement inside a loop to improve speed.
> 
> The original loop with the condition is currently written as follows:
> 
> if(sum(as.integer(tags$tag_id==tags$tag_id[i]))==1&tags$lgth[i]<300){
> 
>      tags$stage[i]<-"J"
> 
>    }
> 
> Do you have any ideas? I was unable to do it correctly.
> Thank you in advance for your help.
> 
> Guillaume
> 
> 
> 
> --
> View this message in context: http://r.789695.n4.nabble.com/vectorization-condition-
> counting-tp4639992.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
