[Rd] Any interest in "merge" and "by" implementations specifically for so

Wed Aug 2 03:09:32 CEST 2006

Hi,

My last word on this topic until I get a working external R package ...

The igroup code has now been validated on data with and without NAs,
with na.rm set both to TRUE and to FALSE.  Thanks to Bill, Tom, Thomas,
and everyone else for your helpful comments and hints.

The results of my validation run are at the end of this message, in case
anyone is interested.  So my code now officially works.  If anyone wants
patches against the latest development version of R to play around with
(do your own timings, etc.), please just let me know and I will send the
patches privately.

I will start to work on an external package next week when I have  
more time.

Hope this helps,

Kevin

 > x <- rnorm(2e6)
 > i <- rep(1:1e6,2)
 > y <- runif(2e6)
 > is.na(x[y > 0.8]) <- TRUE
 >
 > suma = unlist(lapply(split(x,i),sum,na.rm=T))
 > names(suma) <- NULL
 > sumb = igroupSums(x,i,na.rm=T)
 > all.equal(suma,sumb)
[1] TRUE
 >
 >
 > suma = unlist(lapply(split(x,i),sum,na.rm=F))
 > names(suma) <- NULL
 > sumb = igroupSums(x,i,na.rm=F)
 > all.equal(suma,sumb)
[1] TRUE
 >
 >
 > maxa = unlist(lapply(split(x,i),max,na.rm=T))
There were 50 or more warnings (use warnings() to see the first 50)
 > names(maxa)<-NULL
 > maxb <- igroupMaxs(x,i,na.rm=T)
 > all.equal(maxa, maxb)
[1] TRUE
 >
 >
 > maxa = unlist(lapply(split(x,i),max,na.rm=F))
 > names(maxa)<-NULL
 > maxb <- igroupMaxs(x,i,na.rm=F)
 > all.equal(maxa, maxb)
[1] TRUE
 >
 >
 > mina = unlist(lapply(split(x,i),min,na.rm=T))
There were 50 or more warnings (use warnings() to see the first 50)
 > names(mina)<-NULL
 > minb <- igroupMins(x,i,na.rm=T)
 > all.equal(mina, minb)
[1] TRUE
 >
 >
 > mina = unlist(lapply(split(x,i),min,na.rm=F))
 > names(mina)<-NULL
 > minb <- igroupMins(x,i,na.rm=F)
 > all.equal(mina, minb)
[1] TRUE
 >
 >
 > meana = unlist(lapply(split(x,i),mean,na.rm=T))
 > names(meana)<-NULL
 > meanb <- igroupMeans(x,i,na.rm=T)
 > all.equal(meana, meanb)
[1] TRUE
 >
 > meana = unlist(lapply(split(x,i),mean,na.rm=F))
 > names(meana)<-NULL
 > meanb <- igroupMeans(x,i,na.rm=F)
 > all.equal(meana, meanb)
[1] TRUE
 >
 >
 > proda = unlist(lapply(split(x,i),prod,na.rm=T))
 > names(proda)<-NULL
 > prodb <- igroupProds(x,i,na.rm=T)
 > all.equal(proda, prodb)
[1] TRUE
 >
 > proda = unlist(lapply(split(x,i),prod,na.rm=F))
 > names(proda)<-NULL
 > prodb <- igroupProds(x,i,na.rm=F)
 > all.equal(proda, prodb)
[1] TRUE
 >
 >
 > cnta <- unlist(lapply(split(x,i),length))
 > names(cnta) <- NULL
 > cntb <- igroupCounts(x,i,na.rm=F)
 > all.equal(cnta,cntb)
[1] TRUE
 >
 >
 > anya <- unlist(lapply(split((x>1.0),i),any,na.rm=T))
 > names(anya)<-NULL
 > anyb <- igroupAnys((x>1.0),i,na.rm=T)
 > all.equal(anya,anyb)
[1] TRUE
 >
 >
 > anya <- unlist(lapply(split((x>1.0),i),any,na.rm=F))
 > names(anya)<-NULL
 > anyb <- igroupAnys((x>1.0),i,na.rm=F)
 > all.equal(anya,anyb)
[1] TRUE
 >
 >
 > alla <- unlist(lapply(split((x>1.0),i),all,na.rm=T))
 > names(alla)<-NULL
 > allb <- igroupAlls((x>1.0),i,na.rm=T)
 > all.equal(alla,allb)
[1] TRUE
 >
 >
 > alla <- unlist(lapply(split((x>1.0),i),all,na.rm=F))
 > names(alla)<-NULL
 > allb <- igroupAlls((x>1.0),i,na.rm=F)
 > all.equal(alla,allb)
[1] TRUE
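
The checks above all follow one pattern: compute a reference result with split() and lapply(), drop the names, and compare it against the grouped routine with all.equal(). Below is a minimal sketch of that pattern for readers who do not have the igroup patches. The helper name group_reference is hypothetical, and base R's rowsum() is used purely as a stand-in for the routine under test; it is not the igroup code. The data are smaller than in the run above so that the sketch finishes quickly.

```r
# Reference result: split the values by group index, apply the summary
# function to each group, and drop the group names.
# (group_reference is a hypothetical helper name, not part of igroup.)
group_reference <- function(x, i, fun, ...) {
  out <- unlist(lapply(split(x, i), fun, ...))
  names(out) <- NULL
  out
}

set.seed(1)                  # arbitrary seed, only for reproducibility
x <- rnorm(2000)
i <- rep(1:1000, 2)          # two values per group, as in the run above
y <- runif(2000)
is.na(x[y > 0.8]) <- TRUE    # roughly 20% of the values become NA

# Stand-in for the grouped routine under test: base R's rowsum().
ref <- group_reference(x, i, sum, na.rm = TRUE)
alt <- as.vector(rowsum(x, i, na.rm = TRUE))
all.equal(ref, alt)

# A group that is entirely NA is empty once the NAs are removed, and
# max() of an empty vector returns -Inf with a warning.
all_na_max <- suppressWarnings(max(c(NA_real_, NA_real_), na.rm = TRUE))
```

The last line is a guess at where the warnings in the max and min runs with na.rm=T come from: with two values per group and about 20% NAs, some groups will contain nothing but NAs.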