[R] Surprising Behavior of 'tapply'

Rau, Roland Rau at demogr.mpg.de
Fri Feb 4 10:58:43 CET 2005


Dear helpers,

thank you very much for your advice.

After starting a new R session this morning, I could no longer replicate the problem either, although the old session still showed it.
One suggestion was that I might have redefined some functions, but this was not the case. The only additional package I had loaded was Hmisc; I loaded it again in the new session and it did not cause any problems.
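For readers who run into something similar in a long-running session, the following is a minimal sketch of how one might check whether anything in the workspace or in an attached package shares a name with a base function (for example a data object called `sum`). It is only a diagnostic aid; the thread does not establish what actually caused the error, and the variable name `masked` is introduced here purely for illustration.

```r
# Where are these names found on the search path? In a fresh session
# both are normally reported under "package:base".
print(find("tapply"))
print(find("sum"))

# Objects in the global workspace that share a name with something
# in the base package (character(0) when there are none).
masked <- intersect(ls(globalenv()), ls(baseenv()))
print(masked)

# Names that are defined in more than one place on the search path.
print(conflicts(detail = TRUE))
```

If `masked` is not empty, removing the offending object with rm() or restarting R with a clean workspace is a quick way to rule it out.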
Another suggestion was to use 'xtabs' as an alternative. This also works nicely, but in timings on my dataset (moderate size, 6 MB) 'tapply' was clearly faster, so I assume it will also be faster than 'xtabs' for really large datasets:

> system.time(tapply(austria$COUNT, list(austria$sescat, austria$STATUS, austria$SEX), sum))
[1] 0.05 0.00 0.04   NA   NA
> system.time(xtabs(austria$COUNT ~., data.frame(ses = austria$sescat, status =austria$STATUS, sex=austria$SEX)))
[1] 0.86 0.00 0.86   NA   NA
> 

(I repeated the timings several times and also called gc().)
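For anyone who wants to repeat this comparison without the original data, here is a minimal sketch on synthetic data. The data frame `d`, its column names, the category labels and the size `n` are all made up for illustration; actual timings will depend on the machine, the R version and the data.

```r
# Synthetic stand-in for the real data (names and sizes are hypothetical).
set.seed(1)
n <- 200000
d <- data.frame(
  ses    = factor(sample(c("low", "mid", "high"), n, replace = TRUE)),
  status = factor(sample(c("single", "married", "widowed"), n, replace = TRUE)),
  sex    = factor(sample(c("F", "M"), n, replace = TRUE)),
  count  = sample(1:20, n, replace = TRUE)
)

# The same three-way table of summed counts, computed two ways.
t_tapply <- system.time(
  res_tapply <- tapply(d$count, list(d$ses, d$status, d$sex), sum)
)
t_xtabs <- system.time(
  res_xtabs <- xtabs(count ~ ses + status + sex, data = d)
)

print(t_tapply)
print(t_xtabs)

# Both results hold the same cell sums; compare values only,
# because the two objects carry different attributes.
stopifnot(all(as.vector(res_tapply) == as.vector(res_xtabs)))
```

Note that tapply() leaves NA in cells with no observations while xtabs() reports 0, so the final check only works as written when every combination of categories occurs, as it does here.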

Thanks again (in chronological order) to Bert Gunter, Carlos Ortega, James Holtman, and Gabor Grothendieck.

Best,
Roland



-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Gabor Grothendieck
Sent: Thursday, February 03, 2005 9:08 PM
To: r-help at stat.math.ethz.ch
Subject: Re: [R] Surprising Behavior of 'tapply'


I tried it on Windows XP with R 2.1.0 and could not replicate it either.
Suggest you start up a fresh session and try it again.

By the way, you could consider this:

xtabs(count ~., data.frame(sex = sex, income = income))
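As a self-contained sketch of that suggestion, the toy data from the original question can be tabulated both ways. The names `mydf`, `by_tapply` and `by_xtabs` are only illustrative, and the formula is written out as `count ~ sex + income` rather than with the `~ .` shorthand.

```r
# Toy data from the original question.
sex    <- rep(c("F", "M"), 5)
income <- c(rep("low", 5), rep("high", 5))
count  <- 1:10

# Building the data frame directly keeps 'count' numeric, so the
# cbind() / as.character() round trip is not needed.
mydf <- data.frame(sex = sex, income = income, count = count)

# Sum of 'count' for each sex/income combination, two ways.
by_tapply <- tapply(mydf$count, list(mydf$sex, mydf$income), FUN = sum)
by_xtabs  <- xtabs(count ~ sex + income, data = mydf)

print(by_tapply)
print(by_xtabs)
```

Both calls should give the same cell sums as the table quoted below in the thread (F: 16 and 9, M: 24 and 6); the xtabs() result additionally carries the variable names as dimnames.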



Carlos Ortega <carlos_ortegafernandez <at> yahoo.es> writes:

: 
: Hi,
: 
: That is something strange, I could not replicate it...
: 
: Regards,
: Carlos.
: 
: +++++++++++++++++++++++++++++++++++
: > version
:          _              
: platform i386-pc-mingw32
: arch     i386           
: os       mingw32        
: system   i386, mingw32  
: status                  
: major    2              
: minor    0.1            
: year     2004           
: month    11             
: day      15             
: language R              
: > sex <- rep(c("F", "M"), 5)
: > income <-  c(rep("low", 5), rep("high", 5))
: > count <- 1:10
: > mydf <- as.data.frame(cbind(sex, income, count))
: > mydf$count = as.numeric(as.character(mydf$count))
: > tapply(mydf$count, list(mydf$sex, mydf$income),
: FUN=sum)
:   high low
: F   16   9
: M   24   6
: ++++++++++++++++++++++++++++++++++++++++++++
: 
:  --- "Rau, Roland" <Rau <at> demogr.mpg.de> wrote: 
: > Dear all,
: > 
: > I wanted to make a two-way-table of two variables
: > with a counting
: > variable stored in another column of a dataframe. In
: > version 1.9.1, the
: > behavior is as expected as shown in the simplified
: > example code.
: > 
: > > sex <- rep(c("F", "M"), 5)
: > > income <-  c(rep("low", 5), rep("high", 5))
: > > count <- 1:10
: > > mydf <- as.data.frame(cbind(sex, income, count))
: > > mydf$count = as.numeric(as.character(mydf$count))
: > > tapply(mydf$count, list(mydf$sex, mydf$income),
: > FUN=sum)
: >   high low
: > F   16   9
: > M   24   6
: > > version
: >          _              
: > platform i386-pc-mingw32
: > arch     i386           
: > os       mingw32        
: > system   i386, mingw32  
: > status                  
: > major    1              
: > minor    9.1            
: > year     2004           
: > month    06             
: > day      21             
: > language R              
: > > 
: > 
: > In version 2.0.1, however, I get the following
: > output:
: > 
: > > sex <- rep(c("F", "M"), 5)
: > > income <-  c(rep("low", 5), rep("high", 5))
: > > count <- 1:10
: > > mydf <- as.data.frame(cbind(sex, income, count))
: > > mydf$count = as.numeric(as.character(mydf$count))
: > > tapply(mydf$count, list(mydf$sex, mydf$income),
: > FUN=sum)
: > Error in get(x, envir, mode, inherits) : variable
: > "FUN" was not found
: > > version
: >          _              
: > platform i386-pc-mingw32
: > arch     i386           
: > os       mingw32        
: > system   i386, mingw32  
: > status                  
: > major    2              
: > minor    0.1            
: > year     2004           
: > month    11             
: > day      15             
: > language R              
: > > 
: > 
: > Was this change in behavior intended with the
: > changes in tapply from
: > R1.9.1 to R2.0.1?
: > Is the R-help-list appropriate or rather R-Devel?
: > 
: > Thanks,
: > Roland
: > 
: > 
: > 
: > +++++
: > This mail has been sent through the MPI for
: > Demographic Rese...{{dropped}}
: > 
: > ______________________________________________
: > R-help <at> stat.math.ethz.ch mailing list
: > https://stat.ethz.ch/mailman/listinfo/r-help
: > PLEASE do read the posting guide!
: > http://www.R-project.org/posting-guide.html
: >
: 
