[R] tapply() and using factor() on a factor

William Dunlap wdunlap at tibco.com
Fri Oct 16 05:59:05 CEST 2009


> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Alexander 
> Peterhansl
> Sent: Thursday, October 15, 2009 2:50 PM
> To: r-help at r-project.org
> Subject: [R] tapply() and using factor() on a factor
> 
> Dear List,
> 
> Shouldn't result1 and result2 be equal in the following case?
> 
> Note that log$RequestID is a factor.  That is, is.factor(log$RequestID)
> yields TRUE.
> 
> result1 <- tapply(log$Flag,factor(log$RequestID),sum)
> result2 <- tapply(log$Flag,log$RequestID,sum)

Showing us the output of dput(log) (or str(log) and summary(log))
would let people discover the problem more readily.  Since you
didn't, I'll guess what the dataset may contain.
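
To illustrate what those calls would reveal, here is a small sketch
on a made-up data frame.  The name example_log and its contents are
my assumptions, not your data.  The str() line reports the total
number of levels, which is the detail that matters here.

```r
# Hypothetical stand-in for the poster's data: 2 rows, but a factor
# that carries 10 levels.
example_log <- data.frame(
  Flag = 1:2,
  RequestID = factor(letters[1:2], levels = letters[1:10])
)

dput(example_log)     # a pasteable, exact text representation
str(example_log)      # shows RequestID as a factor with 10 levels
summary(example_log)  # per-column summary, including level counts

# A quick direct check: levels declared vs. levels actually present.
n_declared <- nlevels(example_log$RequestID)
n_present  <- length(unique(example_log$RequestID))
```

If n_declared is much larger than n_present, the factor has unused
levels.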

If log$RequestID is a factor with lots of unused levels, tapply
will output an NA for each unused level.  factor(log$RequestID)
creates a new factor whose levels are only those actually used,
so tapply is not forced to fill those spots with NAs.  E.g.,

> log <- data.frame(Flag=1:2,
+     RequestID=factor(letters[1:2], levels=letters[1:10]))
> tapply(log$Flag, log$RequestID, sum)
 a  b  c  d  e  f  g  h  i  j
 1  2 NA NA NA NA NA NA NA NA
> tapply(log$Flag, factor(log$RequestID), sum)
a b
1 2
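
If my guess about your data is right, the number of NA's in result2
should equal the number of unused levels, and the non-NA cells should
match result1.  A rough check on the toy data above, assuming Flag
itself contains no NA's (the helper names n_unused, n_na and kept are
just for illustration):

```r
# The guessed data: 2 used levels out of 10.
log <- data.frame(Flag = 1:2,
                  RequestID = factor(letters[1:2], levels = letters[1:10]))

result1 <- tapply(log$Flag, factor(log$RequestID), sum)
result2 <- tapply(log$Flag, log$RequestID, sum)

# Levels that never occur in the data.
n_unused <- sum(table(log$RequestID) == 0)

# Each unused level contributes one NA cell to result2.
n_na <- sum(is.na(result2))

# Dropping the NA cells leaves the same names and values as result1.
kept <- result2[!is.na(result2)]
same_names  <- all(names(kept) == names(result1))
same_values <- all(as.vector(kept) == as.vector(result1))
```

On your data, comparing sum(table(log$RequestID) == 0) with 978 would
confirm or refute the guess.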

I suppose tapply(X,INDEX,FUN) could call FUN(X[0]) to see
how to fill the cells with no data behind them, but it doesn't.
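
You can do that step by hand if you want the empty cells filled.
This is only a sketch of the idea, not something tapply does for
you; empty_value, is_empty and filled are names I made up.

```r
log <- data.frame(Flag = 1:2,
                  RequestID = factor(letters[1:2], levels = letters[1:10]))

result2 <- tapply(log$Flag, log$RequestID, sum)

# What FUN returns for zero-length input; for sum() that is 0.
empty_value <- sum(log$Flag[0])

# Cells with no data behind them, in the same order as the levels.
is_empty <- as.vector(table(log$RequestID)) == 0

# Fill only the empty cells, so NA's produced by the data survive.
filled <- result2
filled[is_empty] <- empty_value
```

Whether this is sensible depends on FUN: sum() of nothing is 0, but
mean() of nothing is NaN, which is hardly better than NA.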

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> 
> Yet, when I summarize the output, I get the following:
> 
> summary(result1)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   11.00   11.00   11.00   26.06   11.00  101.00
> 
> summary(result2)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
>   11.00   11.00   11.00   26.06   11.00  101.00  978.00
> 
> Why does result2 have 978 NA's?
> 
> Any help on this would be appreciated.
> 
> Alex