[R] by function with sum does not give what is expected from by function with print

Rasmus Liland
Fri Jul 24 01:48:30 CEST 2020


On 2020-07-23 18:54 -0400, Duncan Murdoch wrote:
> On 23/07/2020 6:15 p.m., Sorkin, John wrote:
> > Colleagues,
> > The by function in the R program below is not giving me the sums
> > I expect to see, viz.,
> > 382+170=552
> > 4730+170=4900
> > 5+6=11
> > 199+25=224
> > ###################################################
> > #full R program:
> > mydata <- data.frame(covid=c(0,0,0,0,1,1,1,1),
> > sex=(rep(c(1,1,0,0),2)),
> > status=rep(c(1,0),2),
> > values=c(382,4730,5,199,170,497,6,25))
> > mydata
> > by(mydata,list(mydata$sex,mydata$status),sum)
> > by(mydata,list(mydata$sex,mydata$status),print)
> > ###################################################
> 
> The problem is that you are summing the mydata values, not the mydata$values
> values.  That will include covid, sex and status in the sums.  I think
> you'll get what you should (though it doesn't match what you say you
> expected, which looks wrong to me) with this code:
> 
> by(mydata$values,list(mydata$sex,mydata$status),sum)
> 
> for 0,0, the sum is 224 = 199+25
> for 0,1, the sum is  11 = 5+6
> for 1,0, the sum is 5227 = 4730 + 497 (not 4730 + 170)
> for 1,1, the sum is 552 = 382 + 170

Dear John,

aggregate() also does this, but it returns 
sex and status as columns of a data.frame 
rather than as attributes of the result, 
as by() does.

	aggregate(x=list("values"=mydata$values),
	          by=list("sex"=mydata$sex,
	                  "status"=mydata$status),
	          FUN=sum)

yields

	  sex status values
	1   0      0    224
	2   1      0   5227
	3   0      1     11
	4   1      1    552
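
To see why the first by() call over-counts, here is a minimal 
sketch using the data from your post.  The names grp, sum_all, 
sum_values and agg are my own, chosen only for illustration.  It 
takes one group the way by() passes it to FUN (a data.frame with 
all four columns), sums it both ways, and then repeats the 
aggregate() call above with the formula interface.

```r
# Data from the original post (data.frame recycles status to length 8)
mydata <- data.frame(covid  = c(0, 0, 0, 0, 1, 1, 1, 1),
                     sex    = rep(c(1, 1, 0, 0), 2),
                     status = rep(c(1, 0), 2),
                     values = c(382, 4730, 5, 199, 170, 497, 6, 25))

# One group as by(mydata, ...) hands it to FUN: all four columns
grp <- mydata[mydata$sex == 1 & mydata$status == 1, ]

# sum() on a data.frame adds every column, not only values
sum_all    <- sum(grp)         # 558: covid, sex, status and values
sum_values <- sum(grp$values)  # 552: 382 + 170

# Formula interface, same grouping as the aggregate() call above
agg <- aggregate(values ~ sex + status, data = mydata, FUN = sum)
```

The difference of 6 in this group is just the covid, sex and 
status entries of the two rows added in.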

Best,
Rasmus
