[R] Sum Question

Marc Schwartz marc_schwartz at me.com
Thu Jun 30 19:35:16 CEST 2011


On Jun 30, 2011, at 12:30 PM, Marc Schwartz wrote:

> On Jun 30, 2011, at 11:20 AM, Edgar Alminar wrote:
> 
>>>> I did this:
>>>> 
>>>> library(data.table)
>>>> 
>>>> dd <- data.table(bl)
>>>> dd[,sum(as.integer(CONTTIME)), by = SCRNO]
>>>> 
>>>> (I used as.integer because I got an error message: sum not meaningful for factors)
>>>> 
>>>> And got this:
>>>> 
>>>>          SCRNO  V1
>>>> [1,] HBA0020036 111
>>>> [2,] HBA0020087  71
>>>> [3,] HBA0020209 140
>>>> [4,] HBA0020213 189
>>>> [5,] HBA0020222 174
>>>> [6,] HBA0020292 747
>>>> [7,] HBA0020310  57
>>>> [8,] HBA0020317 291
>>>> [9,] HBA0020365 417
>>>> [10,] HBA0020366 124
>>>> 
>>>> All the sums are way too big. Is there something making it not add up correctly?
>>>> 
>>>> Original dataset:
>>>> 
>>    RID      SCRNO VISCODE RECNO CONTTIME
>> 338   43 HBA0020036      bl     1        9
>> 1187  95 HBA0020087      bl     1        3
>> 3251 230 HBA0020209      bl     2        3
>> 3258 230 HBA0020209      bl     1       28
>> 3321 235 HBA0020213      bl     2        5
>> 3351 235 HBA0020213      bl     1        6
>> 3436 247 HBA0020222      bl     1        5
>> 3456 247 HBA0020222      bl     2        4
>> 4569 321 HBA0020292      bl    13        2
>> 4572 321 HBA0020292      bl     5       13
>> 4573 321 HBA0020292      bl     1       25
>> 4576 321 HBA0020292      bl     7        5
>> 4578 321 HBA0020292      bl     8        2
>> 4581 321 HBA0020292      bl     4        4
>> 4582 321 HBA0020292      bl     9        5
>> 4586 321 HBA0020292      bl    12        2
>> 4587 321 HBA0020292      bl     6        2
>> 4590 321 HBA0020292      bl    10        3
>> 4591 321 HBA0020292      bl    11        7
> 
> 
> That is not the entire dataset. HBA0020366, for example, is missing.
> 
> I don't use the data.table package, but if you are getting an error indicating that CONTTIME is a factor, then something is wrong with either the data itself (there are non-numeric entries) or the way in which it was entered/imported into R.
> 
> Thus, I would first check your data for errors. Use str(YourDataSet) to review its structure and, if CONTTIME is a factor, look into the data to see why.
> 
> Lastly, review this R FAQ:
> 
> http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f
> 
> Just as an alternative, with your data in 'DF':
> 
>> DF
>     RID      SCRNO VISCODE RECNO CONTTIME
> 338   43 HBA0020036      bl     1        9
> 1187  95 HBA0020087      bl     1        3
> 3251 230 HBA0020209      bl     2        3
> 3258 230 HBA0020209      bl     1       28
> 3321 235 HBA0020213      bl     2        5
> 3351 235 HBA0020213      bl     1        6
> 3436 247 HBA0020222      bl     1        5
> 3456 247 HBA0020222      bl     2        4
> 4569 321 HBA0020292      bl    13        2
> 4572 321 HBA0020292      bl     5       13
> 4573 321 HBA0020292      bl     1       25
> 4576 321 HBA0020292      bl     7        5
> 4578 321 HBA0020292      bl     8        2
> 4581 321 HBA0020292      bl     4        4
> 4582 321 HBA0020292      bl     9        5
> 4586 321 HBA0020292      bl    12        2
> 4587 321 HBA0020292      bl     6        2
> 4590 321 HBA0020292      bl    10        3
> 4591 321 HBA0020292      bl    11        7
> 
> 
>> aggregate(CONTTIME ~ DF$SCRNO, data = DF, sum)
>    DF$SCRNO CONTTIME
> 1 HBA0020036        9
> 2 HBA0020087        3
> 3 HBA0020209       31
> 4 HBA0020213       11
> 5 HBA0020222        9
> 6 HBA0020292       70


Quick typo correction here. The 'DF$' in DF$SCRNO is superfluous. I did not clean that up before copying and pasting.

> aggregate(CONTTIME ~ SCRNO, data = DF, sum)
       SCRNO CONTTIME
1 HBA0020036        9
2 HBA0020087        3
3 HBA0020209       31
4 HBA0020213       11
5 HBA0020222        9
6 HBA0020292       70
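
As a rough illustration of why as.integer() on a factor gives misleading sums, here is a minimal sketch. The four-row data frame is a made-up subset for demonstration, and it assumes CONTTIME was imported as a factor, which is what the error message suggests but which I have not verified against the real data.

```r
# Hypothetical toy data: CONTTIME stored as a factor, as can happen when
# a column picks up a stray non-numeric entry on import.
DF <- data.frame(
  SCRNO    = c("HBA0020209", "HBA0020209", "HBA0020213", "HBA0020213"),
  CONTTIME = factor(c("3", "28", "5", "6"))
)

# as.integer() on a factor returns the internal level codes, not the values.
# The levels sort as text ("28" "3" "5" "6"), so the codes here are 2 1 3 4.
codes <- as.integer(DF$CONTTIME)

# Going through character first recovers the numbers that are printed.
DF$CONTTIME <- as.numeric(as.character(DF$CONTTIME))

# With a numeric column, the per-SCRNO sums come out as expected (31 and 11).
res <- aggregate(CONTTIME ~ SCRNO, data = DF, sum)
```

The level codes bear no fixed relation to the printed values, so sums computed from them can be too large or too small depending on how many distinct levels the column has.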


Marc


