[R] aggregate function oddity

Mihalicza Péter mihalicza.peter at eski.hu
Mon Sep 17 14:29:17 CEST 2007


Dear All,

I tried to aggregate the rows of a data frame according to some factors 
and got this message:
"Error in Summary.factor(..., na.rm = na.rm) :
        sum not meaningful for factors"
This problem was already discussed on this list in 2003, where the 
following solution was given: when passing the data frame to aggregate(), 
include only those columns that are not factors.
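
To make the workaround concrete, here is a minimal sketch on a made-up data frame. The data and the names df, res_bad and res_ok are assumptions for illustration only, not my real data:

```r
# Toy data frame (hypothetical; not the real mortality data)
df <- data.frame(
  Name   = factor(c("Estonia", "Estonia", "Latvia", "Latvia")),
  Year   = c(1997L, 1997L, 1997L, 1998L),
  Deaths = c(33L, 179L, 143L, 83L)
)

# Passing the whole data frame as x makes aggregate() apply sum() to every
# column of x, including the factor Name, so the call fails.
res_bad <- try(
  aggregate(df, by = list(Name = df$Name, Year = df$Year), FUN = sum),
  silent = TRUE
)

# Workaround from the 2003 thread: pass only the non-factor column(s) as x.
res_ok <- aggregate(df["Deaths"],
                    by = list(Name = df$Name, Year = df$Year),
                    FUN = sum)
print(res_ok)
```

In this sketch the factor still appears in "by", so it is used for grouping, but it is no longer among the columns that sum() is applied to.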

It also worked for me, but this solution is a bit odd, since there is no 
need to sum the factors given as grouping variables. Of course I may do 
something completely wrong.
help(aggregate) says:
## S3 method for class 'data.frame':
aggregate(x, by, FUN, ...)
x     an R object.
by    a list of grouping elements, each as long as the variables in x. 
      Names for the grouping variables are provided if they are not given. 
      The elements of the list will be coerced to factors (if they are not 
      already factors).

In my interpretation this means that the factor variables and the 
numeric variables are in the same data frame, namely x.

The data frame looks like this (it is mortality from cerebrovascular 
diseases):
 > str(agyer)
'data.frame':   102 obs. of  65 variables:
 $ Country            : int  4055 4055 4055 4055 4055 4055 4055 4055 4055 4055 ...
 $ Name               : Factor w/ 5 levels "Estonia","Latvia",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Year               : int  1997 1997 1998 1999 1999 1999 2000 2000 2000 2001 ...
 $ List               : int  103 103 103 103 103 103 103 103 103 103 ...
 $ Sex                : int  2 1 2 2 1 2 2 1 1 2 ...
 $ Morticd10_103_Frmat: int  1 1 1 1 1 1 1 1 1 1 ...
 $ IM_Frmat           : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Deaths1            : int  33 179 143 1428 83 61 3 759 29 4 ...
and a bunch of other int variables.

After omitting agyer$Name, I do
 > agyerpr=aggregate(agyer, by=list(agyer$Country, agyer$Year, 
agyer$List, agyer$Sex, agyer$Morticd10_103_Frmat, agyer$IM_Frmat), sum)

The sum is done on the (already omitted) factor "Cause".

I do not understand why it tries to sum a factor that is included in the 
"by" list, since the idea is not to sum those columns but to use them 
for grouping. I am lucky with this database because all the factors can 
be interpreted as integers, so I do not have to omit them one by one, 
but what if they could not?
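
One way the columns might be picked automatically rather than one by one is sketched below. This is only an illustration on made-up data; the names df, group_cols, value_cols and res are hypothetical, and I am assuming that everything numeric outside the grouping columns should be summed:

```r
# Hypothetical toy data; column names are illustrative only
df <- data.frame(
  Name    = factor(c("Estonia", "Estonia", "Latvia")),
  Sex     = c(1L, 1L, 2L),
  Deaths1 = c(33L, 179L, 143L),
  Deaths2 = c(1L, 2L, 3L)
)

group_cols <- c("Name", "Sex")            # columns used only for grouping
is_num     <- sapply(df, is.numeric)      # TRUE for integer/double columns
# Numeric columns that are not grouping columns are the ones to sum.
value_cols <- setdiff(names(df)[is_num], group_cols)

# A data frame is a list, so df[group_cols] can be passed directly as 'by'.
res <- aggregate(df[value_cols], by = df[group_cols], FUN = sum)
print(res)
```

Here the grouping columns, factor or not, never reach sum(), because they are left out of x and appear only in "by".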

Am I missing something with aggregate or classes?

Thanks for your help!

Sincerely,
Peter Mihalicza


