[R] help with aggregate()

Jan van der Laan rhelp at eoos.dds.nl
Tue Feb 15 10:09:44 CET 2011


The fact that the column names in your aggregate result contain multiple numbers suggests that something has gone wrong when reading your data in from the file. Have you had a look at your data.frame 'all'? Are BAR, X, etc. numeric? Judging from the names starting with 'c.', they aren't.


>  So, how do I aggregate the data frame?

aggregate accepts either a data.frame or a vector as its first argument (actually anything that can be coerced into a data.frame). In the case of a data.frame, it applies the aggregation function to each column. So your first aggregate call should be OK (except that your input might be wrong; see above). However, you didn't use named arguments in your list(), so R generates names for you. Hence the strange names.
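
A minimal sketch of the naming point, assuming a small made-up data frame (called 'dat' here so it does not mask the base function all(); the values and the column name XoverY are invented for illustration):

```r
# Hypothetical data standing in for the csv file
dat <- data.frame(FOO  = c("a", "a", "b", "b"),
                  BAR  = c(1, 3, 5, 7),
                  QUUX = c(10, 20, 30, 40),
                  X    = c(2, 4, 6, 8),
                  Y    = c(1, 2, 3, 4))

# Naming each element of the list gives the result columns those names
byFOO <- aggregate(list(BAR = dat$BAR, QUUX = dat$QUUX, XoverY = dat$X / dat$Y),
                   by = list(FOO = dat$FOO),
                   FUN = mean)

names(byFOO)
```

With the list elements named, names(byFOO) should come back as FOO, BAR, QUUX and XoverY rather than the long generated strings.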

aggregate returns a data.frame. So if you want to combine the results of more than one aggregate call, you can use merge:

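Count <- aggregate(all$FOO, by = list(FOO = all$FOO), FUN = length)
names(Count)[2] <- "Count"   # the aggregated column is called "x" by default
byFOO <- merge(byFOO, Count, by = "FOO")   # merge needs both data.frames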
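
For a self-contained illustration of merging two aggregate results, here is a sketch on invented data (the data frame 'dat' and its Value column are assumptions, not your actual file):

```r
# Hypothetical data: one grouping column, one numeric column
dat <- data.frame(FOO = c("a", "a", "b"), Value = c(1, 3, 5))

# Two separate aggregations, each with named result columns
means  <- aggregate(list(Mean  = dat$Value), by = list(FOO = dat$FOO), FUN = mean)
counts <- aggregate(list(Count = dat$Value), by = list(FOO = dat$FOO), FUN = length)

# Join them on the shared grouping column
byFOO <- merge(means, counts, by = "FOO")
byFOO
```

The result is a single data.frame with the columns FOO, Mean and Count, one row per distinct FOO.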

If you want to have a vector you could use tapply.
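
A short sketch of the tapply route, again on invented data ('dat' and Value are assumptions). The result is a one-dimensional array named by group, which can be indexed like a named vector:

```r
# Hypothetical data
dat <- data.frame(FOO = c("a", "a", "b"), Value = c(1, 3, 5))

# One count per distinct FOO, named by the FOO value
counts <- tapply(dat$Value, dat$FOO, length)

counts[["a"]]   # count for group "a"
```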

>  How do I rename a column?

?names

e.g.
names(all)<- c("column1" , "column2", ...)
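
If you only want to rename one column rather than all of them, you can assign into a subset of names(). A sketch, assuming a small data frame with the default aggregate column names Group.1 and x:

```r
# Hypothetical data frame with the default names aggregate() produces
dat <- data.frame(Group.1 = c("a", "b"), x = c(2, 1))

# Rename by matching the old name ...
names(dat)[names(dat) == "x"] <- "Count"

# ... or by position
names(dat)[1] <- "FOO"

names(dat)
```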

>  How do I check that two vectors are the same?

?all

all(vector1 == vector2)

but first have a look at:
http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f
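
To illustrate why that FAQ entry matters, here is a small sketch with invented vectors. For numeric data, exact comparison can fail on values that look equal, and all.equal() (wrapped in isTRUE()) compares with a tolerance instead:

```r
v1 <- c(0.1 + 0.2, 1)
v2 <- c(0.3, 1)

exact  <- all(v1 == v2)              # exact element-wise comparison
same   <- identical(v1, v2)          # exact comparison of the whole object
approx <- isTRUE(all.equal(v1, v2))  # comparison up to a small tolerance

c(exact = exact, same = same, approx = approx)
```

Here the two exact comparisons give FALSE because 0.1 + 0.2 is not stored as exactly 0.3, while the tolerance-based one gives TRUE. For character vectors or factors, all(vector1 == vector2) or identical() is the appropriate check.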


HTH,
Jan







On 02/15/2011 12:42 AM, Sam Steingold wrote:
> Hi,
>
> I am trying to aggregate some data and I am confused by the results.
> I load a data frame "all" from a csv file, and then I do:
> (FOO,BAR,X,Y come from the header line in the csv file,
> BTW, how do I rename a column?)
>
> byFOO<- aggregate(list(all$BAR,all$QUUX,all$X/all$Y),
>                       by = list(FOO=all$FOO),
>                       FUN = mean);
>
> I expect a data frame with 4 columns: FOO,BAR,QUUX and X/Y with all FOO
> being different (they are character strings, do I need a special
> incantation to turn them into factors?)
> what I get is indeed a data frame but with names
>
> [1] "FOO"
> [2] "c.1.78e.11..4.38e.09..1.461e.11..4.3186e.10..1.1181e.10..5.5389e.10.."
> [3] "c.33879300..3713870..190963000..7042170..4590010..91569200..12108200.."
> [4] "c.1.37087599544937..1.72690992018244..1.82034830430797..1.70338983050847.."
>
> why? how do I fix the column names?
>
> then I am trying to add to that same frame byFOO some other columns:
>
> byFOO$Count<- aggregate(all$FOO, by = list(all$FOO), FUN = length);
> byFOO$Mean<- aggregate(all$Value, by = list(all$FOO), FUN = mean);
> byFOO$Total<- aggregate(all$Value, by = list(all$FOO), FUN = sum);
>
> however, byFOO$Count et al are not columns in byFOO with the appropriate
> names ("Count"&c) but data frames with columns "Group.1" and "x".
> Luckily, at least it appears that byFOO$Count$Group.1 is the same as
> byFOO$FOO, as they should be, although I don't see any function which
> would check that two vectors are the same ("==" returns a vector which I
> have to manually inspect for presence of "FALSE").
>
> So, how do I aggregate the data frame?
> How do I rename a column?
> How do I check that two vectors are the same?
>
> thanks a lot!
>
> PS. I have not used R for a few years, so please be gentle...
> PPS. Please do not tell me to RTFM - I did. At least tell me what to
> search for.
>


