[R] Two Problems while trying to aggregate a dataframe

Gabor Grothendieck ggrothendieck at gmail.com
Sat Mar 24 19:17:05 CET 2007


Try this:

aggregate(atest[3:4], atest[1:2], sum)
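
For anyone who wants to try this without the CSV file, here is a minimal sketch that rebuilds the ten example rows by hand from the data frame printed in the quoted post below (the file aggregatetest.csv itself is not available here, so the data.frame() call is a stand-in for read.csv2). Columns 3:4 are the values to be summed and columns 1:2 are the grouping variables, so both Betrag1 and Betrag2 should appear in the result:

```r
# Stand-in for read.csv2("aggregatetest.csv"): values copied from the
# data frame printed in the quoted post.
atest <- data.frame(
  PrsNr   = c(1, 2, 2, 3, 4, 5, 6, 6, 6, 7),
  Namen   = c("Holla", "Mabba", "Mabba", "Pisa", "Pulla",
              "Raba", "Saba", "Saba", "Saba", "Mulla"),
  Betrag1 = c(1.99, 2.34, 5.23, 4.23, 2.23, 2.77, 3.83, 2.76, 6.32, 2.88),
  Betrag2 = c(3.44, 5.32, 5.21, 9.12, 7.32, 8.32, 6.99, 4.45, 5.34, 3.81)
)

# atest[3:4] -> the columns to aggregate (Betrag1, Betrag2)
# atest[1:2] -> the grouping columns (PrsNr, Namen)
result <- aggregate(atest[3:4], atest[1:2], sum)

# Sort by ID only to make the output easier to read.
result <- result[order(result$PrsNr), ]
print(result)
```

Passing data frame subsets rather than hand-built lists keeps the original column names, which is why no list(Betrag1 = ...) wrapper is needed.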

Use a database and SQL if you don't otherwise have enough
computer resources.
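
A rough sketch of what the database route could look like, assuming the DBI and RSQLite packages are installed (neither package is named in this thread; they are only one possible choice, and any SQL database would serve). The example rows are again typed in by hand from the quoted post, and the table name "atest" is simply reused for illustration:

```r
library(DBI)  # assumption: DBI and RSQLite are installed

# Example rows copied from the quoted post.
atest <- data.frame(
  PrsNr   = c(1, 2, 2, 3, 4, 5, 6, 6, 6, 7),
  Namen   = c("Holla", "Mabba", "Mabba", "Pisa", "Pulla",
              "Raba", "Saba", "Saba", "Saba", "Mulla"),
  Betrag1 = c(1.99, 2.34, 5.23, 4.23, 2.23, 2.77, 3.83, 2.76, 6.32, 2.88),
  Betrag2 = c(3.44, 5.32, 5.21, 9.12, 7.32, 8.32, 6.99, 4.45, 5.34, 3.81),
  stringsAsFactors = FALSE
)

# In-memory SQLite database; a file path could be used instead for real data.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "atest", atest)

# Let the database do the grouping and summing.
res <- dbGetQuery(con, "
  SELECT PrsNr, Namen,
         SUM(Betrag1) AS Betrag1,
         SUM(Betrag2) AS Betrag2
  FROM atest
  GROUP BY PrsNr, Namen
  ORDER BY PrsNr
")

dbDisconnect(con)
print(res)
```

Only the aggregated rows come back into R, so the grouping work happens inside the database rather than in R's memory.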


On 3/24/07, Delcour Libertus <delcour.libertus at gmail.com> wrote:
> Hello!
>
> I have an Excel sheet with currently 11,000 rows and 9 columns. I want
> to work with the data in R. The contents are similar to the following
> example.
>
> I have a list with ID number, personal name and two kinds of
> loan values. I want to aggregate the list so that only one row remains
> for each person, with the loan values summed.
>
> First I tried some commands with tapply but had no success at all. Then
> I found in this mailing list a hint for aggregate (though I did not
> understand most of that mail).
>
> So I made some efforts with aggregate() and it seems to lead the right way:
>
> [code]
> > atest <- read.csv2 ("aggregatetest.csv")
> > str(atest)
> `data.frame':   10 obs. of  4 variables:
>  $ PrsNr  : int  1 2 2 3 4 5 6 6 6 7
>  $ Namen  : Factor w/ 7 levels "Holla","Mabba",..: 1 2 2 4 5 6 7 7 7 3
>  $ Betrag1: num  1.99 2.34 5.23 4.23 2.23 2.77 3.83 2.76 6.32 2.88
>  $ Betrag2: num  3.44 5.32 5.21 9.12 7.32 8.32 6.99 4.45 5.34 3.81
> > atest
>   PrsNr Namen Betrag1 Betrag2
> 1      1 Holla    1.99    3.44
> 2      2 Mabba    2.34    5.32
> 3      2 Mabba    5.23    5.21
> 4      3  Pisa    4.23    9.12
> 5      4 Pulla    2.23    7.32
> 6      5  Raba    2.77    8.32
> 7      6  Saba    3.83    6.99
> 8      6  Saba    2.76    4.45
> 9      6  Saba    6.32    5.34
> 10     7 Mulla    2.88    3.81
> > aggregate(list(Betrag1=atest$Betrag1),  by=list(PsrNr=atest$PrsNr,
> Namen=atest$Namen),  sum)
>  PsrNr Namen Betrag1
> 1     1 Holla    1.99
> 2     2 Mabba    7.57
> 3     7 Mulla    2.88
> 4     3  Pisa    4.23
> 5     4 Pulla    2.23
> 6     5  Raba    2.77
> 7     6  Saba   12.91
> [/code]
>
> The result is nearly what I want.
>
> First problem:
>
> How do I get all columns in my result? "Betrag2" is missing.
>
> Second problem:
>
> If I use the aggregate command on the real data, it is impossible for me
> to use more than one by-grouping variable (my example above has two),
> because 1 GB RAM and 1.5 GB swap are not enough to process my command.
> My computer (Ubuntu Linux, GNOME) freezes. So I doubt whether I am using
> the appropriate method to reach my goal.
>
> What is the best way to aggregate data frames the way I want? Are there any
> better functions/commands, or do I have to learn programming for this?
>
> Greetings
>
> Delcour
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


