[R] Working with data-frame

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Mon Nov 10 01:15:37 CET 2014


... so...

#1 ... flexible syntax for split-apply-combine, not very efficient for large data
library(plyr)
ddply(Dat,c("A1", "A2"), function(DF){data.frame(C1=sum(DF$C1))})

#2 ... compatible with large data on disk
library(sqldf)
sqldf("select A1,A2,sum(C1) as C1 from Dat group by A1, A2")

#3 ... better for large data in memory
library(data.table)
dtt <- data.table(Dat)
# setting a key sorts the table by A1, A2, which speeds up grouping on large data
setkeyv(dtt,c("A1", "A2"))
dtt[,list(C1=sum(C1)),by=list(A1,A2)]

#4 ... package still under development, but potentially can support operations on data stored in memory or relational databases
library(dplyr)
Dat %>% group_by(A1,A2) %>% summarise( C1=sum( C1 ) )
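
As a cross-check that needs no add-on packages, here is a minimal base-R sketch. It assumes the Dat posted further down the thread (rebuilt here with data.frame() rather than the dput() output), and the names agg, pieces and manual are only illustrative. It computes the groupwise sums of C1 once with aggregate() and once with split() plus sapply(), then confirms that the two agree.

```r
# Rebuild the example data from the original post (A1 is a factor, A2 numeric).
Dat <- data.frame(
  A1 = factor(c("c","c","a","c","c","c","c","b","c","c","a","b","c","b","a",
                "a","c","c","b","c","b","b","c","c","c","b","c","a","a","c")),
  A2 = c(2, 3, 2, 1, 3, 3, 2, 2, 3, 1, 3, 1, 3, 3, 2, 2, 1,
         2, 1, 2, 1, 3, 3, 2, 1, 2, 3, 2, 2, 2),
  C1 = 1:30
)

# Formula interface: one row per observed (A1, A2) combination.
agg <- aggregate(C1 ~ A1 + A2, data = Dat, FUN = sum)

# Same sums via split() on both columns; drop = TRUE omits combinations
# that never occur in the data.
pieces <- split(Dat$C1, Dat[, c("A1", "A2")], drop = TRUE)
manual <- sapply(pieces, sum)

# The two approaches should yield the same set of group sums.
stopifnot(isTRUE(all.equal(sort(agg$C1), sort(unname(manual)))))
```

Whichever variant is used on the real data, a check of this kind on a small subset is a cheap way to confirm that the grouping columns are being interpreted as intended.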

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

On November 9, 2014 1:39:45 PM PST, William Dunlap <wdunlap at tibco.com> wrote:
>> I tried with spilt() function. However it looks to me that, it can
>> split a data-frame w.r.t. only one column.
>
>(I assume you meant 'split', not 'spilt'.)
>
>You did not show what you tried, but the following splits Dat by its "A1"
>and "A2" columns (creating a list of data.frames):
>   split(Dat, f=Dat[,c("A1","A2")])
>
>aggregate(), in core R, combines the split and the lapply needed to
>calculate groupwise sums.  E.g.,
>  aggregate(Dat$C1, by=Dat[,c("A1","A2")], FUN=sum)
>  aggregate(C1 ~ A1 + A2, data=Dat, FUN=sum)
>
>The plyr and dplyr packages have other ways to do this sort of thing.
>
>
>Bill Dunlap
>TIBCO Software
>wdunlap tibco.com
>
>On Sun, Nov 9, 2014 at 11:58 AM, Christofer Bogaso <
>bogaso.christofer at gmail.com> wrote:
>
>> Hi again,
>>
>> Let's say I have the following data frame:
>>
>>
>> Dat <- structure(list(A1 = structure(c(3L, 3L, 1L, 3L, 3L, 3L, 3L, 2L,
>> 3L, 3L, 1L, 2L, 3L, 2L, 1L, 1L, 3L, 3L, 2L, 3L, 2L, 2L, 3L, 3L,
>> 3L, 2L, 3L, 1L, 1L, 3L), .Label = c("a", "b", "c"), class = "factor"),
>>     A2 = c(2, 3, 2, 1, 3, 3, 2, 2, 3, 1, 3, 1, 3, 3, 2, 2, 1,
>>     2, 1, 2, 1, 3, 3, 2, 1, 2, 3, 2, 2, 2), C1 = 1:30), .Names = c("A1",
>> "A2", "C1"), row.names = c(NA, -30L), class = "data.frame")
>>
>>
>> Now my goal is :
>> 1: Find all possible unique combinations of column 'A1' & column 'A2'.
>> For example A1 = c, A2 = 2 is 1 unique combination.
>>
>> 2. For each such unique combination, calculate the sum of 'C1'.
>>
>> Is there a direct R function to do this in a faster way? I have a very
>> large data frame to handle with such a calculation.
>>
>> I tried with spilt() function. However it looks to me that, it can
>> split a data-frame w.r.t. only one column.
>>
>> Thanks for your suggestion
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>


