[R] aggregate dataframe by multiple factors

Karim Mezhoud kmezhoud at gmail.com
Sun Nov 20 20:26:32 CET 2016


Many Thanks,
What is exactly the solution.
Best,
Karim

On Sun, Nov 20, 2016 at 8:17 PM, David Winsemius <dwinsemius at comcast.net>
wrote:

>
> > On Nov 20, 2016, at 11:11 AM, David Winsemius <dwinsemius at comcast.net>
> wrote:
> >
> >
> >> On Nov 20, 2016, at 10:49 AM, Karim Mezhoud <kmezhoud at gmail.com> wrote:
> >>
> >> Yes the results does not have a date but
> >> successive DCE.1, DCE2, DCE.3 indicates
> >> DCE.date1 , DCE.date2, DCE.date3
> >> I hope that the chronological order of date is conserved.
> >
> > There are, however, 4 dates. See this pair of results. You can probably
> do something if you ever figure out what it is that you precisely want. I
> think the date ordering is automatic here:
> >
> >> require(reshape2)
> > Loading required package: reshape2
> >
> >> dcast(dat, first.Name +  Name + Department ~ date, value.var='DCE')
> >  first.Name   Name Department 2013-09-30 2013-12-28 2014-03-30
> 2014-06-28
> > 1      Amish  Wives        TAS       0.59         NA       0.59
>  NA
> > 2    Ancient Nation        QLH       0.54       0.28       0.54
>  0.28
> > 3    Auction Videos        YME       0.57         NA       0.57
>  NA
> >> dcast(dat, first.Name +  Name + Department ~ date, value.var='DP')
> >  first.Name   Name Department 2013-09-30 2013-12-28 2014-03-30 2014-06-28
> > 1      Amish  Wives        TAS       0.56         NA       0.56
>  NA
> > 2    Ancient Nation        QLH       0.58       0.29       0.58
>  0.29
> > 3    Auction Videos        YME       0.56         NA       0.56
>  NA
>
> Yjos completes the process in the reshape2 world:
>
> > mdat <- melt(dat, measure.vars=c("DCE", 'DP') )
> > str(mdat)
> 'data.frame':   24 obs. of  6 variables:
>  $ first.Name: Factor w/ 48 levels "Amish","Ancient",..: 3 1 2 3 1 2 3 1 2
> 3 ...
>  $ Name      : Factor w/ 48 levels "Aliens","Behavior",..: 43 47 29 43 47
> 29 43 47 29 43 ...
>  $ Department: Factor w/ 8 levels "HXW","QLH","RAR",..: 8 6 2 8 6 2 8 6 2
> 8 ...
>  $ date      : Date, format: "2013-09-30" "2013-09-30" ...
>  $ variable  : Factor w/ 2 levels "DCE","DP": 1 1 1 1 1 1 1 1 1 1 ...
>  $ value     : num  0.57 0.59 0.54 NA NA 0.28 0.57 0.59 0.54 NA ...
>
> > dcast(mdat, first.Name +  Name + Department ~ date+variable )
>   first.Name   Name Department 2013-09-30_DCE 2013-09-30_DP 2013-12-28_DCE
> 1      Amish  Wives        TAS           0.59          0.56             NA
> 2    Ancient Nation        QLH           0.54          0.58           0.28
> 3    Auction Videos        YME           0.57          0.56             NA
>   2013-12-28_DP 2014-03-30_DCE 2014-03-30_DP 2014-06-28_DCE 2014-06-28_DP
> 1            NA           0.59          0.56             NA            NA
> 2          0.29           0.54          0.58           0.28          0.29
> 3            NA           0.57          0.56             NA            NA
>
> --
> David.
>
> >
> >
> >
> >> Thanks,
> >> Karim
> >>
> >>
> >> On Sun, Nov 20, 2016 at 7:44 PM, David Winsemius <
> dwinsemius at comcast.net> wrote:
> >>
> >>> On Nov 20, 2016, at 5:28 AM, Karim Mezhoud <kmezhoud at gmail.com> wrote:
> >>>
> >>> Sorry for the delay,
> >>> Many Thanks for Mr. David and Mr. Petr
> >>> I thinked  to use "sort" function to arrange chronologically  value
> by  'date' (without 'date' is colnames) of each variables (DCE, DP).
> >>>
> >>>
> >>> The solution of David seems to be simple to understand with "unlist"
> function.
> >>> The solution of Petr seems to be fancy. I did not find document  about
> "I" argument for aggregate function.
> >>>
> >>> How can know which value for which date?
> >>
> >> You asked for a functional reshaping that did not have the date. Now
> you want the date? There was no date in the result you indicated was
> desired.   ....??????
> >>
> >> --
> >> David.
> >>
> >>>
> >>> I will save the reshaping/ordering dataframe  for later use.
> >>> Many Thanks,
> >>> Karim
> >>>
> >>>
> >>> On Fri, Nov 18, 2016 at 11:34 AM, PIKAL Petr <petr.pikal at precheza.cz>
> wrote:
> >>> Hi
> >>>
> >>> same result can be achieved by
> >>>
> >>> dat.ag<-aggregate(dat[ , c("DCE","DP")], by= list(dat$first.Name,
> dat$Name, dat$Department) , "I")
> >>>
> >>> Sorting according to the first row seems to be quite tricky. You could
> probably get closer by using some combination of split and order and
> arranging back chunks  of data
> >>>
> >>> ooo1<-order(split(dat$DCE,interaction(dat$first.Name, dat$Name,
> dat$Department, drop=T))[[1]])
> >>> data.frame(sapply(split(dat$DCE,interaction(dat$first.Name, dat$Name,
> dat$Department, drop=T)), rbind))[ooo1,]
> >>>  Ancient.Nation.QLH Amish.Wives.TAS Auction.Videos.YME
> >>> 2               0.28              NA                 NA
> >>> 4               0.28              NA                 NA
> >>> 1               0.54            0.59               0.57
> >>> 3               0.54            0.59               0.57
> >>>
> >>> however I wonder why the order according to the first row is necessary
> if all NAs are on correct positions?
> >>>
> >>> Cheers
> >>> Petr
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of David
> >>>> Winsemius
> >>>> Sent: Friday, November 18, 2016 9:30 AM
> >>>> To: Karim Mezhoud <kmezhoud at gmail.com>
> >>>> Cc: r-help at r-project.org
> >>>> Subject: Re: [R] aggregate dataframe by multiple factors
> >>>>
> >>>>
> >>>>> On Nov 17, 2016, at 11:27 PM, Karim Mezhoud <kmezhoud at gmail.com>
> >>>> wrote:
> >>>>>
> >>>>> Dear all,
> >>>>>
> >>>>> the dat  has missing values NA,
> >>>>>
> >>>>>   first.Name   Name Department  DCE   DP       date
> >>>>> 5      Auction Videos        YME 0.57 0.56 2013-09-30
> >>>>> 18       Amish  Wives        TAS 0.59 0.56 2013-09-30
> >>>>> 34     Ancient Nation        QLH 0.54 0.58 2013-09-30
> >>>>> 53     Auction Videos        YME   NA   NA 2013-12-28
> >>>>> 66       Amish  Wives        TAS   NA   NA 2013-12-28
> >>>>> 82     Ancient Nation        QLH 0.28 0.29 2013-12-28
> >>>>> 102    Auction Videos        YME 0.57 0.56 2014-03-30
> >>>>> 115      Amish  Wives        TAS 0.59 0.56 2014-03-30
> >>>>> 131    Ancient Nation        QLH 0.54 0.58 2014-03-30
> >>>>> 150    Auction Videos        YME   NA   NA 2014-06-28
> >>>>> 163      Amish  Wives        TAS   NA   NA 2014-06-28
> >>>>> 179    Ancient Nation        QLH 0.28 0.29 2014-06-28
> >>>>>
> >>>>>
> >>>>> agg <- as.data.frame(aggregate(dat[ , c("DCE","DP")], by=
> >>>>> list(dat$first.Name, dat$Name, dat$Department) , "sort"))
> >>>>
> >>>> The closest I could get on a few attempts was:
> >>>>
> >>>> (agg <- as.data.frame(aggregate(dat[ , c("DCE","DP")], by=
> >>>> list(dat$first.Name, dat$Name, dat$Department) , function(d) {
> unlist(d)}))
> >>>> )
> >>>>
> >>>>  Group.1 Group.2 Group.3 DCE.1 DCE.2 DCE.3 DCE.4 DP.1 DP.2 DP.3 DP.4
> >>>> 1 Ancient  Nation     QLH  0.54  0.28  0.54  0.28 0.58 0.29 0.58 0.29
> >>>> 2   Amish   Wives     TAS  0.59    NA  0.59    NA 0.56   NA 0.56   NA
> >>>> 3 Auction  Videos     YME  0.57    NA  0.57    NA 0.56   NA 0.56   NA
> >>>>
> >>>> I think the sort operation might be somewhat ambiguous in this
> instance. I
> >>>> tried:
> >>>>
> >>>> (agg <- as.data.frame(aggregate(dat[ , c("DCE","DP")], by=
> >>>> list(dat$first.Name, dat$Name, dat$Department) , function(d) {
> >>>> unlist(lapply(d,sort))}))
> >>>> )
> >>>>
> >>>> With no success, not even a sorted result.
> >>>>
> >>>> --
> >>>> David.
> >>>>>
> >>>>>
> >>>>> agg has list of value. I would separate value in different columns.
> >>>>>
> >>>>> Group.1 Group.2 Group.3                    DCE                     DP
> >>>>> 1 Ancient  Nation     QLH 0.28, 0.28, 0.54, 0.54 0.29, 0.29, 0.58,
> 0.58
> >>>>> 2   Amish   Wives     TAS             0.59, 0.59             0.56,
> 0.56
> >>>>> 3 Auction  Videos     YME             0.57, 0.57             0.56,
> 0.56
> >>>>>
> >>>>> The  goal:
> >>>>>
> >>>>> Group.1 Group.2 Group.3  DCE.1 DCE.2 DCE.3  DCE.4  DP.1  DP.2  DP.3
> DP.4
> >>>>> 1 Ancient  Nation     QLH    0.28     0.28    0.54     0.54
>  0.29, 0.29,
> >>>>> 0.58, 0.58
> >>>>> 2   Amish   Wives     TAS        NA     NA     0.59, 0.59
>  NA
> >>>>> NA  0.56, 0.56
> >>>>> 3 Auction  Videos     YME         NA   NA      0.57, 0.57
>  NA
> >>>>> NA  0.56, 0.56
> >>>>>
> >>>>>
> >>>>>
> >>>>> dat <- structure(list(first.Name = structure(c(3L, 1L, 2L, 3L, 1L,
> 2L,
> >>>>> 3L, 1L, 2L, 3L, 1L, 2L), .Label = c("Amish", "Ancient", "Auction",
> >>>>> "Ax", "Bachelorette", "Basketball", "BBQ", "Cake", "Celebrity",
> >>>>> "Chef", "Clean", "Colonial", "Comedy", "Comic", "Crocodile", "Dog",
> >>>>> "Empire", "Extreme", "Farm", "Half Pint", "Hollywood", "House", "Ice
> >>>>> Road", "Jersey", "Justice", "Love", "Mega", "Model", "Modern",
> >>>>> "Mountain", "Mystery", "Myth", "New York", "Paradise", "Pioneer",
> >>>>> "Queer", "Restaurant", "Road", "Royal", "Spouse", "Star", "Storage",
> >>>>> "Survival", "The Great American", "Tool", "Treasure", "Wedding",
> >>>>> "Wife"), class = "factor"), Name = structure(c(43L, 47L, 29L, 43L,
> >>>>> 47L, 29L, 43L, 47L, 29L, 43L, 47L, 29L), .Label = c("Aliens",
> >>>>> "Behavior", "Casino", "Casting Call", "Challenge", "Contest",
> >>>>> "Crashers", "Crew", "Dad", "Dancing", "Date", "Disasters", "Dynasty",
> >>>>> "Family", "Garage", "Greenlight", "Gypsies", "Haul", "Hot Rod",
> >>>>> "Inventor", "Jail", "Job", "Justice", "Marvels", "Master", "Mates",
> >>>>> "Model", "Moms", "Nation", "Ninja", "Patrol", "People", "Pitmasters",
> >>>>> "Queens", "Rescue", "Rivals", "Room", "Rooms", "Rules", "Star",
> >>>>> "Stars", "Superhero", "Videos", "VIP", "Wars", "Wishes", "Wives",
> >>>>> "Wrangler"), class = "factor"), Department = structure(c(8L, 6L, 2L,
> >>>>> 8L, 6L, 2L, 8L, 6L, 2L, 8L, 6L, 2L), .Label = c("HXW", "QLH", "RAR",
> >>>>> "RYC", "SYI", "TAS", "VUV", "YME"), class = "factor"),
> >>>>>   DCE = c(0.57, 0.59, 0.54, NA, NA, 0.28, 0.57, 0.59, 0.54,
> >>>>>   NA, NA, 0.28), DP = c(0.56, 0.56, 0.58, NA, NA, 0.29, 0.56,
> >>>>>   0.56, 0.58, NA, NA, 0.29), date = structure(c(15978, 15978,
> >>>>>   15978, 16067, 16067, 16067, 16159, 16159, 16159, 16249, 16249,
> >>>>>   16249), class = "Date")), description = "", row.names = c(5L, 18L,
> >>>>> 34L, 53L, 66L, 82L, 102L, 115L, 131L, 150L, 163L, 179L), class =
> >>>>> "data.frame", .Names = c("first.Name", "Name", "Department", "DCE",
> >>>>> "DP", "date"))
> >>>>>
> >>>>>    [[alternative HTML version deleted]]
> >>>>>
> >>>>> ______________________________________________
> >>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>> PLEASE do read the posting guide
> >>>>> http://www.R-project.org/posting-guide.html
> >>>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>
> >>>> David Winsemius
> >>>> Alameda, CA, USA
> >>>>
> >>>> ______________________________________________
> >>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> PLEASE do read the posting guide http://www.R-project.org/posting-
> >>>> guide.html
> >>>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>> ________________________________
> >>> Tento e-mail a jakékoliv k němu připojené dokumenty jsou důvěrné a
> jsou určeny pouze jeho adresátům.
> >>> Jestliže jste obdržel(a) tento e-mail omylem, informujte laskavě
> neprodleně jeho odesílatele. Obsah tohoto emailu i s přílohami a jeho kopie
> vymažte ze svého systému.
> >>> Nejste-li zamýšleným adresátem tohoto emailu, nejste oprávněni tento
> email jakkoliv užívat, rozšiřovat, kopírovat či zveřejňovat.
> >>> Odesílatel e-mailu neodpovídá za eventuální škodu způsobenou
> modifikacemi či zpožděním přenosu e-mailu.
> >>>
> >>> V případě, že je tento e-mail součástí obchodního jednání:
> >>> - vyhrazuje si odesílatel právo ukončit kdykoliv jednání o uzavření
> smlouvy, a to z jakéhokoliv důvodu i bez uvedení důvodu.
> >>> - a obsahuje-li nabídku, je adresát oprávněn nabídku bezodkladně
> přijmout; Odesílatel tohoto e-mailu (nabídky) vylučuje přijetí nabídky ze
> strany příjemce s dodatkem či odchylkou.
> >>> - trvá odesílatel na tom, že příslušná smlouva je uzavřena teprve
> výslovným dosažením shody na všech jejích náležitostech.
> >>> - odesílatel tohoto emailu informuje, že není oprávněn uzavírat za
> společnost žádné smlouvy s výjimkou případů, kdy k tomu byl písemně zmocněn
> nebo písemně pověřen a takové pověření nebo plná moc byly adresátovi tohoto
> emailu případně osobě, kterou adresát zastupuje, předloženy nebo jejich
> existence je adresátovi či osobě jím zastoupené známá.
> >>>
> >>> This e-mail and any documents attached to it may be confidential and
> are intended only for its intended recipients.
> >>> If you received this e-mail by mistake, please immediately inform its
> sender. Delete the contents of this e-mail with all attachments and its
> copies from your system.
> >>> If you are not the intended recipient of this e-mail, you are not
> authorized to use, disseminate, copy or disclose this e-mail in any manner.
> >>> The sender of this e-mail shall not be liable for any possible damage
> caused by modifications of the e-mail or by delay with transfer of the
> email.
> >>>
> >>> In case that this e-mail forms part of business dealings:
> >>> - the sender reserves the right to end negotiations about entering
> into a contract in any time, for any reason, and without stating any
> reasoning.
> >>> - if the e-mail contains an offer, the recipient is entitled to
> immediately accept such offer; The sender of this e-mail (offer) excludes
> any acceptance of the offer on the part of the recipient containing any
> amendment or variation.
> >>> - the sender insists on that the respective contract is concluded only
> upon an express mutual agreement on all its aspects.
> >>> - the sender of this e-mail informs that he/she is not authorized to
> enter into any contracts on behalf of the company except for cases in which
> he/she is expressly authorized to do so in writing, and such authorization
> or power of attorney is submitted to the recipient or the person
> represented by the recipient, or the existence of such authorization is
> known to the recipient of the person represented by the recipient.
> >>>
> >>
> >> David Winsemius
> >> Alameda, CA, USA
> >>
> >>
> >
> > David Winsemius
> > Alameda, CA, USA
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list