[R] aggregate dataframe by multiple factors

Karim Mezhoud kmezhoud at gmail.com
Sun Nov 20 22:43:19 CET 2016


Sorry,
"That" and not "What".
Best

On Sun, Nov 20, 2016 at 9:00 PM, David Winsemius <dwinsemius at comcast.net>
wrote:

>
> > On Nov 20, 2016, at 11:26 AM, Karim Mezhoud <kmezhoud at gmail.com> wrote:
> >
> > Many Thanks,
> > What is exactly the solution.
>
> Is that a question? If so, you need to be more expansive about the areas
> of uncertainty, since I _thought_ I had given an "exact" solution.
>
> --
>
> David
> > Best,
> > Karim
> >
> > On Sun, Nov 20, 2016 at 8:17 PM, David Winsemius <dwinsemius at comcast.net>
> wrote:
> >
> > > On Nov 20, 2016, at 11:11 AM, David Winsemius <dwinsemius at comcast.net>
> wrote:
> > >
> > >
> > >> On Nov 20, 2016, at 10:49 AM, Karim Mezhoud <kmezhoud at gmail.com>
> wrote:
> > >>
> > >> Yes the results does not have a date but
> > >> successive DCE.1, DCE2, DCE.3 indicates
> > >> DCE.date1 , DCE.date2, DCE.date3
> > >> I hope that the chronological order of date is conserved.
> > >
> > > There are, however, 4 dates. See this pair of results. You can
> probably do something if you ever figure out what it is that you precisely
> want. I think the date ordering is automatic here:
> > >
> > >> require(reshape2)
> > > Loading required package: reshape2
> > >
> > >> dcast(dat, first.Name +  Name + Department ~ date, value.var='DCE')
> > >  first.Name   Name Department 2013-09-30 2013-12-28 2014-03-30
> 2014-06-28
> > > 1      Amish  Wives        TAS       0.59         NA       0.59
>  NA
> > > 2    Ancient Nation        QLH       0.54       0.28       0.54
>  0.28
> > > 3    Auction Videos        YME       0.57         NA       0.57
>  NA
> > >> dcast(dat, first.Name +  Name + Department ~ date, value.var='DP')
> > >  first.Name   Name Department 2013-09-30 2013-12-28 2014-03-30
> 2014-06-28
> > > 1      Amish  Wives        TAS       0.56         NA       0.56
>  NA
> > > 2    Ancient Nation        QLH       0.58       0.29       0.58
>  0.29
> > > 3    Auction Videos        YME       0.56         NA       0.56
>  NA
> >
> > Yjos completes the process in the reshape2 world:
> >
> > > mdat <- melt(dat, measure.vars=c("DCE", 'DP') )
> > > str(mdat)
> > 'data.frame':   24 obs. of  6 variables:
> >  $ first.Name: Factor w/ 48 levels "Amish","Ancient",..: 3 1 2 3 1 2 3 1
> 2 3 ...
> >  $ Name      : Factor w/ 48 levels "Aliens","Behavior",..: 43 47 29 43
> 47 29 43 47 29 43 ...
> >  $ Department: Factor w/ 8 levels "HXW","QLH","RAR",..: 8 6 2 8 6 2 8 6
> 2 8 ...
> >  $ date      : Date, format: "2013-09-30" "2013-09-30" ...
> >  $ variable  : Factor w/ 2 levels "DCE","DP": 1 1 1 1 1 1 1 1 1 1 ...
> >  $ value     : num  0.57 0.59 0.54 NA NA 0.28 0.57 0.59 0.54 NA ...
> >
> > > dcast(mdat, first.Name +  Name + Department ~ date+variable )
> >   first.Name   Name Department 2013-09-30_DCE 2013-09-30_DP
> 2013-12-28_DCE
> > 1      Amish  Wives        TAS           0.59          0.56
>  NA
> > 2    Ancient Nation        QLH           0.54          0.58
>  0.28
> > 3    Auction Videos        YME           0.57          0.56
>  NA
> >   2013-12-28_DP 2014-03-30_DCE 2014-03-30_DP 2014-06-28_DCE 2014-06-28_DP
> > 1            NA           0.59          0.56             NA            NA
> > 2          0.29           0.54          0.58           0.28          0.29
> > 3            NA           0.57          0.56             NA            NA
> >
> > --
> > David.
> >
> > >
> > >
> > >
> > >> Thanks,
> > >> Karim
> > >>
> > >>
> > >> On Sun, Nov 20, 2016 at 7:44 PM, David Winsemius <
> dwinsemius at comcast.net> wrote:
> > >>
> > >>> On Nov 20, 2016, at 5:28 AM, Karim Mezhoud <kmezhoud at gmail.com>
> wrote:
> > >>>
> > >>> Sorry for the delay,
> > >>> Many Thanks for Mr. David and Mr. Petr
> > >>> I thinked  to use "sort" function to arrange chronologically  value
> by  'date' (without 'date' is colnames) of each variables (DCE, DP).
> > >>>
> > >>>
> > >>> The solution of David seems to be simple to understand with "unlist"
> function.
> > >>> The solution of Petr seems to be fancy. I did not find document
> about "I" argument for aggregate function.
> > >>>
> > >>> How can know which value for which date?
> > >>
> > >> You asked for a functional reshaping that did not have the date. Now
> you want the date? There was no date in the result you indicated was
> desired.   ....??????
> > >>
> > >> --
> > >> David.
> > >>
> > >>>
> > >>> I will save the reshaping/ordering dataframe  for later use.
> > >>> Many Thanks,
> > >>> Karim
> > >>>
> > >>>
> > >>> On Fri, Nov 18, 2016 at 11:34 AM, PIKAL Petr <petr.pikal at precheza.cz>
> wrote:
> > >>> Hi
> > >>>
> > >>> same result can be achieved by
> > >>>
> > >>> dat.ag<-aggregate(dat[ , c("DCE","DP")], by= list(dat$first.Name,
> dat$Name, dat$Department) , "I")
> > >>>
> > >>> Sorting according to the first row seems to be quite tricky. You
> could probably get closer by using some combination of split and order and
> arranging back chunks  of data
> > >>>
> > >>> ooo1<-order(split(dat$DCE,interaction(dat$first.Name, dat$Name,
> dat$Department, drop=T))[[1]])
> > >>> data.frame(sapply(split(dat$DCE,interaction(dat$first.Name,
> dat$Name, dat$Department, drop=T)), rbind))[ooo1,]
> > >>>  Ancient.Nation.QLH Amish.Wives.TAS Auction.Videos.YME
> > >>> 2               0.28              NA                 NA
> > >>> 4               0.28              NA                 NA
> > >>> 1               0.54            0.59               0.57
> > >>> 3               0.54            0.59               0.57
> > >>>
> > >>> however I wonder why the order according to the first row is
> necessary if all NAs are on correct positions?
> > >>>
> > >>> Cheers
> > >>> Petr
> > >>>
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of
> David
> > >>>> Winsemius
> > >>>> Sent: Friday, November 18, 2016 9:30 AM
> > >>>> To: Karim Mezhoud <kmezhoud at gmail.com>
> > >>>> Cc: r-help at r-project.org
> > >>>> Subject: Re: [R] aggregate dataframe by multiple factors
> > >>>>
> > >>>>
> > >>>>> On Nov 17, 2016, at 11:27 PM, Karim Mezhoud <kmezhoud at gmail.com>
> > >>>> wrote:
> > >>>>>
> > >>>>> Dear all,
> > >>>>>
> > >>>>> the dat  has missing values NA,
> > >>>>>
> > >>>>>   first.Name   Name Department  DCE   DP       date
> > >>>>> 5      Auction Videos        YME 0.57 0.56 2013-09-30
> > >>>>> 18       Amish  Wives        TAS 0.59 0.56 2013-09-30
> > >>>>> 34     Ancient Nation        QLH 0.54 0.58 2013-09-30
> > >>>>> 53     Auction Videos        YME   NA   NA 2013-12-28
> > >>>>> 66       Amish  Wives        TAS   NA   NA 2013-12-28
> > >>>>> 82     Ancient Nation        QLH 0.28 0.29 2013-12-28
> > >>>>> 102    Auction Videos        YME 0.57 0.56 2014-03-30
> > >>>>> 115      Amish  Wives        TAS 0.59 0.56 2014-03-30
> > >>>>> 131    Ancient Nation        QLH 0.54 0.58 2014-03-30
> > >>>>> 150    Auction Videos        YME   NA   NA 2014-06-28
> > >>>>> 163      Amish  Wives        TAS   NA   NA 2014-06-28
> > >>>>> 179    Ancient Nation        QLH 0.28 0.29 2014-06-28
> > >>>>>
> > >>>>>
> > >>>>> agg <- as.data.frame(aggregate(dat[ , c("DCE","DP")], by=
> > >>>>> list(dat$first.Name, dat$Name, dat$Department) , "sort"))
> > >>>>
> > >>>> The closest I could get on a few attempts was:
> > >>>>
> > >>>> (agg <- as.data.frame(aggregate(dat[ , c("DCE","DP")], by=
> > >>>> list(dat$first.Name, dat$Name, dat$Department) , function(d) {
> unlist(d)}))
> > >>>> )
> > >>>>
> > >>>>  Group.1 Group.2 Group.3 DCE.1 DCE.2 DCE.3 DCE.4 DP.1 DP.2 DP.3 DP.4
> > >>>> 1 Ancient  Nation     QLH  0.54  0.28  0.54  0.28 0.58 0.29 0.58
> 0.29
> > >>>> 2   Amish   Wives     TAS  0.59    NA  0.59    NA 0.56   NA 0.56
>  NA
> > >>>> 3 Auction  Videos     YME  0.57    NA  0.57    NA 0.56   NA 0.56
>  NA
> > >>>>
> > >>>> I think the sort operation might be somewhat ambiguous in this
> instance. I
> > >>>> tried:
> > >>>>
> > >>>> (agg <- as.data.frame(aggregate(dat[ , c("DCE","DP")], by=
> > >>>> list(dat$first.Name, dat$Name, dat$Department) , function(d) {
> > >>>> unlist(lapply(d,sort))}))
> > >>>> )
> > >>>>
> > >>>> With no success, not even a sorted result.
> > >>>>
> > >>>> --
> > >>>> David.
> > >>>>>
> > >>>>>
> > >>>>> agg has list of value. I would separate value in different columns.
> > >>>>>
> > >>>>> Group.1 Group.2 Group.3                    DCE
>  DP
> > >>>>> 1 Ancient  Nation     QLH 0.28, 0.28, 0.54, 0.54 0.29, 0.29, 0.58,
> 0.58
> > >>>>> 2   Amish   Wives     TAS             0.59, 0.59             0.56,
> 0.56
> > >>>>> 3 Auction  Videos     YME             0.57, 0.57             0.56,
> 0.56
> > >>>>>
> > >>>>> The  goal:
> > >>>>>
> > >>>>> Group.1 Group.2 Group.3  DCE.1 DCE.2 DCE.3  DCE.4  DP.1  DP.2
> DP.3  DP.4
> > >>>>> 1 Ancient  Nation     QLH    0.28     0.28    0.54     0.54
>  0.29, 0.29,
> > >>>>> 0.58, 0.58
> > >>>>> 2   Amish   Wives     TAS        NA     NA     0.59, 0.59
>  NA
> > >>>>> NA  0.56, 0.56
> > >>>>> 3 Auction  Videos     YME         NA   NA      0.57, 0.57
>    NA
> > >>>>> NA  0.56, 0.56
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> dat <- structure(list(first.Name = structure(c(3L, 1L, 2L, 3L, 1L,
> 2L,
> > >>>>> 3L, 1L, 2L, 3L, 1L, 2L), .Label = c("Amish", "Ancient", "Auction",
> > >>>>> "Ax", "Bachelorette", "Basketball", "BBQ", "Cake", "Celebrity",
> > >>>>> "Chef", "Clean", "Colonial", "Comedy", "Comic", "Crocodile", "Dog",
> > >>>>> "Empire", "Extreme", "Farm", "Half Pint", "Hollywood", "House",
> "Ice
> > >>>>> Road", "Jersey", "Justice", "Love", "Mega", "Model", "Modern",
> > >>>>> "Mountain", "Mystery", "Myth", "New York", "Paradise", "Pioneer",
> > >>>>> "Queer", "Restaurant", "Road", "Royal", "Spouse", "Star",
> "Storage",
> > >>>>> "Survival", "The Great American", "Tool", "Treasure", "Wedding",
> > >>>>> "Wife"), class = "factor"), Name = structure(c(43L, 47L, 29L, 43L,
> > >>>>> 47L, 29L, 43L, 47L, 29L, 43L, 47L, 29L), .Label = c("Aliens",
> > >>>>> "Behavior", "Casino", "Casting Call", "Challenge", "Contest",
> > >>>>> "Crashers", "Crew", "Dad", "Dancing", "Date", "Disasters",
> "Dynasty",
> > >>>>> "Family", "Garage", "Greenlight", "Gypsies", "Haul", "Hot Rod",
> > >>>>> "Inventor", "Jail", "Job", "Justice", "Marvels", "Master", "Mates",
> > >>>>> "Model", "Moms", "Nation", "Ninja", "Patrol", "People",
> "Pitmasters",
> > >>>>> "Queens", "Rescue", "Rivals", "Room", "Rooms", "Rules", "Star",
> > >>>>> "Stars", "Superhero", "Videos", "VIP", "Wars", "Wishes", "Wives",
> > >>>>> "Wrangler"), class = "factor"), Department = structure(c(8L, 6L,
> 2L,
> > >>>>> 8L, 6L, 2L, 8L, 6L, 2L, 8L, 6L, 2L), .Label = c("HXW", "QLH",
> "RAR",
> > >>>>> "RYC", "SYI", "TAS", "VUV", "YME"), class = "factor"),
> > >>>>>   DCE = c(0.57, 0.59, 0.54, NA, NA, 0.28, 0.57, 0.59, 0.54,
> > >>>>>   NA, NA, 0.28), DP = c(0.56, 0.56, 0.58, NA, NA, 0.29, 0.56,
> > >>>>>   0.56, 0.58, NA, NA, 0.29), date = structure(c(15978, 15978,
> > >>>>>   15978, 16067, 16067, 16067, 16159, 16159, 16159, 16249, 16249,
> > >>>>>   16249), class = "Date")), description = "", row.names = c(5L,
> 18L,
> > >>>>> 34L, 53L, 66L, 82L, 102L, 115L, 131L, 150L, 163L, 179L), class =
> > >>>>> "data.frame", .Names = c("first.Name", "Name", "Department", "DCE",
> > >>>>> "DP", "date"))
> > >>>>>
> > >>>>>    [[alternative HTML version deleted]]
> > >>>>>
> > >>>>> ______________________________________________
> > >>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>>>> PLEASE do read the posting guide
> > >>>>> http://www.R-project.org/posting-guide.html
> > >>>>> and provide commented, minimal, self-contained, reproducible code.
> > >>>>
> > >>>> David Winsemius
> > >>>> Alameda, CA, USA
> > >>>>
> > >>>> ______________________________________________
> > >>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>>> PLEASE do read the posting guide http://www.R-project.org/posting-
> > >>>> guide.html
> > >>>> and provide commented, minimal, self-contained, reproducible code.
> > >>>
> > >>> ________________________________
> > >>> Tento e-mail a jakékoliv k němu připojené dokumenty jsou důvěrné a
> jsou určeny pouze jeho adresátům.
> > >>> Jestliže jste obdržel(a) tento e-mail omylem, informujte laskavě
> neprodleně jeho odesílatele. Obsah tohoto emailu i s přílohami a jeho kopie
> vymažte ze svého systému.
> > >>> Nejste-li zamýšleným adresátem tohoto emailu, nejste oprávněni tento
> email jakkoliv užívat, rozšiřovat, kopírovat či zveřejňovat.
> > >>> Odesílatel e-mailu neodpovídá za eventuální škodu způsobenou
> modifikacemi či zpožděním přenosu e-mailu.
> > >>>
> > >>> V případě, že je tento e-mail součástí obchodního jednání:
> > >>> - vyhrazuje si odesílatel právo ukončit kdykoliv jednání o uzavření
> smlouvy, a to z jakéhokoliv důvodu i bez uvedení důvodu.
> > >>> - a obsahuje-li nabídku, je adresát oprávněn nabídku bezodkladně
> přijmout; Odesílatel tohoto e-mailu (nabídky) vylučuje přijetí nabídky ze
> strany příjemce s dodatkem či odchylkou.
> > >>> - trvá odesílatel na tom, že příslušná smlouva je uzavřena teprve
> výslovným dosažením shody na všech jejích náležitostech.
> > >>> - odesílatel tohoto emailu informuje, že není oprávněn uzavírat za
> společnost žádné smlouvy s výjimkou případů, kdy k tomu byl písemně zmocněn
> nebo písemně pověřen a takové pověření nebo plná moc byly adresátovi tohoto
> emailu případně osobě, kterou adresát zastupuje, předloženy nebo jejich
> existence je adresátovi či osobě jím zastoupené známá.
> > >>>
> > >>> This e-mail and any documents attached to it may be confidential and
> are intended only for its intended recipients.
> > >>> If you received this e-mail by mistake, please immediately inform
> its sender. Delete the contents of this e-mail with all attachments and its
> copies from your system.
> > >>> If you are not the intended recipient of this e-mail, you are not
> authorized to use, disseminate, copy or disclose this e-mail in any manner.
> > >>> The sender of this e-mail shall not be liable for any possible
> damage caused by modifications of the e-mail or by delay with transfer of
> the email.
> > >>>
> > >>> In case that this e-mail forms part of business dealings:
> > >>> - the sender reserves the right to end negotiations about entering
> into a contract in any time, for any reason, and without stating any
> reasoning.
> > >>> - if the e-mail contains an offer, the recipient is entitled to
> immediately accept such offer; The sender of this e-mail (offer) excludes
> any acceptance of the offer on the part of the recipient containing any
> amendment or variation.
> > >>> - the sender insists on that the respective contract is concluded
> only upon an express mutual agreement on all its aspects.
> > >>> - the sender of this e-mail informs that he/she is not authorized to
> enter into any contracts on behalf of the company except for cases in which
> he/she is expressly authorized to do so in writing, and such authorization
> or power of attorney is submitted to the recipient or the person
> represented by the recipient, or the existence of such authorization is
> known to the recipient of the person represented by the recipient.
> > >>>
> > >>
> > >> David Winsemius
> > >> Alameda, CA, USA
> > >>
> > >>
> > >
> > > David Winsemius
> > > Alameda, CA, USA
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> > David Winsemius
> > Alameda, CA, USA
> >
> >
>
> David Winsemius
> Alameda, CA, USA
>
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list