[R] Create new data frame with conditional sums

Bert Gunter bgunter.4567 using gmail.com
Mon Oct 16 17:07:51 CEST 2023


Sorry, I misstated that. It should (of course) read:

If one makes the reasonable assumption that Pct is much larger than
Cutoff, sorting Pct is the expensive part, e.g. O(n log2(n)) for
Quicksort (n = length(Pct)). I believe looping is O(n^2).
etc.
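
To make the comparison concrete, here is a minimal sketch of the
sort-once approach with findInterval, checked against a direct
per-cutoff computation. It reuses the Pct and Totpop values from
Leonard's example quoted below; the upper cutoff of 0.21 and the names
ord, pctSorted, idx, total, res and totalLoop are my own illustrative
assumptions, not something from the earlier posts.

```r
# Illustrative data: same values as in the quoted example
Pct    <- c(0.05, 0.03, 0.01, 0.12, 0.21, 0.04, 0.07, 0.09, 0.06, 0.03)
Totpop <- c(4000, 3500, 4500, 4100, 3900, 4250, 5100, 4700, 4950, 4800)

# Assumed cutoffs; the upper limit 0.21 is chosen only for illustration
cutoffs <- seq(0, 0.21, by = 0.03)

# Sort once by Pct (the O(n log n) step) and accumulate Totpop
ord       <- order(Pct)
pctSorted <- Pct[ord]
cs        <- cumsum(Totpop[ord])

# For each cutoff: how many sorted Pct values are <= that cutoff
idx <- findInterval(cutoffs, pctSorted)

# Cumulative population at or below each cutoff (0 if no tract qualifies)
total <- c(0, cs)[idx + 1]

res <- data.frame(Cutoff = cutoffs, Total = total)

# Direct version for comparison: one full pass over Pct per cutoff
totalLoop <- sapply(cutoffs, function(k) sum(Totpop[Pct <= k]))
stopifnot(isTRUE(all.equal(total, totalLoop)))
```

The sapply line does length(cutoffs) passes over Pct, whereas the
sorted version sorts once and then needs only a binary search per
cutoff. Which one is actually faster on real data would have to be
timed.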

On Mon, Oct 16, 2023 at 7:48 AM Bert Gunter <bgunter.4567 using gmail.com> wrote:
>
> If one makes the reasonable assumption that Pct is much larger than
> Cutoff, sorting Cutoff is the expensive part, e.g. O(n log2(n)) for
> Quicksort (n = length(Cutoff)). I believe looping is O(n^2). Jeff's
> approach using findInterval may be faster. Of course implementation
> details matter.
>
> -- Bert
>
> On Mon, Oct 16, 2023 at 4:41 AM Leonard Mada <leo.mada using syonic.eu> wrote:
> >
> > Dear Jason,
> >
> > The code could look something like:
> >
> > dummyData = data.frame(Tract=seq(1, 10, by=1),
> >      Pct = c(0.05,0.03,0.01,0.12,0.21,0.04,0.07,0.09,0.06,0.03),
> >      Totpop = c(4000,3500,4500,4100,3900,4250,5100,4700,4950,4800))
> >
> > # Define the cutoffs
> > # - allow for duplicate entries;
> > # - note: cut() returns NA for Pct values above the last break
> > #   (here 0.21);
> > by = 0.03; # by = 0.01;
> > cutoffs <- seq(0, 0.20, by = by)
> >
> > # Create a new column with cutoffs
> > dummyData$Cutoff <- cut(dummyData$Pct, breaks = cutoffs,
> >      labels = cutoffs[-1], ordered_result = TRUE)
> >
> > # Sort data
> > # - we could actually order only the columns:
> > #   Totpop & Cutoff;
> > dummyData = dummyData[order(dummyData$Cutoff), ]
> >
> > # Result
> > cs = cumsum(dummyData$Totpop)
> >
> > # Only last entry:
> > # - I do not have a nice one-liner, but this should do it:
> > isLast = rev(! duplicated(rev(dummyData$Cutoff)))
> >
> > data.frame(Total = cs[isLast],
> >      Cutoff = dummyData$Cutoff[isLast])
> >
> >
> > Sincerely,
> >
> > Leonard
> >
> >
> > On 10/15/2023 7:41 PM, Leonard Mada wrote:
> > > Dear Jason,
> > >
> > >
> > > I do not think that the solution based on aggregate offered by GPT was
> > > correct. That quasi-solution only aggregates for every individual level.
> > >
> > >
> > > As I understand, you want the cumulative sum. The idea was proposed by
> > > Bert; you need only to sort first based on the cutoff (e.g. using an
> > > ordered factor). And then only extract the last value for each level.
> > > If Pct is unique, then you can skip this last step and use the
> > > cumsum directly (but on the sorted data set).
> > >
> > >
> > > Alternatives: see the solutions with loops or with sapply.
> > >
> > >
> > > Sincerely,
> > >
> > >
> > > Leonard
> > >
> > >


