[R] Create new data frame with conditional sums

Bert Gunter
Mon Oct 16 16:48:05 CEST 2023


If one makes the reasonable assumption that Pct is much larger than
Cutoff, sorting Cutoff is the expensive part, e.g., O(n log2(n)) for
Quicksort (n = length of Cutoff). I believe looping is O(n^2). Jeff's
approach using findInterval may be faster. Of course, implementation
details matter.
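Jeff's findInterval code is not reproduced in this excerpt. Below is a minimal sketch of one way findInterval() could be used; it is an assumption, not Jeff's code. It also assumes that a row counts toward a cutoff when Pct <= cutoff, and it uses Leonard's example data with the upper bounds 0.03, 0.06, ..., 0.18. Pct is sorted once, which is the O(n log n) step. Totpop is then accumulated along that order, and findInterval() does an interval search per cutoff to find how many rows lie at or below it.

```r
# Example data from Leonard's message
dummyData <- data.frame(
  Pct    = c(0.05, 0.03, 0.01, 0.12, 0.21, 0.04, 0.07, 0.09, 0.06, 0.03),
  Totpop = c(4000, 3500, 4500, 4100, 3900, 4250, 5100, 4700, 4950, 4800)
)
# Assumed upper bounds (the ones Leonard's code produces with by = 0.03)
cutoffs <- c(0.03, 0.06, 0.09, 0.12, 0.15, 0.18)

# Sort once by Pct: O(n log n), the dominant cost for large n
o    <- order(dummyData$Pct)
# csum[i + 1] = Totpop total of the rows with the i smallest Pct values
csum <- c(0, cumsum(dummyData$Totpop[o]))

# For sorted Pct, findInterval(cutoff, Pct) = number of Pct values <= cutoff
k <- findInterval(cutoffs, dummyData$Pct[o])

totals <- data.frame(Cutoff = cutoffs, Total = csum[k + 1])
print(totals)
```

Unlike the cut()/duplicated() approach below, this also reports cutoffs that no row falls into. Here 0.15 and 0.18 simply repeat the 0.12 total of 39900, and the row with Pct = 0.21 lies above every cutoff, so it is never counted.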

-- Bert

On Mon, Oct 16, 2023 at 4:41 AM Leonard Mada <leo.mada using syonic.eu> wrote:
>
> Dear Jason,
>
> The code could look something like:
>
> dummyData <- data.frame(Tract = seq(1, 10, by = 1),
>      Pct = c(0.05, 0.03, 0.01, 0.12, 0.21, 0.04, 0.07, 0.09, 0.06, 0.03),
>      Totpop = c(4000, 3500, 4500, 4100, 3900, 4250, 5100, 4700, 4950, 4800))
>
> # Define the cutoffs
> # - several rows may fall into the same cutoff (duplicate entries are allowed);
> # - with by = 0.03 the largest break is 0.18, so rows with Pct above
> #   the largest break (0.21 here) get Cutoff = NA;
> by <- 0.03  # by <- 0.01
> cutoffs <- seq(0, 0.20, by = by)
>
> # Create a new column with cutoffs: cut() assigns each Pct to the
> # interval (lower, upper] and labels it with the upper bound
> dummyData$Cutoff <- cut(dummyData$Pct, breaks = cutoffs,
>      labels = cutoffs[-1], ordered_result = TRUE)
>
> # Sort data
> # - we could actually order only the columns:
> #   Totpop & Cutoff;
> dummyData <- dummyData[order(dummyData$Cutoff), ]
>
> # Result: running total of Totpop over the sorted rows
> cs <- cumsum(dummyData$Totpop)
>
> # Only last entry:
> # - I do not have a nice one-liner, but this should do it:
> isLast <- rev(! duplicated(rev(dummyData$Cutoff)))
>
> data.frame(Total = cs[isLast],
>      Cutoff = dummyData$Cutoff[isLast])
>
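For the "last entry" step, base R's duplicated() has a fromLast argument. With it, !duplicated(x, fromLast = TRUE) flags the last occurrence of each value without the two rev() calls. Here is a minimal self-contained sketch that repeats the steps above with that substitution; the variable names follow the message.

```r
dummyData <- data.frame(
  Pct    = c(0.05, 0.03, 0.01, 0.12, 0.21, 0.04, 0.07, 0.09, 0.06, 0.03),
  Totpop = c(4000, 3500, 4500, 4100, 3900, 4250, 5100, 4700, 4950, 4800)
)
cutoffs <- seq(0, 0.20, by = 0.03)
dummyData$Cutoff <- cut(dummyData$Pct, breaks = cutoffs,
                        labels = cutoffs[-1], ordered_result = TRUE)
dummyData <- dummyData[order(dummyData$Cutoff), ]
cs <- cumsum(dummyData$Totpop)

# TRUE for the last row of each Cutoff level (NA counts as its own value)
isLast <- !duplicated(dummyData$Cutoff, fromLast = TRUE)
# Same flags as the rev()-based version
stopifnot(identical(isLast, rev(!duplicated(rev(dummyData$Cutoff)))))

result <- data.frame(Total = cs[isLast], Cutoff = dummyData$Cutoff[isLast])
print(result)
```

With by = 0.03, the totals are 12800, 26000, 35800 and 39900 for the levels 0.03 to 0.12. They are followed by 43800 in a final row whose Cutoff is NA.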
>
> Sincerely,
>
> Leonard
>
>
> On 10/15/2023 7:41 PM, Leonard Mada wrote:
> > Dear Jason,
> >
> >
> > I do not think that the solution based on aggregate offered by GPT was
> > correct. That quasi-solution only sums within each individual level;
> > it does not accumulate across levels.
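The aggregate() code from GPT is not shown in this excerpt. Assuming it was a per-level call such as aggregate(Totpop ~ Cutoff, ...), the sketch below illustrates the point. aggregate() returns one sum per Cutoff level, so a cumsum() over those ordered per-level sums is still needed to get cumulative totals. It reuses Leonard's example data and cut() call.

```r
dummyData <- data.frame(
  Pct    = c(0.05, 0.03, 0.01, 0.12, 0.21, 0.04, 0.07, 0.09, 0.06, 0.03),
  Totpop = c(4000, 3500, 4500, 4100, 3900, 4250, 5100, 4700, 4950, 4800)
)
cutoffs <- seq(0, 0.20, by = 0.03)
dummyData$Cutoff <- cut(dummyData$Pct, breaks = cutoffs,
                        labels = cutoffs[-1], ordered_result = TRUE)

# Per-level sums only: one row per Cutoff level present in the data
perLevel <- aggregate(Totpop ~ Cutoff, data = dummyData, FUN = sum)

# Running total over the ordered levels gives the cumulative sums
perLevel$Total <- cumsum(perLevel$Totpop)
print(perLevel)
```

The per-level sums here are 12800, 13200, 9800 and 4100, and the running totals are 12800, 26000, 35800 and 39900. The row with Pct = 0.21 has Cutoff NA and is dropped by the formula method's default na.action.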
> >
> >
> > As I understand it, you want the cumulative sum. The idea was proposed by
> > Bert: you only need to sort first based on the cutoff (e.g., using an
> > ordered factor), and then extract only the last value for each level.
> > If Pct is unique, then you can skip this last step and use the cumsum
> > directly (but on the sorted data set).
> >
> >
> > Alternatives: see the solutions with loops or with sapply.
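The loop and sapply() solutions mentioned here are not part of this excerpt. Below is a rough sketch of that style, not the original code. It assumes that a row counts toward a cutoff when Pct <= cutoff and uses the upper bounds 0.03, ..., 0.18. Each cutoff triggers a full pass over the rows, so the work grows as n × m for n rows and m cutoffs. That is O(n^2) when m is comparable to n.

```r
dummyData <- data.frame(
  Pct    = c(0.05, 0.03, 0.01, 0.12, 0.21, 0.04, 0.07, 0.09, 0.06, 0.03),
  Totpop = c(4000, 3500, 4500, 4100, 3900, 4250, 5100, 4700, 4950, 4800)
)
cutoffs <- c(0.03, 0.06, 0.09, 0.12, 0.15, 0.18)

# One pass over all rows per cutoff; vapply() guarantees a numeric result
totals <- vapply(cutoffs,
                 function(cutoff) sum(dummyData$Totpop[dummyData$Pct <= cutoff]),
                 numeric(1))

print(data.frame(Cutoff = cutoffs, Total = totals))
```

This needs no sorting and keeps empty cutoffs, which is fine for small data. The per-cutoff scan is what makes it slower as the number of rows and cutoffs grows.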
> >
> >
> > Sincerely,
> >
> >
> > Leonard
> >
> >



More information about the R-help mailing list