[R] Create new data frame with conditional sums

Jason Stout, M.D. <jason.stout using duke.edu>
Sat Oct 14 13:57:48 CEST 2023


After I sent this, a colleague referred me to the GPT-4 interface on Bing.  I entered the exact email query below and it provided the following solution, which worked for the toy example and was successfully adapted to my application:

# Define the cutoffs
cutoffs <- seq(0, 0.15, by = 0.01)

# Create a new column with cutoffs
dummydata$Cutoff <- cut(dummydata$Pct, breaks = cutoffs, labels = cutoffs[-1])

# Aggregate Totpop by Cutoff
result <- aggregate(Totpop ~ Cutoff, data = dummydata, sum)

# Rename the columns
names(result) <- c("Cutoff", "Pop")

# Print the result
result

So thanks to all for considering this query; we're in a brave new world of AI-generated coding.
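
A note for anyone adapting the snippet above: cut() followed by aggregate() appears to sum Totpop within each bin (and a Pct outside the breaks, such as 0.21, becomes NA and drops out), whereas the question asks for the total over every tract at or above each cutoff. The following is a minimal sketch of one way to get those running totals from per-bin sums, assuming the toy data from the question and assuming Pct is recorded to two decimal places. The names pct100, steps, level, per_level and result2 are introduced here for illustration only.

```r
# Toy data from the original question
dummydata <- data.frame(
  Tract  = 1:10,
  Pct    = c(0.05, 0.03, 0.01, 0.12, 0.21, 0.04, 0.07, 0.09, 0.06, 0.03),
  Totpop = c(4000, 3500, 4500, 4100, 3900, 4250, 5100, 4700, 4950, 4800)
)

# Work in whole percentage points so that no floating-point values are compared
# (assumption: Pct has at most two decimal places, as in the toy data)
pct100 <- round(dummydata$Pct * 100)
steps  <- 0:15

# Per-level population; anything above the top cutoff is pooled into the top level
level <- factor(pmin(pct100, max(steps)), levels = steps)
per_level <- tapply(dummydata$Totpop, level, sum)
per_level[is.na(per_level)] <- 0   # levels with no tracts contribute nothing

# Reverse cumulative sum: population with Pct at or above each cutoff
result2 <- data.frame(
  Cutoff = steps / 100,
  Pop    = as.numeric(rev(cumsum(rev(per_level))))
)
result2
```

Pooling values above 0.15 into the top level keeps tract 5 (Pct 0.21) in every total, which is what the "greater than or equal to" rule requires.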

Date: Fri, 13 Oct 2023 20:13:56 +0000
From: "Jason Stout, M.D." <jason.stout using duke.edu>
To: "r-help using r-project.org" <r-help using r-project.org>
Subject: [R] Create new data frame with conditional sums

This seems like it should be simple but I can't get it to work properly.  I'm starting with a data frame like this:

Tract   Pct    Totpop
1       0.05   4000
2       0.03   3500
3       0.01   4500
4       0.12   4100
5       0.21   3900
6       0.04   4250
7       0.07   5100
8       0.09   4700
9       0.06   4950
10      0.03   4800

And I want to end up with a data frame with two columns, a "Cutoff" column that is a simple sequence of equally spaced cutoffs (let's say in this case from 0-0.15 by 0.01) and a "Pop" column which equals the sum of "Totpop" in the prior data frame in which "Pct" is greater than or equal to "cutoff."  So in this toy example, this is what I want for a result:

   Cutoff   Pop
1    0.00 43800
2    0.01 43800
3    0.02 39300
4    0.03 39300
5    0.04 31000
6    0.05 26750
7    0.06 22750
8    0.07 17800
9    0.08 12700
10   0.09 12700
11   0.10  8000
12   0.11  8000
13   0.12  8000
14   0.13  3900
15   0.14  3900
16   0.15  3900

I can do this with a for loop but it seems there should be an easier, vectorized way that would be more efficient.  Here is a reproducible example:

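# Toy data
dummydata <- data.frame(
  Tract  = seq(1, 10, by = 1),
  Pct    = c(0.05, 0.03, 0.01, 0.12, 0.21, 0.04, 0.07, 0.09, 0.06, 0.03),
  Totpop = c(4000, 3500, 4500, 4100, 3900, 4250, 5100, 4700, 4950, 4800)
)

# Empty two-column result frame, filled one cutoff at a time
dfrm <- data.frame(matrix(ncol = 2, nrow = 0, dimnames = list(NULL, c("Cutoff", "Pop"))))
for (i in seq(0, 0.15, by = 0.01)) {
  # Total population of tracts whose Pct is at or above this cutoff
  temp <- sum(dummydata[dummydata$Pct >= i, "Totpop"])
  dfrm[nrow(dfrm) + 1, ] <- c(i, temp)
}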
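
One vectorized alternative to the loop above, offered as a minimal sketch rather than a definitive answer: outer() builds a logical matrix of every tract against every cutoff, and colSums() then totals the population column by column. The names cutoffs, at_or_above and dfrm_vec are illustrative. The cutoffs are built as (0:15) / 100 instead of seq(0, 0.15, by = 0.01); that is an assumption made here so that each cutoff is the same double as the corresponding literal in Pct, since accumulated floating-point error can otherwise flip a >= comparison at a boundary.

```r
# Toy data from the question
dummydata <- data.frame(
  Tract  = 1:10,
  Pct    = c(0.05, 0.03, 0.01, 0.12, 0.21, 0.04, 0.07, 0.09, 0.06, 0.03),
  Totpop = c(4000, 3500, 4500, 4100, 3900, 4250, 5100, 4700, 4950, 4800)
)

# Cutoffs 0.00, 0.01, ..., 0.15 built from integers
cutoffs <- (0:15) / 100

# Rows are tracts, columns are cutoffs; TRUE where the tract's Pct is at or above the cutoff
at_or_above <- outer(dummydata$Pct, cutoffs, ">=")

# Multiplying recycles Totpop down each column, so row i is weighted by tract i's population;
# column sums then give the population at or above each cutoff
dfrm_vec <- data.frame(
  Cutoff = cutoffs,
  Pop    = colSums(at_or_above * dummydata$Totpop)
)
dfrm_vec
```

The intermediate matrix has one cell per tract and cutoff pair, so for a very large number of tracts or cutoffs a sort-and-cumsum approach may use less memory.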

Jason Stout, MD, MHS
Box 102359-DUMC
Durham, NC 27710
FAX 919-681-7494
