[R] How to create a new data.frame based on calculation of subsets of an existing data.frame

Fri Dec 20 22:33:33 CET 2019

Hello Jim , 

Thank you ever so much for your help. I was truly stuck! 

This looks much better, and yes, I can turn them into a matrix with no problem. Indeed, I need only the results for ER+ETR_H1,PGA and ER+ETR_H2,Sa. One minor point: as it stands, VC has 4 values for three cases instead of the aforementioned two. In fact, the third is identical to the first. Could you please optimize? 

Thank you very much again, 
Best, 
ioanna

-----Original Message-----
From: Jim Lemon [mailto:drjimlemon using gmail.com] 
Sent: Friday, December 20, 2019 9:04 PM
To: Ioannou, Ioanna <ioanna.ioannou using ucl.ac.uk>
Cc: r-help mailing list <r-help using r-project.org>
Subject: Re: [R] How to create a new data.frame based on calculation of subsets of an existing data.frame

Hi Ioanna,
We're getting somewhere, but there are four unique combinations of Taxonomy and IM.type:

ER+ETR_H1,PGA
ER+ETR_H2,PGA
ER+ETR_H1,Sa
ER+ETR_H2,Sa

Perhaps you mean that ER+ETR_H1 only occurs with PGA and ER+ETR_H2 only occurs with Sa. I handled that by checking that there were any rows that corresponded to the condition requested.
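
As a minimal sketch of that check, the pairings that actually occur can be listed directly with unique() on the two columns, rather than testing all four. The data frame `combos_demo` is a hypothetical cut-down copy of the example data (only the two grouping columns), and `all_pairs` and `present_pairs` are illustrative names, not part of the original code.

```r
# Hypothetical cut-down copy of the example data: only the two
# grouping columns, with the same values as in the full data frame D
combos_demo <- data.frame(
  IM.type  = rep(c("PGA", "Sa"), each = 4),
  Taxonomy = rep(c("ER+ETR_H1", "ER+ETR_H2"), each = 4),
  stringsAsFactors = FALSE
)

# every pairing that two nested loops over unique() values would visit
all_pairs <- expand.grid(
  IM.type  = unique(combos_demo$IM.type),
  Taxonomy = unique(combos_demo$Taxonomy),
  stringsAsFactors = FALSE
)

# only the pairings that are present in the data
present_pairs <- unique(combos_demo[, c("Taxonomy", "IM.type")])

print(nrow(all_pairs))
print(present_pairs)
```

Looping over the rows of `present_pairs` would avoid the need for an explicit "are there any rows" test, at the cost of one extra step up front.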

Also you want a matrix for each row containing Taxonomy and IM.type in the output. When I run what I think you are asking, I only get a two element list, each a vector of values. Maybe this is what you want, and it could be coerced into matrix format:

D<- data.frame(Ref.No = c(1622, 1623, 1624, 1625, 1626, 1627, 1628, 1629),
 Region = rep(c('South America'), times = 8),
 IM.type = c('PGA', 'PGA', 'PGA', 'PGA', 'Sa', 'Sa', 'Sa', 'Sa'),
 Damage.state = c('DS1', 'DS2', 'DS3', 'DS4','DS1', 'DS2', 'DS3', 'DS4'),
 Taxonomy = c('ER+ETR_H1','ER+ETR_H1','ER+ETR_H1','ER+ETR_H1',
  'ER+ETR_H2','ER+ETR_H2','ER+ETR_H2','ER+ETR_H2'),
 Prob.of.exceedance_1 = c(0,0,0,0,0,0,0,0),
 Prob.of.exceedance_2 = c(0,0,0,0,0,0,0,0),
 Prob.of.exceedance_3 =
  c(0.26,0.001,0.00019,0.000000573,0.04,0.00017,0.000215,0.000472),
 Prob.of.exceedance_4 =
  c(0.72,0.03,0.008,0.000061,0.475,0.0007,0.00435,0.000405),
 stringsAsFactors=FALSE)

# names of the variables used in the calculations
calc_vars<-paste("Prob.of.exceedance",1:4,sep="_")
# get the rows for the four damage states
DS1_rows <-D$Damage.state == "DS1"
DS2_rows <-D$Damage.state == "DS2"
DS3_rows <-D$Damage.state == "DS3"
DS4_rows <-D$Damage.state == "DS4"
# create an empty list
VC<-list()
# set an index variable for VC
VCindex<-1
# step through all possible values of IM.type and Taxonomy
for(IM in unique(D$IM.type)) {
 for(Tax in unique(D$Taxonomy)) {
  # get a logical vector of the rows to be used in this calculation
  calc_rows <- D$IM.type == IM & D$Taxonomy == Tax
  cat(IM,Tax,calc_rows,"\n")
  # check that there are any such rows in the data frame
  if(sum(calc_rows)) {
   # if so, fill in the four values for these rows
   VC[[VCindex]] <- 0.0 * (1- D[calc_rows & DS1_rows,calc_vars]) +
    0.02 * (D[calc_rows & DS1_rows,calc_vars] -
            D[calc_rows & DS2_rows,calc_vars]) +
    0.10 * (D[calc_rows & DS2_rows,calc_vars] -
            D[calc_rows & DS3_rows,calc_vars]) +
    0.43 * (D[calc_rows & DS3_rows,calc_vars] -
            D[calc_rows & DS4_rows,calc_vars]) +
    1.0 * D[calc_rows & DS4_rows,calc_vars]
   # increment the index
   VCindex<-VCindex+1
  }
 }
}
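
If a matrix with one row per existing combination is the goal, one possible alternative is to split the data frame by Taxonomy and IM.type and apply the same weighted sum to each piece. This is only a sketch under assumptions: `vc_for_group`, `groups` and `VC_mat` are hypothetical names, each group is assumed to hold exactly one row per damage state DS1 to DS4, and the columns not used in the calculation (Ref.No, Region) are left out of the example data for brevity.

```r
# Example data reduced to the columns used in the calculation
D <- data.frame(
  IM.type = rep(c("PGA", "Sa"), each = 4),
  Damage.state = rep(paste0("DS", 1:4), times = 2),
  Taxonomy = rep(c("ER+ETR_H1", "ER+ETR_H2"), each = 4),
  Prob.of.exceedance_1 = rep(0, 8),
  Prob.of.exceedance_2 = rep(0, 8),
  Prob.of.exceedance_3 =
    c(0.26, 0.001, 0.00019, 0.000000573, 0.04, 0.00017, 0.000215, 0.000472),
  Prob.of.exceedance_4 =
    c(0.72, 0.03, 0.008, 0.000061, 0.475, 0.0007, 0.00435, 0.000405),
  stringsAsFactors = FALSE
)
calc_vars <- paste("Prob.of.exceedance", 1:4, sep = "_")

# hypothetical helper: weighted sum for one Taxonomy/IM.type group
vc_for_group <- function(g) {
  # put the rows in DS1..DS4 order so the differences line up
  P <- as.matrix(g[match(paste0("DS", 1:4), g$Damage.state), calc_vars])
  # the 0.0 * (1 - DS1) term is omitted because it contributes nothing
  0.02 * (P[1, ] - P[2, ]) +
    0.10 * (P[2, ] - P[3, ]) +
    0.43 * (P[3, ] - P[4, ]) +
    1.0 * P[4, ]
}

# drop = TRUE keeps only the combinations that occur in the data
groups <- split(D, list(D$Taxonomy, D$IM.type), drop = TRUE)

# one row per combination, one column per exceedance variable
VC_mat <- t(sapply(groups, vc_for_group))
print(VC_mat)
```

The row names of `VC_mat` come from split(), which joins the two grouping values with a dot, so each row can be traced back to its Taxonomy and IM.type.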

I think we'll get there.

Jim

On Sat, Dec 21, 2019 at 12:45 AM Ioannou, Ioanna <ioanna.ioannou using ucl.ac.uk> wrote:
>
> Hello Jim,
>
> I made some changes to the code: essentially, I replaced each group of four DS1-4 lines with one. I estimate VC, which in an ideal world should be a matrix with 4 columns, one for every exceedance_probability_1-4, and 2 rows, one for each unique combination of taxonomy and IM.Type. Could you please check the code I sent last and, based on that, give your solution?