[R] Readjusting the OUTPUT csv file

Greg Snow Greg.Snow at imail.org
Wed Feb 10 19:40:56 CET 2010


Amelia,

I think your problem (or at least part of it) comes from the fact that in your learning of R syntax you learned about the magical shortcut index tool "$" without gaining an adequate appreciation and understanding of what that is a shortcut for (common misunderstanding).  The great thing about the magical shortcuts, is that when used as intended, they make life easier (thus being magical shortcuts), the drawback of magical shortcuts is that when used in ways never intended, the result is often the programming equivalent of accidentally turning yourself into a toad.

Don't worry though, the solution is simple, don't use the magical shortcut, but go back to the unmagical tool that it is a shortcut for "[[".

Try:

name1 <- read.csv('input1.csv')[[name1]]
value1 <- read.csv('input1.csv')[[value1]]

and see if that does what you intend.

Also note that in the above you are reading input1.csv twice, it is probably more efficient to read it once and store the data frame, then extract the information that you want and delete the data frame (I/O can be slow compared to in memory processes).

Hope this helps,


--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of Amelia Livington
> Sent: Wednesday, February 10, 2010 12:03 AM
> To: r-help at r-project.org
> Subject: [R] Readjusting the OUTPUT csv file
>
> Dear R helpers
>
> I have some variables say ABC, DEF, PQR, LMN and XYZ. I am choosing any
> three varaibles at random at a time for my analysis and name these
> files as input1.csv, input2.csv and input3.csv. So if I choose
> variables say ABC, DEF and PQR, I am passing the specifications of
> these variables to input1.csv, input2.csv and input3.csv respectively.
>
> This means in another case even if I choose say LMN, DEF and XYZ in
> this order, then my input1.csv will have details of LMN, input2.csv
> will have detaisl of DEF and input3.csv will have details of XYZ. Thus,
> I am keeping the (input) file names constant but the contents of each
> of these input files may change from one case to another case.
>
> As an example, my input files are as given below.
>
> input1.csv
> name1   value1
> DEF         10
>
> input2.csv
> name2   value2
> LMN         8
>
> input3.csv
> name3   value3
> PQR       7
>
> ## __________________________________________________________________
>
> ## The Problem
>
> name1 = read.csv('input1.csv')$name1
> value1 = read.csv('input1.csv')$value1
>
>
> name2 = read.csv('input2.csv')$name2
> value2 = read.csv('input2.csv')$value2
>
>
> name3 = read.csv('input3.csv')$name3
> value3 = read.csv('input3.csv')$value3
>
> # (Thus if 1st variable chosen is DEF, then name1 = DEF, value1 = 10.
> Had I chosen some otehr variable, then name1 and value1 will have
> different values).
>
> Then I carry out my analysis and get the output as given below.
>
> Scenario_and_range            names             values
> Sc_1 (DEF LMN PQR)         name1             10.04
> Sc_1 (DEF LMN PQR)         name2               8.21
> Sc_1 (DEF LMN PQR)         name3               7.23
>
> Sc_2 (DEF LMN PQR)         name1               9.96
> Sc_2 (DEF LMN PQR)         name2               8.47
> Sc_2 (DEF LMN PQR)         name3               7.40
> .......................................................................
> ..
> .......................................................................
> ..
>
> My probelm is I know name1 pertains to DEF, name2 pertains to LMN and
> name3 pertains to PQR. But in the output I am not able to replace name1
> with DEF, name2 with LMN and so on i.e. I cannot hardcode the varaible
> names in data.frame as name1 = DEF, name2 = LMN etc as these names are
> going to change frome case to case. My objective is to get the names
> (in column 2 of above table) same as respective names (appearing in 1st
> column) in its order i.e. instead of name1, I should get DEF, in place
> of name2, I should get LMN and in place of name3, I should get PQR.
> Hence, my output should be like -
>
>
> Scenario_and_range            names             values
> Sc_1 (DEF LMN PQR)         DEF                10.04
> Sc_1 (DEF LMN PQR)         LMN                  8.21
> Sc_1 (DEF LMN PQR)         PQR                  7.23
>
> Sc_2 (DEF LMN PQR)         DEF                   9.96
> Sc_2 (DEF LMN PQR)         LMN                   8.47
> Sc_2 (DEF LMN PQR)         PQR                   7.40
> .......................................................................
> ..
> .......................................................................
> ..
>
>
> I have tried to expalin the problem but I am not sure how far I have
> been clear in that. I am giving below the actual R code I have used to
> arrive at the required output. Actually I have say 8-9 rates and I am
> choosing say any three out of them. Each of these rates have three
> possible ranges with respective probabilities. My objective is
> to generate random numbers for the various possible range COMBINATIONS
> in proportion to the respective joint probabilities. I have alraedy
> written the R code, I just need to refine my output.
>
> I also sincerely apologize for writing such a long mail.
>
> Regards
>
> Amelia
>
> **** My R code
> ******************************************************************
>
> ## Actualy I am dealing with more than 3 variables i.e. rates and thus
> my names are different in actual code.
>
> ## Beginning of a R code
>
> ## FUNCTION NO. 3
>
> library(reshape)
> no_rate = 3
>
> combi_3 = function(n, N, rateA, rate_name1, rateA_rf1, rateA_rf2,
> rateA_rf3, rateAprob1, rateAprob2, rateAprob3,
>                            rateB, rate_name2, rateB_rf1, rateB_rf2,
> rateB_rf3, rateBprob1, rateBprob2, rateBprob3,
>          rateC, rate_name3, rateC_rf1, rateC_rf2, rateC_rf3,
> rateCprob1, rateCprob2, rateCprob3)
>
> {
>
> rateA_prob1 = rateAprob3/2
> rateA_prob2 = rateAprob2/2
> rateA_prob3 = rateAprob1
> rateA_prob4 = rateA_prob2
> rateA_prob5 = rateA_prob1
>
> rateA_ran1_min = rateA-rateA_rf3
> rateA_ran1_max = rateA-rateA_rf2
> rateA_ran2_min = rateA-rateA_rf2
> rateA_ran2_max = rateA-rateA_rf1
> rateA_ran3_min = rateA-rateA_rf1
> rateA_ran3_max = rateA+rateA_rf1
> rateA_ran4_min = rateA+rateA_rf1
> rateA_ran4_max = rateA+rateA_rf2
> rateA_ran5_min = rateA+rateA_rf2
> rateA_ran5_max = rateA+rateA_rf3
>
> rateB_prob1 = rateBprob3/2
> rateB_prob2 = rateBprob2/2
> rateB_prob3 = rateBprob1
> rateB_prob4 = rateB_prob2
> rateB_prob5 = rateB_prob1
>
> rateB_ran1_min = rateB-rateB_rf3
> rateB_ran1_max = rateB-rateB_rf2
> rateB_ran2_min = rateB-rateB_rf2
> rateB_ran2_max = rateB-rateB_rf1
> rateB_ran3_min = rateB-rateB_rf1
> rateB_ran3_max = rateB+rateB_rf1
> rateB_ran4_min = rateB+rateB_rf1
> rateB_ran4_max = rateB+rateB_rf2
> rateB_ran5_min = rateB+rateB_rf2
> rateB_ran5_max = rateB+rateB_rf3
>
> rateC_prob1 = rateCprob3/2
> rateC_prob2 = rateCprob2/2
> rateC_prob3 = rateCprob1
> rateC_prob4 = rateC_prob2
> rateC_prob5 = rateC_prob1
>
> rateC_ran1_min = rateC-rateC_rf3
> rateC_ran1_max = rateC-rateC_rf2
> rateC_ran2_min = rateC-rateC_rf2
> rateC_ran2_max = rateC-rateC_rf1
> rateC_ran3_min = rateC-rateC_rf1
> rateC_ran3_max = rateC+rateC_rf1
> rateC_ran4_min = rateC+rateC_rf1
> rateC_ran4_max = rateC+rateC_rf2
> rateC_ran5_min = rateC+rateC_rf2
> rateC_ran5_max = rateC+rateC_rf3
>
> ##
> _______________________________________________________________________
> _________________________
>
> ## COMPUTATIONS
>
> R11 = paste(rate_name1, 1, sep = "")
> R12 = paste(rate_name1, 2, sep = "")
> R13 = paste(rate_name1, 3, sep = "")
> R14 = paste(rate_name1, 4, sep = "")
> R15 = paste(rate_name1, 5, sep = "")
>
> R21 = paste(rate_name2, 1, sep = "")
> R22 = paste(rate_name2, 2, sep = "")
> R23 = paste(rate_name2, 3, sep = "")
> R24 = paste(rate_name2, 4, sep = "")
> R25 = paste(rate_name2, 5, sep = "")
>
> R31 = paste(rate_name3, 1, sep = "")
> R32 = paste(rate_name3, 2, sep = "")
> R33 = paste(rate_name3, 3, sep = "")
> R34 = paste(rate_name3, 4, sep = "")
> R35 = paste(rate_name3, 5, sep = "")
>
> data_lab <- expand.grid(c(R11, R12, R13, R14, R15), c(R21, R22, R23,
> R24, R25), c(R31, R32, R33, R34, R35))
> range_prob <- list()
>
> range_prob[[1]]   = c(rateA_prob1, rateA_prob2, rateA_prob3,
> rateA_prob4, rateA_prob5 )
> range_prob[[2]]   = c(rateB_prob1, rateB_prob2, rateB_prob3,
> rateB_prob4, rateB_prob5 )
> range_prob[[3]]   = c(rateC_prob1, rateC_prob2, rateC_prob3,
> rateC_prob4, rateC_prob5 )
>
> pdf = expand.grid(range_prob)
> data_lab$probs = apply(pdf, 1, prod)
> joint_probs = xtabs(probs ~ Var1 + Var2 + Var3, data = data_lab)
>
> write.csv(data.frame(joint_probs), 'joint_probs3.csv', row.names =
> FALSE)
>
> #
> _______________________________________________________________________
> __________________________
>
> ONS3 = read.csv('joint_probs3.csv')
> names3 = apply(ONS3[, -4], 1, paste, collapse = ' ')
>
> write.csv(data.frame(rate_combination = names3, Probability3 = ONS3[,
> 4]), 'prob_table3.csv', row.names = FALSE)
>
> ### ###### __________________________________________________
>
> no_of_instances = N*(read.csv('prob_table3.csv')$Probability3)
>
> write.csv(data.frame(rate1_range = ONS3[,1], rate2_range = ONS3[,2],
> rate3_range = ONS3[,3], no_of_instances), 'Final Table3.csv', row.names
> = FALSE)
>
> ### ###### __________________________________________________
>
> HM3 = read.csv('Final Table3.csv')
>
> rateA_rates=rep(c(R11,R12,R13,R14,R15),25)
> rateA_rates
>
> number = HM3$no_of_instances
>
> rateA_combination=rep(rateA_rates,number)
> rateA_combination
>
> interval.min=ifelse(rateA_combination==R11, rateA_ran1_min,
> ifelse(rateA_combination==R12, rateA_ran2_min,
> ifelse(rateA_combination==R13, rateA_ran3_min,
> ifelse(rateA_combination==R14, rateA_ran4_min, rateA_ran5_min))))
> interval.max=ifelse(rateA_combination==R11, rateA_ran1_max,
> ifelse(rateA_combination==R12, rateA_ran2_max,
> ifelse(rateA_combination==R13, rateA_ran3_max,
> ifelse(rateA_combination==R14, rateA_ran4_max, rateA_ran5_max))))
>
> rand.nums_rateA=runif(rateA_combination,min=interval.min,max=interval.m
> ax)
>
> rateA_series = data.frame(rateA_combination,rand.nums_rateA)
>
> write.csv(data.frame(rateA_series), 'rateA_series.csv', row.names =
> FALSE)
>
> ##
> _______________________________________________________________________
> _______________________________________________________________________
> __________________________________________________________
>
> ## Random number generation for rateB
>
> A2 = rep(c(R21), each = 5)
> B2 = rep(c(R22), each = 5)
> C2 = rep(c(R23), each = 5)
> D2 = rep(c(R24), each = 5)
> E2 = rep(c(R25), each = 5)
>
> rateB_rates=rep(c(A2, B2, C2, D2, E2), 5)
> rateB_rates
>
> rateB_combination=rep(rateB_rates,number)
> rateB_combination
>
> interval.min=ifelse(rateB_combination==R21, rateB_ran1_min,
> ifelse(rateB_combination==R22, rateB_ran2_min,
> ifelse(rateB_combination==R23, rateB_ran3_min,
> ifelse(rateB_combination==R24, rateB_ran4_min, rateB_ran5_min))))
> interval.max=ifelse(rateB_combination==R21, rateB_ran1_max,
> ifelse(rateB_combination==R22, rateB_ran2_max,
> ifelse(rateB_combination==R23, rateB_ran3_max,
> ifelse(rateB_combination==R24, rateB_ran4_max, rateB_ran5_max))))
>
> rand.nums_rateB=runif(rateB_combination,min=interval.min,max=interval.m
> ax)
>
> rateB_series = data.frame(rateB_combination,rand.nums_rateB)
>
> write.csv(data.frame(rateB_series), 'rateB_series.csv', row.names =
> FALSE)
>
> ##
> _______________________________________________________________________
> _______________________________________________________________________
> __________________________________________________________
>
> ## Random number generation for rateC
>
> rateC_rates=rep(c(R31,R32,R33, R34, R35), each = 25)
> rateC_rates
>
> rateC_combination=rep(rateC_rates,number)
> rateC_combination
>
> interval.min=ifelse(rateC_combination==R31, rateC_ran1_min,
> ifelse(rateC_combination==R32, rateC_ran2_min,
> ifelse(rateC_combination==R33, rateC_ran3_min,
> ifelse(rateC_combination==R34, rateC_ran4_min, rateC_ran5_min))))
> interval.max=ifelse(rateC_combination==R31, rateC_ran1_max,
> ifelse(rateC_combination==R32, rateC_ran2_max,
> ifelse(rateC_combination==R33, rateC_ran3_max,
> ifelse(rateC_combination==R34, rateC_ran4_max, rateC_ran5_max))))
>
> rand.nums_rateC=runif(rateC_combination,min=interval.min,max=interval.m
> ax)
>
> rateC_series = data.frame(rateC_combination,rand.nums_rateC)
>
> write.csv(data.frame(rateC_series), 'rateC_series.csv', row.names =
> FALSE)
>
> ####
> _______________________________________________________________________
> _______________________________________________________________________
> __
>
> ## Generation of combination names as per the no of random numbers
> generated
>
> PP3 = read.csv('prob_table3.csv')$rate_combination
> PPP3 = rep(PP3, number)
> return(data.frame(PPP3, rateA = rateA_series$rand.nums_rateA, rateB =
> rateB_series$rand.nums_rateB, rateC = rateC_series$rand.nums_rateC))
> }
>
>
>
>  if
>  (no_rate ==3)
>  {
>
>  output = combi_3(n = read.csv('number.csv')$n, N =
> read.csv('number.csv')$N, rateA = read.csv('rateA.csv')$rateA,
> rate_name1 = read.csv('rateA.csv')$rate_name, rateA_rf1 =
> read.csv('rateA_rf.csv')$rateA_rf1,
> rateA_rf2 = read.csv('rateA_rf.csv')$rateA_rf2, rateA_rf3 =
> read.csv('rateA_rf.csv')$rateA_rf3,
> rateAprob1 = read.csv('rateA_probability.csv')$probability1, rateAprob2
> = read.csv('rateA_probability.csv')$probability2, rateAprob3 =
> read.csv('rateA_probability.csv')$probability3,
> rateB = read.csv('rateB.csv')$rateB, rate_name2 =
> read.csv('rateB.csv')$rate_name, rateB_rf1 =
> read.csv('rateB_rf.csv')$rateB_rf1,
> rateB_rf2 = read.csv('rateB_rf.csv')$rateB_rf2, rateB_rf3 =
> read.csv('rateB_rf.csv')$rateB_rf3,
> rateBprob1 = read.csv('rateB_probability.csv')$probability1, rateBprob2
> = read.csv('rateB_probability.csv')$probability2, rateBprob3 =
> read.csv('rateB_probability.csv')$probability3,
> rateC = read.csv('rateC.csv')$rateC, rate_name3 =
> read.csv('rateC.csv')$rate_name, rateC_rf1 =
> read.csv('rateC_rf.csv')$rateC_rf1,
> rateC_rf2 = read.csv('rateC_rf.csv')$rateC_rf2, rateC_rf3 =
> read.csv('rateC_rf.csv')$rateC_rf3,
> rateCprob1 = read.csv('rateC_probability.csv')$probability1, rateCprob2
> = read.csv('rateC_probability.csv')$probability2, rateCprob3 =
> read.csv('rateC_probability.csv')$probability3)
>
> ##
> _______________________________________________________________________
> _______________________________________________________________________
> ____
>
> write.csv(data.frame(output), 'Rates Generated.csv', row.names =
> FALSE)
>
> om3 = read.csv('Rates Generated.csv')
> n = read.csv('number.csv')$n
> N = read.csv('number.csv')$N
> rateA = read.csv('rateA.csv')$rateA
> rate_name1 = read.csv('rateA.csv')$rate_name
> rateB = read.csv('rateB.csv')$rateB
> rate_name2 = read.csv('rateB.csv')$rate_name
> rateC = read.csv('rateC.csv')$rateC
> rate_name3 = read.csv('rateC.csv')$rate_name
>
> rateA_rf1 = read.csv('rateA_rf.csv')$rateA_rf1
> rateB_rf1 = read.csv('rateB_rf.csv')$rateB_rf1
> rateC_rf1 = read.csv('rateC_rf.csv')$rateC_rf1
>
> rateA3 = rateA-rateA_rf1
> rateA4 = rateA+rateA_rf1
>
> rateB3 = rateB-rateB_rf1
> rateB4 = rateB+rateB_rf1
>
> rateC3 = rateC-rateC_rf1
> rateC4 = rateC+rateC_rf1
>
> rate_gen_num = length(om3$rateA)    ### THIS rateA is from the scenario
> table.
>
> add_A = runif(N-rate_gen_num, rateA3, rateA4)
> add_B = runif(N-rate_gen_num, rateB3, rateB4)
> add_C = runif(N-rate_gen_num, rateC3, rateC4)
>
> if(rate_gen_num < N) rateA_adj = data.frame(namesA = paste(rate_name1,
> 3, sep = ""), adj_rateA = add_A) else rateA_adj = 0
> if(rate_gen_num < N) rateB_adj = data.frame(namesB = paste(rate_name2,
> 3, sep = ""), adj_rateB = add_B) else rateB_adj = 0
> if(rate_gen_num < N) rateC_adj = data.frame(namesC = paste(rate_name3,
> 3, sep = ""), adj_rateC = add_C) else rateC_adj = 0
>
> HM3 = read.csv('Final Table3.csv')
> number = HM3$no_of_instances
> PP3 = read.csv('prob_table3.csv')$rate_combination
> PPP3 = rep(PP3, number)
>
> combination_names = (c(as.character(PPP3), c(paste(rateA_adj$namesA,
> rateB_adj$namesB, rateC_adj$namesC, sep = " "))))
> adj_ratesA = c(read.csv('rateA_series.csv')$rand.nums_rateA,
> rateA_adj$adj_rateA)
> adj_ratesB = c(read.csv('rateB_series.csv')$rand.nums_rateB,
> rateB_adj$adj_rateB)
> adj_ratesC = c(read.csv('rateC_series.csv')$rand.nums_rateC,
> rateC_adj$adj_rateC)
>
> final_rates_3 = data.frame(rate_names = combination_names, ratesA =
> adj_ratesA, ratesB = adj_ratesB, ratesC = adj_ratesC)
>
> write.csv(data.frame(Scenario = paste("Sc_", 1:n, sep = ""),
> sort_df(final_rates_3,(1))), 'final_rates3.csv', row.names = FALSE)
>
> ONSPPA_3 = read.csv('final_rates3.csv')
>
> NP = paste(ONSPPA_3$Scenario, "(",ONSPPA_3$rate_names,")", sep = " ")
>
> write.csv(data.frame(Scenario_and_rate_range = NP,
> sort_df(final_rates_3,(1))), 'final_rates3_3.csv', row.names = FALSE)
>
> ONSPPA3 = read.csv('final_rates3_3.csv')
>
> SK3 = data.frame(Scenario_and_rate_range =
> ONSPPA3$Scenario_and_rate_range, rateA = ONSPPA3$ratesA, rateB =
> ONSPPA3$ratesB, rateC = ONSPPA3$ratesC )
>
> f3 <- stack(SK3)
>
> f3$Numbers <-rep(1:100,1)
>
> f3$Scenario <- rep(NP,1)
>
> f3
>
> f3 <- f3[order(f3$Numbers), ]
>
> write.csv(data.frame(f3), 'ksk3.csv', row.names = FALSE)
>
> ksv3 = read.csv('ksk3.csv')
>
> write.csv(data.frame(Scenarios_and_Rates_range = ksv3$Scenario, Rates =
> ksv3$ind, Rate_values = ksv3$values), 'SHM3.csv', row.names = FALSE)
>
>  }
>
> ### Input data
>
> 'number.csv'     # input file in csv form
> n           N
> 100      100
>
> #_ rateA input _______________________________________________
>
> 'rateA.csv'                   # These rate names and values will change
> from case to case.
> rate_name       rateA
> ABC                   8
>
> 'rateA_probability.csv'
> probability1 probability2 probability3  # These are various range
> probabilities here for ABC
>    0.6                0.3           0.1
>
> 'rateA_rf.csv'
> rateA_rf1  rateA_rf2  rateA_rf3
>     0.5          0.75            1
>
>
> # (These are possible range movements for rateA e.g. if rate A is ABC
> with base value of say 8, it will move (+ - 0.5 i.e. 7.5 - 8.5 with
> probability 0.60, + - 0.75 i.e. 7.25 - 8.75 with probability 0.3 and +
> - 1 from the base rate of 8  i.e. 7 - 9 with probability 0.1.
>
> # Thus we have five possible ranges in case of ABC as (7 - 7.25) with
> probability 0.05 i.e. (half of extreme movement probability of 0.1 i.e.
> third range), (7.25 - 7.5) with probability 0.15, (7.5 - 8.5) with
> probability 0.60, (8.5 - 8.75) with probability 0.15 and (8.75 - 9)
> with probability 0.05.
>
> #_______ rateB
> __________________________________________________________
>
>
> 'rateB.csv'                   # These rate names and values will change
> from case to case.
> rate_name       rateA
> DEF                   9.5
>
>
> 'rateB_probability.csv'
> probability1 probability2 probability3  # These are various range
> probabilities here for DEF
>    0.7                0.3           0
>
>
>
> 'rateB_rf.csv'
> rateB_rf1  rateB_rf2  rateB_rf3
>     1           1.5             0
>
>
> #_______ rateC
> __________________________________________________________
>
> 'rateC.csv'                   # These rate names and values will change
> from case to case.
> rate_name       rateC
> PQR                   5
>
>
> 'rateC_probability.csv'
> probability1 probability2 probability3  # These are various range
> probabilities here for PQR
>    0.5               0.3           0.2
>
>
> 'rateC_rf.csv'
> rateC_rf1  rateC_rf2  rateC_rf3
>     0.5           1             1.5
>
> ### End of input files.
>
>
>
>       [[alternative HTML version deleted]]



More information about the R-help mailing list