[R] R-code: for those who like a challenge; HELP

Tue Aug 19 14:55:56 CEST 2008

I'll try to be more clear, but will have to give much more info. Here we go....

As part of my Masters in Public Health I writing a dissertation reviewing the mortality trend of internally displaced persons in camp settings in Sub Saharan Africa. I have 50 surveys with
- crude mortality rate (cmr), CMR lower confidence interval(cmrlci), CMR upper confidence interval(cmruci),
- recall period minimum and recall period maximum (recallmin, recallmax)
- time of displacement mimimum and time of displacement maximum (todmin, todmax)
- population data at time of survey and proportion of population present in camp over time of displacement (data[9] - data[170])

Because of the uncertainty in these variables I am using the Monte Carlo method with e.g. 10,000 iterations and am writing a function that can run through each of the 50 studies 10,000 times. The aim is to analyse the trend in IDP mortality in relation to time of displacement. I'm aiming at a graph with x=time, y=CMR and an attached weight for each coordinate depending on the size of the population it represents at randomly choosen time of displacement.

I have the following code that gives me problems and I hope one of you is able to help me find what I do wrong.

----------------------------------------------------------------------------------------------------------------------------------------------------------

### 1 setting iterations
iterations<-10000

### 2 creating results array
results<-array(0,dim=c(3,2500))

### 3 reading in survey dataset
data<-read.csv("DATA2.csv",header=TRUE)

### 4 FUNCTION THAT GENERATES x y AND weight FOR ONE SURVEY
f1<-function(data){

### 5 Reading in important variables from data
cmr<-data[[1]]
cmrlci<-data[[2]]
cmruci<-data[[3]]
recallmin<-data[[4]]
recallmax<-data[[5]]
todmin<-data[[6]]
todmax<-data[[7]]
pop<-data[[8]]

### 6 Generating random CMR or y coordinate (from a triangular distribution)
y<-0
rand<-runif(n=1,min=0,max=1)
prob<-((cmr-cmrlci)^2)/((cmruci-cmrlci)*(cmr-cmrlci))
if (rand<=prob) y<-sqrt(rand*(cmruci-cmrlci)*(cmr-cmrlci)) + cmrlci
if (rand>prob) y<-cmruci-sqrt((1-rand)*(cmruci-cmrlci)*(cmruci-cmr))

### 7 rounding y to 1 sign digit and then multiplying by ten so as to get a whole number per 100,000 per day (easier for array)
y<-round(y,digits=1)*10

### 8 Generating random time or x coordinate

### 9 First generating random time within recall period
randomrecall<-round(runif(n=1,min=recallmin,max=recallmax))

### 10 Second generating random time of displacement (this
todrange<-todmax-todmin
proprand<-runif(n=1,min=0,max=1)
randomtod<-0
for(i in 1:todrange){
if (proprand>data[[9+i]] & proprand<=data[[10+i]]) randomtod=i
}

### 11 Thus
x<-randomtod-randomrecall

### 12 Generating population weight
weight<-round(pop*data[[9+randomtod]])/100000

### 13 Function returns a vector onesurvey consisting of three values
onesurvey<-c(x,y,weight)
invisible(onesurvey)
}

### 14 Applying f1 to each row of the survey dataset, for the set number of iterations
replicate(iterations,{

### 15 Applying f1 to each row of the survey dataset, one iteration
### this should return a list allsurveys with three cols and as many rows as there are surveys
allsurveys<-apply(data,1,FUN=f1)

### 16 safer to convert list into a dataframe
allsurveys<-as.data.frame(allsurveys)

### 17 Updating "results" array with random values from each survey, after one iteration
### loop that picks up each row of allsurveys and updates the array with values from that row
for(i in 1:length(allsurveys[[1]])){
x<-allsurveys[[1]][i]
y<-allsurveys[[2]][i]
weight<-allsurveys[[3]][i]
results[x,y]<-results[x,y]+weight
}
results<<-results
}
)

----------------------------------------------------------------------------------------------------------------------
If I go through the code step by step from only 1-13, I run into the following problems with R warnings and errors.

> y<-0
> rand<-runif(n=1,min=0,max=1)
> prob<-((cmr-cmrlci)^2)/((cmruci-cmrlci)*(cmr-cmrlci))
> if (rand<=prob) y<-sqrt(rand*(cmruci-cmrlci)*(cmr-cmrlci)) + cmrlci
Warning message:
In if (rand <= prob) y <- sqrt(rand * (cmruci - cmrlci) * (cmr -  :
  the condition has length> 1 and only the first element will be used

> if (rand>prob) y<-cmruci-sqrt((1-rand)*(cmruci-cmrlci)*(cmruci-cmr))
Warning message:
In if (rand> prob) y <- cmruci - sqrt((1 - rand) * (cmruci - cmrlci) *  :
  the condition has length> 1 and only the first element will be used

y<-round(y,digits=1)*10
> y
 [1] 32 16 13 13 44 30  9  7  8  4  5  9 23  6  5  6  8  7 12 15  8 20 39  9  9 12  9  9  7  4 58 13  8 30  8 12  8 13 30  9  7  7 24  8  9 12 13 19 19 12

? What is the effect of the warning message? What do they mean and how can I prevent them? It does not seem to influence the calculation
----------------------------------------------------------------------------------------------------------------------
> randomrecall<-round(runif(n=1,min=recallmin,max=recallmax))
> randomrecall
[1] 2

> todrange<-todmax-todmin
> proprand<-runif(n=1,min=0,max=1)
> randomtod<-0
> for(i in 1:todrange){
+ if (proprand>data[[9+i]] & proprand<=data[[10+i]]) randomtod=i
+ }
Warning messages:
1: In 1:todrange :
  numerical expression has 50 elements: only the first used
2: In if (proprand> data[[9 + i]] & proprand <= data[[10 + i]]) randomtod = i :
  the condition has length> 1 and only the first element will be used
3: In if (proprand> data[[9 + i]] & proprand <= data[[10 + i]]) randomtod = i :
  the condition has length> 1 and only the first element will be used
4: In if (proprand> data[[9 + i]] & proprand <= data[[10 + i]]) randomtod = i :
  the condition has length> 1 and only the first element will be used
5: In if (proprand> data[[9 + i]] & proprand <= data[[10 + i]]) randomtod = i :
  the condition has length> 1 and only the first element will be used
6: In if (proprand> data[[9 + i]] & proprand <= data[[10 + i]]) randomtod = i :
  the condition has length> 1 and only the first element will be used
7: In if (proprand> data[[9 + i]] & proprand <= data[[10 + i]]) randomtod = i :
  the condition has length> 1 and only the first element will be used
> randomtod
[1] 2

x<-randomtod-randomrecall
> x
[1] 0
--------------------------------------------------------------------------------------------------------------------------
> weight
 [1] 0.01020 0.01062 0.01017 0.01008 0.15682 0.77000 0.03653 0.03653 0.03653 0.01496 0.02400 0.01280 0.42570 0.42500 0.42600 0.42429 0.42150 0.42052 0.50900
[20] 0.53568 0.38833 0.21120 0.22834 0.18000 0.16156 0.17335 0.18249 0.82023 0.01260 0.01166 0.32120 0.22144 0.32500 0.89735 0.68904 0.34454 0.33854 0.11520
[39] 0.32540 0.00185 0.00144 0.00155 0.07588 0.11415 0.09081 0.09252 0.15926 0.03101 0.06370 0.08780
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

> onesurvey<-c(x,y,weight)
> invisible(onesurvey)
> onesurvey
  [1]  0.00000 32.00000 16.00000 13.00000 13.00000 44.00000 30.00000  9.00000  7.00000  8.00000  4.00000  5.00000  9.00000 23.00000  6.00000  5.00000
 [17]  6.00000  8.00000  7.00000 12.00000 15.00000  8.00000 20.00000 39.00000  9.00000  9.00000 12.00000  9.00000  9.00000  7.00000  4.00000 58.00000
 [33] 13.00000  8.00000 30.00000  8.00000 12.00000  8.00000 13.00000 30.00000  9.00000  7.00000  7.00000 24.00000  8.00000  9.00000 12.00000 13.00000
 [49] 19.00000 19.00000 12.00000  0.01020  0.01062  0.01017  0.01008  0.15682  0.77000  0.03653  0.03653  0.03653  0.01496  0.02400  0.01280  0.42570
 [65]  0.42500  0.42600  0.42429  0.42150  0.42052  0.50900  0.53568  0.38833  0.21120  0.22834  0.18000  0.16156  0.17335  0.18249  0.82023  0.01260
 [81]  0.01166  0.32120  0.22144  0.32500  0.89735  0.68904  0.34454  0.33854  0.11520  0.32540  0.00185  0.00144  0.00155  0.07588  0.11415  0.09081
 [97]  0.09252  0.15926  0.03101  0.06370  0.08780

This vector contains 1 x-value, 50 y-values and 50 weights. And that is not what I had in mind. How can I get the results of 1 survey? I know you can only read the first column, but that will not help if I am extending the function to all surveys/rows.

Thank you very very much for taking the time to look into this. It might be something simple, but for a non-R wonder this is impossible to figure out.

Best, Akko