[R] loops & sampling

Julian Burgos jmburgos at u.washington.edu
Thu Nov 1 20:26:12 CET 2007


Hi Garth,

Your code is really confusing! You should start by reading the help file 
on the for() function and understanding what it does:

?"for"

Your line
for(i in 1:nboot){

}

simply starts a loop over the variable 'i', which takes each value in 
the sequence 1:nboot in turn, so the body of the loop runs nboot times.
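
As a tiny illustration of that point (the value 3 and the 'visited' 
vector are made up just for this sketch), the loop body runs once for 
each value of 'i':

```r
nboot <- 3               # hypothetical value, for illustration only
visited <- integer(0)    # collects the values that i takes
for (i in 1:nboot) {
  # the body runs once per value of i: first 1, then 2, then 3
  visited <- c(visited, i)
}
print(visited)
# [1] 1 2 3
```

Nothing is removed or resampled by the for() line itself; whatever 
happens to the data has to be written inside the braces.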

It seems that the problem (or part of it) is that you are calling the 
sample() function with a variable 'n' that is set to length(abc2). For a 
data frame, length() returns the number of columns (3 here), not the 
number of rows.

Also, what is nboot supposed to be?  The number of samples to be taken 
(10, 20, etc.) or the number of iterations (1000)?  In your example, you 
are calling your function as

bt.cor <- npboot.function(nboot=10)

so in this case your function will loop around 10 times.

Here is a function that will do what you want:

npboot.function <- function(data, nboot) {
  boot.cor <- numeric(1000)
  for (i in 1:1000) {
    # Remove the first 'nboot' rows
    abc2 <- data[-(1:nboot), ]
    # Sample 'nboot' row indices from the remaining rows, with replacement
    my.sample <- sample(nrow(abc2), nboot, replace = TRUE)
    # Add the sampled rows to the truncated dataset
    abc2 <- rbind(abc2, abc2[my.sample, ])
    # Fit the model
    model <- lm(asin(sqrt(y / 100)) ~ x1 + x2, data = abc2)
    # Get correlation between observed and fitted values
    boot.cor[i] <- cor(abc2$y, fitted(model))
  }
  boot.cor
}

bt.cor <- npboot.function(abc,nboot=120)
bootmean <- mean(bt.cor)
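
If you then want to repeat this for 10, 20, 30, ... removed records, as 
you describe, something along these lines might work. This is only a 
rough sketch under my own assumptions: the data frame 'sim' is simulated 
(it is not your data), 'boot_cor' is a hypothetical stand-in for the 
function above, and 'niter' defaults to 200 just to keep the run short.

```r
set.seed(1)  # simulated data, for illustration only
n <- 250
sim <- data.frame(x1 = runif(n, 0, 10), x2 = runif(n, 0, 10))
# keep y within [0, 100] so that asin(sqrt(y / 100)) is defined
sim$y <- pmin(pmax(5 + 3 * sim$x1 + 2 * sim$x2 + rnorm(n, sd = 5), 0), 100)

# hypothetical helper: same idea as the function above, with the
# number of iterations exposed as an argument
boot_cor <- function(data, nboot, niter = 200) {
  out <- numeric(niter)
  for (i in seq_len(niter)) {
    kept <- data[-(1:nboot), ]                        # drop the first nboot rows
    idx <- sample(nrow(kept), nboot, replace = TRUE)  # resample from the rest
    boot <- rbind(kept, kept[idx, ])                  # back to nrow(data) rows
    model <- lm(asin(sqrt(y / 100)) ~ x1 + x2, data = boot)
    out[i] <- cor(boot$y, fitted(model))
  }
  out
}

sizes <- seq(10, 100, by = 10)  # numbers of records to remove
mean_cor <- sapply(sizes, function(k) mean(boot_cor(sim, k)))
result <- data.frame(removed = sizes, mean_cor = mean_cor)
print(result)
```

Note that this always drops the first rows of the data frame; if the 
rows are in a meaningful order you may prefer to drop a random set 
instead. Squaring the correlations would give r² values, if that is 
what you want to average.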




Garth.Warren at csiro.au wrote:
> Hi,
> 
>  
> 
> I'm new to R (and statistics) and my boss has thrown me in the deep-end with the following task: 
> 
>  
> 
> We want to evaluate the impact that sample size has on our ability to create a robust model, or evaluate how robust the model is to sample size for the purpose of cross-validation. In our current project we have collected a series of independent data at 250 locations, from which we have built a predictive model. We want to know whether we could get away with collecting fewer samples and still build a decent model, for the obvious operational reasons of cost, time spent in the field, etc.
> 
>  
> 
> Our thinking was that we could apply a bootstrap type procedure:
> 
>  
> 
> We would remove 10 records or samples from the total n=250 and then replace those 10 removed with replacements (or copies) from the remaining 240. With this new data frame we would apply our model and calculate an r², then repeat this 1000 times in a loop before computing the mean of those 1000 r² values. After that we would start the process again by removing 20 samples from our data, with replacements from the remaining 230 records, and so on...
> 
>  
> 
> Below is a simplified version of the real code which contains most of the basic elements. My main problem is that I'm not sure what the 'for(i in 1:nboot)' line is doing. Originally I thought this meant that it removed 1 sample or record from the data, which was replaced by a copy of one of the records from the remaining n, such that 'for(i in 10:nboot)', when used in the context of the code below, removed 10 samples with replacements as I have said above. I'm almost positive that this isn't happening; if not, how can I make the code below do what we want it to?
> 
>  
> 
> library(utils)
> 
> #data
> a <- c(5.5, 2.3, 8.5, 9.1, 8.6, 5.1)
> b <- c(5.2, 2.2, 8.6, 9.1, 8.8, 5.7)
> c <- c(5.0,14.6, 8.9, 9.0, 9.1, 5.5)
> 
> #join
> abc <- data.frame(a,b,c)
> 
> #set column names
> names(abc)[1]<-"y"
> names(abc)[2]<-"x1"
> names(abc)[3]<-"x2"
> abc2 <- abc
> 
> #sample
> abc3 <- as.data.frame(t(as.matrix(data.frame(abc2))))
> n <- length(abc2)
> 
> npboot.function <- function(nboot)
> {
> boot.cor <- vector(length=nboot)
> for(i in 1:nboot){
> rdata <- sample(abc3,n,replace=T)
> abc4 <- as.data.frame(t(as.matrix(data.frame(rdata))))
> model <- lm(asin(sqrt(abc4$y/100)) ~ I(abc4$x1^2) + abc4$x2)
> boot.cor[i] <- cor(abc4$y, model$fit)}
> boot.cor
> }
> 
> bt.cor <- npboot.function(nboot=10)
> bootmean <- mean(bt.cor)
> 
>  
> 
>  
> 
> Any assistance would be greatly appreciated, also the sooner the better as we are under pressure to reach a conclusion.
> 
>  
> 
> Cheers,
> 
>  
> 
> Garth


