[R] bootstrap query

Silvia Kirkman silviakirkman at yahoo.com
Fri Nov 5 10:29:00 CET 2004


Hi

I need to bootstrap a function in R and I am
struggling. Can anyone help? The following explains
what I'm trying to do:

I have 2 different matrices, called "x" and "y". Each
has 34 columns, and the length of each column varies. 

I use this data to determine a certain measure (C),
which I've calculated in R as follows:

schoener<-function(x,y,z)
{
  # x - seals
  # y - fishery
  # z - column of matrix

  breaks<-c(0:66)/2
  hseal<-hist(na.omit(x[,z]), breaks = breaks, freq = FALSE,
              include.lowest = FALSE, right = FALSE, plot = FALSE)
  hfish<-hist(na.omit(y[,z]), breaks = breaks, freq = FALSE,
              include.lowest = FALSE, right = FALSE, plot = FALSE)
  lseal<-length(na.omit(x[,z]))
  lfish<-length(na.omit(y[,z]))
  pseal<-(hseal$counts)/lseal
  pfish<-(hfish$counts)/lfish
  C<-(1-sum(abs(pseal-pfish))/2)*100
  C
}
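
To make the arithmetic of C concrete, here is a small sketch with made-up bin proportions (the numbers are hypothetical and are not taken from the seal or fishery data). Identical distributions would give C = 100, and distributions sharing no bin would give C = 0.

```r
# Hypothetical proportions in three bins; each vector sums to 1.
pseal <- c(0.5, 0.3, 0.2)
pfish <- c(0.2, 0.3, 0.5)

# Same formula as in schoener(): half the summed absolute differences,
# subtracted from 1 and expressed as a percentage.
C <- (1 - sum(abs(pseal - pfish)) / 2) * 100
C  # 70, up to floating-point rounding
```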

I've also managed to resample (with replacement) the
data in each column of x and y as follows, to give me
new C values:

resample<-function(x,y,z)
{
  # x - seals
  # y - fishery
  # z - column of matrix

  lseal<-length(na.omit(x[,z]))
  lfish<-length(na.omit(y[,z]))
  resampleseal<-sample(na.omit(x[,z]), lseal, replace = TRUE)
  resamplefish<-sample(na.omit(y[,z]), lfish, replace = TRUE)

  breaks<-c(0:66)/2
  hseal<-hist(resampleseal, breaks = breaks, freq = FALSE,
              include.lowest = FALSE, right = FALSE, plot = FALSE)
  hfish<-hist(resamplefish, breaks = breaks, freq = FALSE,
              include.lowest = FALSE, right = FALSE, plot = FALSE)
  pseal<-(hseal$counts)/lseal
  pfish<-(hfish$counts)/lfish
  (1-sum(abs(pseal-pfish))/2)*100
}

What I want to be able to do is to obtain 10 000 C
values so that I can get the 95% confidence limits. In
other words, resample 10 000 times. I have tried to
use the "boot" function in R, but I just can't get it
right:

boot(data, statistic, R, sim="ordinary", stype="i", 
     strata=rep(1,n), L=NULL, m=0, weights=NULL, 
     ran.gen=function(d, p) d, mle=NULL, ...)

According to the above, "statistic=resample" (as I've
defined above), "R=10000", and "data" would be x and
y. I'm obviously not understanding something,
especially how to refer to x and y for "data". I'm
sure it must be quite simple what I want to do - I
wonder if anyone out there can explain it to me.
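
A minimal sketch of one way to hand two samples to boot(), offered as an assumption rather than a definitive answer: stack the seal and fishery values for one column into a single data frame, label each row by group, and pass that label as "strata" so that each group is resampled with replacement at its original size. With stype="i", boot() does the resampling itself and calls the statistic with the data and a vector of row indices, so the statistic only has to compute C from the rows it is given. The toy data, the data frame "d" and the function name "overlap" are hypothetical. freq = FALSE is left out because it has no effect on the counts when plot = FALSE.

```r
library(boot)

# Hypothetical toy data standing in for one column (z) of x and y.
set.seed(1)
seal <- runif(40, 0, 33)
fish <- runif(60, 0, 33)

# Stack both samples and keep a group label to use as strata.
d <- data.frame(value = c(seal, fish),
                group = rep(c("seal", "fish"),
                            c(length(seal), length(fish))))

breaks <- (0:66) / 2

# Statistic in the form boot() expects for stype = "i":
# first argument is the data, second is the resampled row indices.
overlap <- function(d, i) {
  b <- d[i, ]
  vseal <- b$value[b$group == "seal"]
  vfish <- b$value[b$group == "fish"]
  pseal <- hist(vseal, breaks = breaks, include.lowest = FALSE,
                right = FALSE, plot = FALSE)$counts / length(vseal)
  pfish <- hist(vfish, breaks = breaks, include.lowest = FALSE,
                right = FALSE, plot = FALSE)$counts / length(vfish)
  (1 - sum(abs(pseal - pfish)) / 2) * 100
}

# Stratified ordinary bootstrap: indices are drawn within each group.
out <- boot(d, overlap, R = 10000, strata = factor(d$group))

# Percentile 95% confidence limits from the 10 000 bootstrap C values.
boot.ci(out, conf = 0.95, type = "perc")
```

For the real data, "value" would hold na.omit(x[,z]) followed by na.omit(y[,z]) for a chosen column z. The bootstrap C values themselves are in out$t and the C value for the original data is in out$t0.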

Many thanks.

Silvia



