[R] Another question about bootstrapping cor

Conklin, Mike (GfK) Mike.Conklin at gfk.com
Thu Mar 6 20:39:10 CET 2014


Read the help on boot.

Specifically in non-parametric bootstrapping the statistic function takes a data (the original data) and an index number that shows which rows are taken in the bootstrap sample.  So you need to do the following.

x <- 1:15
y <- c(2,4,1,3,5, 7,6, 9,10,8, 14, 13, 11, 15, 12) 
x[3] <- NA; x[11] <- NA; x[8] <- NA; y[2] <- NA; y[8] <- NA; y[12] <- NA
cor(x,y,use="complete.obs",method="kendall")
library(boot)
tmpdf <- data.frame(x,y)
cor1 <- function(data,idx) {
  df<-data[idx,]
  cor(df$x,df$y,use="complete.obs",method="kendall")}

corboot <- boot(tmpdf,cor1,R=999)

Bootstrap Statistics :
     original       bias    std. error
t1* 0.8666667 -0.001852283   0.1099832

in your previous function the index was also getting passed to the cor1 function .

To see the individual results use:

corboot$t

t is a matrix of the bootstrap replicates.

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of David Parkhurst
Sent: Thursday, March 06, 2014 12:28 PM
To: r-help at r-project.org
Subject: [R] Another question about bootstrapping cor

When I use the fix that Arun K. provided to my earlier example, I wonder how to find out where in the 999 bootstrap repetitions the value for the actual data fits.

Here is the fixed code:
x <- 1:15
y <- c(2,4,1,3,5, 7,6, 9,10,8, 14, 13, 11, 15, 12) x[3] <- NA; x[11] <- NA; x[8] <- NA y[2] <- NA; y[8] <- NA; y[12] <- NA
cor(x,y,use="complete.obs",method="kendall")
library(boot)
tmpdf <- data.frame(x,y)
##  corboot <- boot(tmpdf,
cor(x,y,use="complete.obs",method="kendall"),R=999)
cor1 <- function(x,y) {cor(x,y,use="complete.obs",method="kendall")}
corboot <- boot(tmpdf,cor1,R=999)

When I run that, I get
 > corboot

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = tmpdf, statistic = cor1, R = 999)


Bootstrap Statistics :
      original     bias    std. error
t1* 1.0000000 -1.0062026   0.2490860
t2* 0.8666667 -0.8694414   0.2491383

t2* is the estimate for the actual data.  What is t1*?  And again, how do I find where in the distribution of the 999 reps does that true value fall?
I would expect print(corboot) to give me more information, but it just repeats that same output.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




More information about the R-help mailing list