[R] Interpreting Results of Bootstrapping

Sun Jul 11 15:55:50 CEST 2004

You are right, the outlier caused the problem. Using
Spearman or Kendall's correlation seems to solve the
problem. Thanks!
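
For anyone reading this in the archive, here is a minimal sketch of what the rank-based version might look like, assuming the same data and the same boot() call as in the original post below. The names spearman.correl, kendall.correl, b.spearman and b.kendall, and the seed, are made up for illustration. Because ranks limit how far the outlying point can sit from the rest, the bootstrap distribution should be much less sensitive to whether that point is resampled.

```r
library(boot)  # recommended package, shipped with standard R installations

x1 <- c(-2.612, -0.7859, -0.5229, -1.246, 1.647, 1.647, 0.1811,
        -0.07097, 0.8711, 0.4323, 0.1721, 2.143, 4.33, 0.5002,
        0.4015, -0.5225, 2.538, 0.07959, -0.6645, 4.521, -1.371,
        0.3327, 25.24, -0.5417, 2.094, 0.6064, -0.4476, -0.5891,
        -0.08879, -0.9487, -2.459e-05, -0.03887, 0.2116, -0.0625, 1.555,
        0.2069, -0.2142, -0.807, -0.6499, 2.384, -0.02063, 1.179,
        -0.0003586, -1.408, 0.6928, 0.689, 0.1854, 0.4351, 0.5663,
        0.07171, -0.07004)

x2 <- c(0.08742, 0.2555, -0.00337, 0.03995, -1.208, -1.208, -0.001374,
        -1.282, 1.341, -0.9069, -0.2011, 1.557, 0.4517, -0.4376,
        0.4747, 0.04965, -0.1668, -0.6811, -0.7011, -1.457, 0.04652,
        -1.117, 6.744, -1.332, 0.1327, -0.1479, -2.303, 0.1235,
        0.5916, 0.05018, -0.7811, 0.5869, -0.02608, 0.9594, -0.1392,
        0.4089, 0.1468, -1.507, -0.6882, -0.1781, 0.5434, -0.4957,
        0.02557, -1.406, -0.5053, -0.7345, -1.314, 0.3178, -0.2108,
        0.4186, -0.03347)

d <- cbind(x1, x2)

# Same shape as my.correl in the original post, but with rank-based
# correlation measures (hypothetical helper names).
spearman.correl <- function(d, i) cor(d[i, 1], d[i, 2], method = "spearman")
kendall.correl  <- function(d, i) cor(d[i, 1], d[i, 2], method = "kendall")

set.seed(2004)  # arbitrary seed, only so the run can be repeated
b.spearman <- boot(d, spearman.correl, R = 2000)
b.kendall  <- boot(d, kendall.correl,  R = 2000)

hist(b.spearman$t, breaks = 50, main = "Bootstrap: Spearman")
hist(b.kendall$t,  breaks = 50, main = "Bootstrap: Kendall")
```

Whether a rank correlation is the right summary depends on the question being asked; it changes what is being estimated, not just how robustly.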

Y. C. Tao

--- Ted.Harding at nessie.mcc.ac.uk wrote:
> Hi!
> 
> Simply plot(x1,x2): you will see that there is one point
> (number 23) at (x1,x2) = (25.24,6.744) which is a very
> long way from all the other points (which, among themselves,
> form a somewhat diffuse cluster with some suggestion of
> further structure).
> 
> When you bootstrap, the correlation you obtain in any sample
> will depend on whether or not this outlying point is included
> in the sample. If it is included, this single point will generate
> a relatively high value of the correlation coefficient simply
> because it is such a long way from all the others (i.e. it is
> highly influential).
> 
> If it is not included, then the diffuse character of the other
> points will generate a very low value of the correlation
> coefficient.
> 
>   > cor(x1,x2)
>   [1] 0.7471931
>   > cor(x1[-23],x2[-23])
>   [1] 0.03914653
> 
> Therefore your bootstrap distribution will have two peaks: one
> peak, around 0.75, corresponding to the bootstrap samples which
> include this outlying point, and the other, around 0, corresponding
> to the bootstrap samples which do not include it.
> 
> This is the explanation and, at the same time, the
> interpretation.
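
The relative size of the two peaks can be estimated with a short calculation. The sketch below is not part of the original exchange and assumes ordinary resampling of all n = 51 rows with replacement, as boot() does by default. The variable names and the seed are made up for illustration. A given point is absent from a resample with probability (1 - 1/n)^n, which for n = 51 is about 0.36, so roughly 64% of the resamples should contain point 23 at least once and fall in the upper peak.

```r
n <- 51     # number of (x1, x2) pairs in the posted data
R <- 2000   # number of bootstrap resamples, as in the original boot() call

# Probability that one particular point is absent from, or present in,
# a resample of size n drawn with replacement.
p.absent  <- (1 - 1/n)^n
p.present <- 1 - p.absent
print(c(absent = p.absent, present = p.present))

# Cross-check by simulating the resampled row indices directly and
# recording whether index 23 (the outlying point) appears.
set.seed(1)  # arbitrary seed, only so the run can be repeated
present <- replicate(R, 23 %in% sample.int(n, n, replace = TRUE))
print(mean(present))
```

The simulated share will differ slightly from the theoretical one from run to run; with 2000 resamples the difference should be within a few percentage points.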
> 
> Best wishes,
> Ted.
> 
> On 11-Jul-04 Y C Tao wrote:
> > I tried to bootstrap the correlation between two
> > variables x1 and x2. The resulting distribution has
> > two distinct peaks. How should I interpret it?
> > 
> > The original code is attached.
> > 
> > Y. C. Tao
> > 
> > ----------------
> > 
> > library(boot)
> >  
> > my.correl <- function(d, i) cor(d[i,1], d[i,2])
> >  
> > x1 <- c(-2.612,-0.7859,-0.5229,-1.246,1.647,1.647,0.1811,
> >         -0.07097,0.8711,0.4323,0.1721,2.143,4.33,0.5002,
> >         0.4015,-0.5225,2.538,0.07959,-0.6645,4.521,-1.371,
> >         0.3327,25.24,-0.5417,2.094,0.6064,-0.4476,-0.5891,
> >         -0.08879,-0.9487,-2.459e-05,-0.03887,0.2116,-0.0625,1.555,
> >         0.2069,-0.2142,-0.807,-0.6499,2.384,-0.02063,1.179,
> >         -0.0003586,-1.408,0.6928,0.689,0.1854,0.4351,0.5663,
> >         0.07171,-0.07004)
> >  
> > x2 <- c(0.08742,0.2555,-0.00337,0.03995,-1.208,-1.208,-0.001374,
> >         -1.282,1.341,-0.9069,-0.2011,1.557,0.4517,-0.4376,
> >         0.4747,0.04965,-0.1668,-0.6811,-0.7011,-1.457,0.04652,
> >         -1.117,6.744,-1.332,0.1327,-0.1479,-2.303,0.1235,
> >         0.5916,0.05018,-0.7811,0.5869,-0.02608,0.9594,-0.1392,
> >         0.4089,0.1468,-1.507,-0.6882,-0.1781,0.5434,-0.4957,
> >         0.02557,-1.406,-0.5053,-0.7345,-1.314,0.3178,-0.2108,
> >         0.4186,-0.03347)
> >  
> > b <- boot(cbind(x1, x2), my.correl, 2000)
> > hist(b$t, breaks=50)
> 
> [The above rearranged to have 7 values in each complete line]
> 
> 
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
> Fax-to-email: +44 (0)870 167 1972
> Date: 11-Jul-04                                       Time: 10:40:34
> ------------------------------ XFMail ------------------------------