[R] restricted bootstrap

Prof Brian Ripley ripley at stats.ox.ac.uk
Sat Sep 6 13:51:45 CEST 2008


On Thu, 4 Sep 2008, Grant Gillis wrote:

> Hello Professor Ripley,
>
> Sorry for not being clear.  I posted after a long day of struggling.  Also
> my toy distance matrix should have been symmetrical.
>
> Simply put, I have spatially autocorrelated data collected from many points.
> I would like to do a linear regression on these data.  To deal with the
> autocorrelation I want to resample a subset of my data with replacement, but I
> need to restrict subsets such that no two locations where data were collected
> are closer than X m apart (i.e. further apart than the range of the
> autocorrelation in the data).

That is impossible.  Resampling with replacement will give duplicated
locations (with a very high probability) and those have distance zero.
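
To put a number on "very high probability", here is a minimal sketch. It assumes a bootstrap sample of size n drawn from n distinct locations; the helper name p_no_duplicates and the choice n = 50 are illustrative only and not part of the original question.

```r
# Probability that a bootstrap sample of size n (drawn with replacement
# from n locations) contains no repeated location: n! / n^n.
# Computed on the log scale to avoid overflow for large n.
p_no_duplicates <- function(n) exp(lgamma(n + 1) - n * log(n))

p_no_duplicates(5)    # 120 / 3125, about 0.038
p_no_duplicates(50)   # effectively zero

# Empirical check: a repeated index means two sampled rows share a
# location, so the distance between them is zero.
set.seed(1)
n <- 50
has_dup <- replicate(1000, anyDuplicated(sample(n, n, replace = TRUE)) > 0)
mean(has_dup)         # proportion of bootstrap samples with a duplicate
```

Even for five locations fewer than 4% of bootstrap samples are free of duplicates, so a minimum-distance restriction cannot be combined with sampling with replacement.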

If you want a subsample (necessarily without replacement) you have a 
hard-core point process on a discrete set.  It's possible that the MCMC 
methods we used for Strauss processes can be made to work in that case, 
but it is also possible that the state space is reducible and so more 
elaborate algorithms are needed.
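
For small subsamples the simple rejection approach from the earlier reply quoted below may be enough. The following is a minimal sketch under stated assumptions, not a definitive implementation: the helper sample_separated, the simulated coordinates and the 100 m threshold are all hypothetical, and the distance matrix is built with dist() so that it is symmetric.

```r
# Hypothetical helper: draw k row indices without replacement and accept
# the draw only if every pair of selected locations is more than
# `min_dist` apart. D must be a symmetric distance matrix.
sample_separated <- function(D, k, min_dist, max_tries = 10000) {
  n <- nrow(D)
  for (i in seq_len(max_tries)) {
    idx <- sample(n, k)                       # uniform draw, no replacement
    sub <- D[idx, idx]
    if (all(sub[upper.tri(sub)] > min_dist))  # check every pairwise distance
      return(idx)
  }
  stop("no admissible sample found in ", max_tries, " tries")
}

set.seed(42)
# Made-up coordinates in metres, for illustration only.
xy <- cbind(x = runif(30, 0, 1000), y = runif(30, 0, 1000))
D  <- as.matrix(dist(xy))                     # symmetric, zero diagonal

idx <- sample_separated(D, k = 3, min_dist = 100)
D[idx, idx]                                   # pairwise distances of the accepted rows
```

Because whole draws are accepted or rejected, every admissible subset of size k is equally likely. The acceptance rate falls quickly as k grows relative to the number of well-separated locations, which is the situation where an MCMC scheme would be needed instead.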

I do think it would be much easier to take autocorrelation into account in
your linear model fit.  There are many ways to do that, e.g. MASS::lm.gls,
and in fact unless the correlations are very high OLS is likely to be quite
efficient (but you need to use e.g. a sandwich estimator to get reliable
standard errors).
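
A minimal sketch of the MASS::lm.gls route follows. Everything about the data is an assumption made for illustration: the coordinates are simulated, and the error correlation is taken to be exponential with a 150 m range parameter and treated as known. In a real analysis the correlation structure would have to be estimated from the data.

```r
library(MASS)  # provides lm.gls

set.seed(1)
n  <- 60
xy <- cbind(runif(n, 0, 1000), runif(n, 0, 1000))  # made-up site coordinates (m)
D  <- as.matrix(dist(xy))

# ASSUMPTION: exponential correlation with a 150 m range parameter.
Sigma <- exp(-D / 150)

# Simulate a covariate and spatially correlated errors with covariance Sigma.
x   <- rnorm(n)
err <- drop(t(chol(Sigma)) %*% rnorm(n))
dat <- data.frame(y = 1 + 2 * x + err, x = x)

# With inverse = TRUE, W is read as the inverse of the weight matrix,
# so a variance matrix can be passed directly.
fit_gls <- lm.gls(y ~ x, data = dat, W = Sigma, inverse = TRUE)
fit_ols <- lm(y ~ x, data = dat)

# Compare the two sets of coefficient estimates.
rbind(GLS = fit_gls$coefficients, OLS = coef(fit_ols))
```

The OLS coefficients are usually close to the GLS ones in a setting like this; it is the default OLS standard errors that cannot be trusted under autocorrelation, which is why a sandwich-type estimator is suggested above. That step is not shown here.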

> Thanks for having a look at this for me.  I will look up the hard-core
> spatial point process.
>
> Grant
>
> 2008/9/4 Prof Brian Ripley <ripley at stats.ox.ac.uk>
>
>> I see nothing here to do with the 'bootstrap', which is sampling with
>> replacement.
>>
>> Do you know what you mean exactly by 'randomly sample'?  In general the way
>> to do this is to sample randomly (uniformly, whatever) and reject samples
>> that do not meet your restriction.   For some restrictions there are more
>> efficient algorithms, but I don't understand yours.  (What are the 'rows'?
>>  Do you want to sample rows in space or xy locations?  How come 'dist' is
>> not symmetric?)  For some restrictions, an MCMC sampling scheme is needed,
>> the hard-core spatial point process being a related example.
>>
>>
>> On Wed, 3 Sep 2008, Grant Gillis wrote:
>>
>>  Hello List,
>>>
>>> I am not sure that I have the correct terminology here (restricted
>>> bootstrap), which may be hampering my archive searches.  I have quite a
>>> large spatially autocorrelated data set.  I have xy coordinates and the
>>> corresponding pairwise distance matrix (metres) for each row.  I would
>>> like to randomly sample some number of rows but restricting samples such
>>> that the distance between them is larger than the threshold of
>>> autocorrelation.  I have been unsuccessfully trying to link the 'sample'
>>> function to values in the distance matrix.
>>>
>>> My end goal is to randomly sample M thousand rows of data N thousand times
>>> calculating linear regression coefficients for each sample but am stuck on
>>> taking the initial sample. I believe I can figure out the rest.
>>>
>>>
>>> Example Question
>>>
>>> I would like to randomly sample 3 rows, but with the restriction that
>>> they are greater than 100m apart
>>>
>>> example data:
>>> main data:
>>>
>>> y <- c(1, 2, 9, 5, 6)
>>> x <- c(1, 3, 5, 7, 9)
>>> z <- c(2, 4, 6, 8, 10)
>>> a <- c(3, 9, 6, 4, 4)
>>>
>>> maindata<-cbind(y, x, z, a)
>>>
>>>      y x  z a
>>> [1,] 1 1  2 3
>>> [2,] 2 3  4 9
>>> [3,] 9 5  6 6
>>> [4,] 5 7  8 4
>>> [5,] 6 9 10 4
>>>
>>> distance matrix:
>>> row1<-c(0, 123, 567, 89)
>>> row2<-c(98, 0, 345, 543)
>>> row3<-c(765, 90, 0, 987)
>>> row4<-c(654, 8, 99, 0)
>>>
>>> dist<-rbind(row1, row2, row3, row4)
>>>
>>>    [,1] [,2] [,3] [,4]
>>> row1    0  123  567   89
>>> row2   98    0  345  543
>>> row3  765   90    0  987
>>> row4  654    8   99    0
>>>
>>> Thanks for all of the help in the past and now
>>>
>>> Cheers
>>> Grant
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>> --
>> Brian D. Ripley,                  ripley at stats.ox.ac.uk
>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford,             Tel:  +44 1865 272861 (self)
>> 1 South Parks Road,                     +44 1865 272866 (PA)
>> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


