[R] bootstrap resampling question

Giovanni Petris gpetris at uark.edu
Wed Mar 2 15:22:18 CET 2011


Good point. I'll take my suggestion back...

Giovanni

On Tue, 2011-03-01 at 13:18 -0500, Jonathan P Daily wrote:
> I'm not sure that is equivalent to sampling with replacement, since if the 
> first "draw" is 1, then the probability that the next draw will be 1 is 
> 4/99 instead of the 1/20 it would be in sampling with replacement. I 
> think the way to do this would be what Greg suggested - something like:
> 
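> # Draw many more values than needed, with replacement
> bigsamp <- sample(1:20, 100, replace = TRUE)
> # Positions of the first five occurrences of each value, in draw order;
> # keep the 20 earliest of those positions
> idx <- sort(unlist(lapply(1:20, function(x) head(which(bigsamp == x), 5))))[1:20]
> samp <- bigsamp[idx]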
> 
> --------------------------------------
> Jonathan P. Daily
> Technician - USGS Leetown Science Center
> 11649 Leetown Road
> Kearneysville WV, 25430
> (304) 724-4480
> "Is the room still a room when its empty? Does the room,
>  the thing itself have purpose? Or do we, what's the word... imbue it."
>      - Jubal Early, Firefly
> 
> r-help-bounces at r-project.org wrote on 03/01/2011 09:37:31 AM:
> 
> > Re: [R] bootstrap resampling question
> > From: Giovanni Petris
> > To: Bodnar Laszlo EB_HU
> > Cc: r-help at r-project.org
> > Date: 03/01/2011 11:58 AM
> > Sent by: r-help-bounces at r-project.org
> > 
> > A simple way of sampling with replacement from 1:20, with the additional
> > constraint that each number can be selected at most five times is
> > 
> > > sample(rep(1:20, 5), 20)
> > 
> > HTH,
> > Giovanni
> > 
> > On Tue, 2011-03-01 at 11:30 +0100, Bodnar Laszlo EB_HU wrote:
> > > Hello there,
> > > 
> > > > I have a problem concerning bootstrapping in R, specifically the
> > > > resampling part. I will try to summarize it in a simplified way so
> > > > that I do not confuse anybody.
> > > 
> > > > I have a small database consisting of 20 observations (basically the
> > > > numbers from 1 to 20, that is: 1, 2, 3, 4, 5, ... 18, 19, 20).
> > > 
> > > > I would like to resample this database many times for the bootstrap
> > > > process, subject to two conditions. First, each resampled database
> > > > should also have 20 observations, drawn with replacement from the 20
> > > > numbers mentioned above. I guess that much is obvious. The more
> > > > difficult second condition is that each number can be selected at
> > > > most 5 times. To make this clear, here is an example. Resampled
> > > > databases like the following would be allowed:
> > > 
> > > (1st database)          1,2,1,2,1,2,1,2,1,2,3,3,3,3,3,4,4,4,4,4
> > > (4 different numbers are chosen, each selected 5 times)
> > > 
> > > (2nd database)          1,8,8,6,8,8,8,2,3,4,5,6,6,6,6,7,19,1,1,1
> > > > (Two numbers, 8 and 6, are selected 5 times, the number "1" is
> > > > selected four times, and the others are selected fewer than 4 times)
> > > 
> > > > My first thought was the sample function, which has settings like
> > > > replace=TRUE and prob=..., where prob is a vector giving the
> > > > probability of selecting each number. So I tried to calculate the
> > > > probabilities first. I thought the problem could basically be
> > > > described as a k-combination with repetitions. Unfortunately, the
> > > > only thing I could calculate so far is the total number of all
> > > > possible selections, which amounts to 137 846 527 049.
> > > 
> > > > Does anybody know how to implement my second "tricky" condition in
> > > > one of the R functions? Are the 'boot' and 'bootstrap' packages
> > > > capable of managing this? I guess they are, I just couldn't figure
> > > > it out yet...
> > > 
> > > Thanks very much! Best regards,
> > > Laszlo Bodnar
> > > 
> > > 
> > 
> ____________________________________________________________________________________________________
> > > Ez az e-mail és az összes hozzá tartozó csatolt melléklet titkos 
> > és/vagy jogilag, szakmailag vagy más módon védett információt 
> > tartalmazhat. Amennyiben nem Ön a levél címzettje akkor a levél 
> > tartalmának közlése, reprodukálása, másolása, vagy egyéb más úton 
> > történő terjesztése, felhasználása szigorúan tilos. Amennyiben 
> > tévedésből kapta meg ezt az üzenetet kérjük azonnal értesítse az 
> > üzenet küldőjét. Az Erste Bank Hungary Zrt. (EBH) nem vállal 
> > felelősséget az információ teljes és pontos - címzett(ek)hez történő
> > - eljuttatásáért, valamint semmilyen késésért, kapcsolat 
> > megszakadásból eredő hibáért, vagy az információ felhasználásából 
> > vagy annak megbízhatatlanságából eredő kárért.
> > > 
> > > Az üzenetek EBH-n kívüli küldője vagy címzettje tudomásul veszi és
> > hozzájárul, hogy az üzenetekhez más banki alkalmazott is hozzáférhet
> > az EBH folytonos munkamenetének biztosítása érdekében.
> > > 
> > > 
> > > This e-mail and any attached files are confidential 
> and/...{{dropped:19}}
> > > 
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > 
> > -- 
> > 
> > Giovanni Petris  <GPetris at uark.edu>
> > Associate Professor
> > Department of Mathematical Sciences
> > University of Arkansas - Fayetteville, AR 72701
> > Ph: (479) 575-6324, 575-8630 (fax)
> > http://definetti.uark.edu/~gpetris/
> > 
>
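The objection to sample(rep(1:20, 5), 20) can be checked numerically. After one value is removed from the 100-entry pool, 4 of the remaining 99 entries match it, so the chance that the second draw repeats the first is 4/99 (about 0.040) rather than the 1/20 of independent draws. A minimal simulation sketch; the seed, the number of replications and all variable names are arbitrary choices, not taken from the thread:

```r
set.seed(42)       # arbitrary seed, only for reproducibility
n_sim <- 20000     # number of simulated pairs of draws (assumption)
pool  <- rep(1:20, 5)

# Pool approach: two draws WITHOUT replacement from the 100-entry pool
same_pool <- replicate(n_sim, {
  s <- sample(pool, 2)
  s[1] == s[2]
})

# Plain sampling with replacement from 1:20
same_iid <- replicate(n_sim, {
  s <- sample(1:20, 2, replace = TRUE)
  s[1] == s[2]
})

# Simulated repeat frequencies next to the exact values
round(c(pool = mean(same_pool), exact_pool = 4 / 99,
        iid  = mean(same_iid),  exact_iid  = 1 / 20), 4)
```

The two simulated frequencies should sit close to their exact counterparts, which shows that the pool approach makes repeats slightly less likely from the very first draws, before the cap of five has any effect.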


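The oversample-and-trim idea quoted above amounts to drawing one value at a time with replacement and discarding any draw whose value has already been kept five times. A sketch of the same idea as a loop, which avoids choosing a fixed oversample size such as 100; capped_sample and its arguments are hypothetical names, not functions from base R or from the 'boot' or 'bootstrap' packages:

```r
# Hypothetical helper: draw n values from x with replacement, rejecting any
# draw that would give some element of x more than max_rep copies.
capped_sample <- function(x, n = length(x), max_rep = 5) {
  stopifnot(length(x) * max_rep >= n)  # otherwise the cap makes n unreachable
  counts <- integer(length(x))         # copies of each element kept so far
  picked <- integer(0)                 # positions in x of the kept draws
  while (length(picked) < n) {
    i <- sample.int(length(x), 1)      # one uniform draw with replacement
    if (counts[i] < max_rep) {         # keep it only while under the cap
      counts[i] <- counts[i] + 1L
      picked <- c(picked, i)
    }
  }
  x[picked]
}

set.seed(1)                  # arbitrary seed, only for reproducibility
samp <- capped_sample(1:20)  # one resample of the 20 observations
table(samp)                  # no value should appear more than 5 times
```

For a bootstrap, something like replicate(1000, capped_sample(1:20)) would return a 20 x 1000 matrix with one resample per column. Whether a capped resample is statistically appropriate for a given bootstrap is a separate question that the thread does not settle.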
