[R] Questions about generating samples in R

Charles C. Berry cberry at tajo.ucsd.edu
Tue Nov 28 02:26:11 CET 2006


On Mon, 27 Nov 2006, Mark Na wrote:

> Further to Alexander's question ... could anyone provide assistance
> with random stratified sampling? Let's say we have Alex's dataframe
> and we want to stratify the random selection by group membership
> (which is contained in one of the eight columns).
>
> We might want to randomly select:
>
> 1) a constant number (e.g., 5) of rows from each group, or
> 2) a percentage (e.g. 10%) of rows from each group resulting in groups
> being represented proportionally in the sample (with respect to the
> population).
>
> I am aware of stratsrs but this function does not seem to allow the
> second of the above two options.
>
> Any ideas how to achieve this in R?


Suppose 'grp.numbers' holds the group identities, one entry per row of the data frame.

Define wrappers for sample():

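 	# index via sample.int(): sample(x, ...) would draw from 1:x for a one-row group
 	sample.just.5 <- function(x) x[sample.int(length(x), size = 5)]

 	sample.10.pct <- function(x) x[sample.int(length(x), size = round(0.10 * length(x)))]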

Then use tapply to apply the wrapper to the row indices within each group:

 	samples.of.5 <- tapply(seq_along(grp.numbers), grp.numbers, sample.just.5)

Check the number of rows drawn per group with:

 	table( grp.numbers[ unlist( samples.of.5 ) ] )
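
To go from sampled indices to sampled rows, something like the following
should work. It is only a sketch on made-up data: the data frame 'dat', its
columns 'grp' and 'y', and the helper 'take.5' are names invented for this
example, not anything from the original question.

```r
set.seed(1)                                   # only to make the toy run repeatable

# hypothetical data: 3 groups of unequal size (50, 120 and 30 rows)
dat <- data.frame(grp = rep(c("a", "b", "c"), times = c(50, 120, 30)),
                  y   = rnorm(200))

# draw 5 elements of x without replacement
take.5 <- function(x) x[sample.int(length(x), size = 5)]

# row indices, sampled separately within each group
idx <- tapply(seq_along(dat$grp), dat$grp, take.5)

# the stratified sample itself: 15 rows, 5 from each group
sampled <- dat[unlist(idx), ]
table(sampled$grp)
```

If any group has fewer than 5 rows, sample.int() stops with an error, which
is probably what you want to know about.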

Use tapply again for the 10% samples:

 	samples.of.10.pct <- tapply(seq_along(grp.numbers), grp.numbers, sample.10.pct)

Check the number of rows drawn per group with:

 	table( grp.numbers[ unlist( samples.of.10.pct ) ] )
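
As a quick sanity check of the proportional case, here is a small sketch with
made-up group sizes of 50, 120 and 30 (the labels and the helper name
'take.10.pct' are assumptions for the example). The per-group counts should
come out as 10% of each group size.

```r
set.seed(2)                                        # only for repeatability

# hypothetical group labels: 50, 120 and 30 rows in groups 1, 2 and 3
grp.numbers <- rep(1:3, times = c(50, 120, 30))

# draw (about) 10% of the elements of x without replacement
take.10.pct <- function(x) x[sample.int(length(x), size = round(0.10 * length(x)))]

idx <- tapply(seq_along(grp.numbers), grp.numbers, take.10.pct)

# rows drawn per group: 5, 12 and 3
table(grp.numbers[unlist(idx)])
```

Note that because of round(), a group with fewer than 5 rows gets a sample
size of 0 and drops out of the sample entirely.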


There are loads of variations ...
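
One such variation, offered as a guess at what might be useful rather than
anything asked for in the thread: take a fraction of each group but never
fewer than one row, so that small groups stay represented. The function
'sample.frac.min1' and its 'frac' argument are hypothetical names.

```r
set.seed(3)                                        # only for repeatability

# hypothetical labels; group "c" is too small for round(0.10 * 4) to reach 1
grp.numbers <- rep(c("a", "b", "c"), times = c(50, 120, 4))

# sample a fraction of x, but at least one element and never more than length(x)
sample.frac.min1 <- function(x, frac = 0.10) {
    n <- min(length(x), max(1, round(frac * length(x))))
    x[sample.int(length(x), size = n)]
}

# extra arguments to tapply() are passed on to the function
idx <- tapply(seq_along(grp.numbers), grp.numbers, sample.frac.min1, frac = 0.10)

table(grp.numbers[unlist(idx)])   # group "c" contributes 1 row instead of 0
```

The price is that the sample is no longer strictly proportional: small groups
end up slightly over-represented.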

>
> Thanks, Mark
>
>
>
> On 11/26/06, Alexander Geisler <alexander.geisler at gmail.com> wrote:
>> Hello!
>>
>> I have a data set with 8 columns and about 5000 rows. What I want to
>> do is to generate samples of this data set.
>>
>> Samples of a specific size, for example 200.
>>
>> What is the easiest way to do this? No special things are needed, only
>> the random selection of 200 rows of the data set.
>>
>> Thanks
>> Alex
>>
>> --
>> Alexander Geisler * Kaltenbach 151 * A-6272 Kaltenbach
>> email: alexander.geisler at gmx.at | alexander.geisler at gmail.com
>> phone: +43 650 / 811 61 90 | skype: al1405ex
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

Charles C. Berry                        (858) 534-2098
                                          Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	         UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0717


