[R] how to randomly select the samples with different probabilities for different classes?

Jim Lemon drjimlemon at gmail.com
Wed Dec 7 22:11:15 CET 2016


Hi Marna,
If we assume you want to draw a single set of 14 samples, something like this:

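# 40% of 14 from groups A and B: ceiling(5.6) = 6 rows
dat[sample(which(dat$group != "C"), ceiling(14 * 0.4), replace = TRUE), ]
# 60% of 14 from group C: floor(8.4) = 8 rows
dat[sample(which(dat$group == "C"), floor(14 * 0.6), replace = TRUE), ]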

Then just step through the two subsets to access your samples.
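
As a rough sketch of "stepping through" the two subsets, the two draws can be bound into one data frame and looped over. The names n_total, sub_ab, sub_c and picked are only illustrative, the seed is an assumption added so the draw is repeatable, and dat is rebuilt here from the values in the original post so the example runs on its own.

```r
# Rebuild the example data from the original post (same values as the dput output).
dat <- data.frame(
  sampleID = c(17L, 21L, 36L, 45L, 67L, 82L, 90L, 31L, 70L, 45L, 24L,
               80L, 82L, 45L, 85L, 14L, 81L, 96L, 61L, 12L, 65L, 88L),
  group = factor(rep(c("A", "B", "C"), times = c(8, 8, 6)))
)

set.seed(1)  # assumption: fixed seed only so the draw is repeatable

n_total <- 14
n_ab <- ceiling(n_total * 0.4)  # rows to take from groups A and B
n_c  <- floor(n_total * 0.6)    # rows to take from group C

# Same two draws as in the reply, both with replacement.
sub_ab <- dat[sample(which(dat$group != "C"), n_ab, replace = TRUE), ]
sub_c  <- dat[sample(which(dat$group == "C"), n_c, replace = TRUE), ]

# Combine the two subsets and visit each selected row in turn.
picked <- rbind(sub_ab, sub_c)
for (i in seq_len(nrow(picked))) {
  cat(picked$sampleID[i], as.character(picked$group[i]), "\n")
}
```

Because the draws are with replacement, the same row of dat may appear more than once in picked.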

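One problem is that you will not get exactly 40% or 60%, because
14 * 0.4 = 5.6 and 14 * 0.6 = 8.4 are not whole numbers, which is why
I had to put the "ceiling" and "floor" functions to work (giving 6 and
8 rows). Also, you will have to sample with replacement, as you would
otherwise exhaust the "C" group: it has only 6 rows and 8 are needed.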
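
One possible variation, not part of the code above: groups A and B together have 16 rows and only 6 are needed, so replacement is strictly required only for group C. The sketch below uses a hypothetical helper, draw_rows, that switches to replacement only when a pool is too small. It rebuilds dat from the values in the original post so it runs on its own, and the seed is an assumption for repeatability.

```r
# Rebuild the example data from the original post (same values as the dput output).
dat <- data.frame(
  sampleID = c(17L, 21L, 36L, 45L, 67L, 82L, 90L, 31L, 70L, 45L, 24L,
               80L, 82L, 45L, 85L, 14L, 81L, 96L, 61L, 12L, 65L, 88L),
  group = factor(rep(c("A", "B", "C"), times = c(8, 8, 6)))
)

# Hypothetical helper: draw n row indices from idx, using replacement
# only when n exceeds the number of available rows.
draw_rows <- function(idx, n) {
  # sample.int() on the length avoids sample()'s special case for a
  # single number, where sample(x, ...) would draw from 1:x.
  idx[sample.int(length(idx), n, replace = n > length(idx))]
}

set.seed(1)  # assumption: fixed seed only so the draw is repeatable

ab_rows <- draw_rows(which(dat$group != "C"), ceiling(14 * 0.4))  # 6 of 16, no repeats
c_rows  <- draw_rows(which(dat$group == "C"), floor(14 * 0.6))    # 8 of 6, repeats unavoidable

picked <- dat[c(ab_rows, c_rows), ]
table(picked$group)
```

Whether repeated rows from group C are acceptable depends on the analysis; if they are not, the 60% target cannot be met with 14 samples from this data.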

Jim


On Wed, Dec 7, 2016 at 10:58 PM, Marna Wagley <marna.wagley at gmail.com> wrote:
> Hi R users,
> I have samples with covariates for different classes, and I want to choose
> samples from the different groups with different probabilities. For example,
> I have 22 samples in 3 classes:
> groupA has 8 samples
> groupB has 8 samples
> groupC has 6 samples
>
> I want to select a total of 14 samples from the 22 samples, in which 40% of
> the 14 samples should come from groups A and B, and 60% of the 14 samples
> should come from group C.
>
> Would you mind helping me select the samples under those
> conditions? I have attached sample data:
>
> dat<-structure(list(sampleID = c(17L, 21L, 36L, 45L, 67L, 82L, 90L,
> 31L, 70L, 45L, 24L, 80L, 82L, 45L, 85L, 14L, 81L, 96L, 61L, 12L,
> 65L, 88L), group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A",
> "B", "C"), class = "factor")), .Names = c("sampleID", "group"
> ), class = "data.frame", row.names = c(NA, -22L))
>
> thanks,
>   MW
>


