[R] Splitting a data column randomly into 3 groups

AbouEl-Makarim Aboueissa @boue|m@k@r|m1962 @end|ng |rom gm@||@com
Sat Sep 4 23:12:43 CEST 2021


Dear Thomas:


Thank you very much for your input in this matter.


The core part of this R code (please see below) was written by *Richard
O'Keefe*. I tried it on three examples with different sample sizes.



*First sample of size n1 = 204* divided randomly into three groups of size
68 each. *No problems with this one*.



*The second sample of size n2 = 112* should be divided randomly into three
groups of sizes 37, 37, and 38. BUT this R code generated three groups of
equal size (37, 37, and 37). *How can the code be fixed so that the output
is three groups of sizes 37, 37, and 38?*



*The third sample of size n3 = 284* should be divided randomly into three
groups of sizes 94, 95, and 95. BUT this R code generated three groups of
equal size (94, 94, and 94). *Again, how can the code be fixed so that the
output is three groups of sizes 94, 95, and 95?*


With many thanks

abou


###########  ------------------------   #############


N1 <- 485
population1.IDs <- seq(1, N1, by = 1)
#### population1.IDs

##### in this case each of the three groups has size 68
n1 <- 204
sample1.IDs <- sample(population1.IDs,n1)
#### sample1.IDs

####  n1 <- length(sample1.IDs)

  m1 <- n1 %/% 3
  s1 <- sample(1:n1, n1)
  group1.IDs <- sample1.IDs[s1[1:m1]]
  group2.IDs <- sample1.IDs[s1[(m1+1):(2*m1)]]
  group3.IDs <- sample1.IDs[s1[(m1*2+1):(3*m1)]]

groups.IDs <-cbind(group1.IDs,group2.IDs,group3.IDs)

groups.IDs


####### --------------------------


N2 <- 266
population2.IDs <- seq(1, N2, by = 1)
#### population2.IDs

##### in this case the sizes of the three groups should be (37, 37, and 38),
##### BUT this code generates three groups of equal size (37, 37, and 37)
n2 <- 112
sample2.IDs <- sample(population2.IDs,n2)
#### sample2.IDs

####  n2 <- length(sample2.IDs)

  m2 <- n2 %/% 3
  s2 <- sample(1:n2, n2)
  group1.IDs <- sample2.IDs[s2[1:m2]]
  group2.IDs <- sample2.IDs[s2[(m2+1):(2*m2)]]
  group3.IDs <- sample2.IDs[s2[(m2*2+1):(3*m2)]]

groups.IDs <-cbind(group1.IDs,group2.IDs,group3.IDs)

groups.IDs


####### --------------------------



N3 <- 674
population3.IDs <- seq(1, N3, by = 1)
#### population3.IDs

##### in this case the sizes of the three groups should be (94, 95, and 95),
##### BUT this code generates three groups of equal size (94, 94, and 94)
n3 <- 284
sample3.IDs <- sample(population3.IDs,n3)
#### sample3.IDs

####  n3 <- length(sample3.IDs)

  m3 <- n3 %/% 3
  s3 <- sample(1:n3, n3)
  group1.IDs <- sample3.IDs[s3[1:m3]]
  group2.IDs <- sample3.IDs[s3[(m3+1):(2*m3)]]
  group3.IDs <- sample3.IDs[s3[(m3*2+1):(3*m3)]]

groups.IDs <-cbind(group1.IDs,group2.IDs,group3.IDs)

groups.IDs
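
The equal sizes come from m <- n %/% 3, which discards the remainder, so the
leftover one or two IDs are never assigned to a group. Below is a minimal
sketch of one possible fix, not a definitive one. The helper name
split_into_groups() is hypothetical, and it assumes the leftover IDs should
go to the last group(s), matching the sizes asked for above. It returns a
list rather than a cbind() matrix, because cbind() would recycle the shorter
vectors to pad out the columns.

```r
# Sketch: split IDs into k random groups whose sizes differ by at most 1.
# split_into_groups() is a hypothetical helper, not part of the original code.
split_into_groups <- function(ids, k = 3) {
  n <- length(ids)
  # Every group gets n %/% k IDs; the n %% k leftover IDs are assigned
  # one each to the last groups (assumption: extras go last).
  sizes <- rep(n %/% k, k) + rev(seq_len(k) <= n %% k)
  # Shuffle the IDs, then cut the shuffled vector into consecutive blocks.
  shuffled <- ids[sample.int(n)]
  labels <- rep(seq_len(k), times = sizes)
  unname(split(shuffled, labels))
}

sample2.IDs <- sample(seq_len(266), 112)
groups2 <- split_into_groups(sample2.IDs)
lengths(groups2)   # 37 37 38

sample3.IDs <- sample(seq_len(674), 284)
groups3 <- split_into_groups(sample3.IDs)
lengths(groups3)   # 94 95 95
```

Each ID lands in exactly one group, and a sample of size 204 still gives
three groups of 68.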

______________________


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Statistics and Data Science*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*



On Sat, Sep 4, 2021 at 11:54 AM Thomas Subia <tgs77m using yahoo.com> wrote:

> Abou,
>
>
>
> I’ve been following your question on how to split a data column randomly
> into 3 groups using R.
>
>
>
> My method may not be amenable to a large set of data but it is surely worth
> considering since it makes sense intuitively.
>
>
>
> mydata <- LETTERS[1:11]
>
> > mydata
>
> [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K"
>
>
>
> # Let’s choose a random sample of size 4 from mydata
>
> random_grp1 <- sample(mydata, 4)
>
> > random_grp1
>
> [1] "J" "H" "D" "A"
>
>
>
> Now my next random selection of data is defined by
>
> data_wo_random <- setdiff(mydata,random_grp1)
>
> # this makes sense because I need to choose random data from a set which
> # is defined by the difference of the sets mydata and random_grp1
>
>
>
> > data_wo_random
>
> [1] "B" "C" "E" "F" "G" "I" "K"
>
>
>
> This is great! So now I can randomly select data of any size from this set.
>
> Repeating this process can easily generate subgroups of your original
> dataset of any size you want.
>
>
>
> Surely this method could be improved so that this could be done
> automatically.
>
> Nevertheless, this is an intuitive method which I believe is easier to
> understand than some of the other methods posted.
>
>
>
> Hope this helps!
>
>
>
> Thomas Subia
>
> Statistician
>
>
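
The sample-then-setdiff idea quoted above could be automated roughly as
follows. This is only a sketch: draw_groups() is a hypothetical helper name,
and it assumes the values are unique, since setdiff() drops duplicates.

```r
# Sketch of the repeated "sample, then setdiff" approach.
# draw_groups() is a hypothetical helper; x must contain unique values.
draw_groups <- function(x, sizes) {
  stopifnot(sum(sizes) <= length(x), !anyDuplicated(x))
  remaining <- x
  groups <- vector("list", length(sizes))
  for (i in seq_along(sizes)) {
    # Sample positions with sample.int() so that a single remaining
    # numeric value is not misread by sample() as the range 1:value.
    groups[[i]] <- remaining[sample.int(length(remaining), sizes[i])]
    # Remove the chosen values before drawing the next group.
    remaining <- setdiff(remaining, groups[[i]])
  }
  groups
}

mydata <- LETTERS[1:11]
grps <- draw_groups(mydata, c(4, 4, 3))
lengths(grps)   # 4 4 3
```

Because each group is removed from the pool before the next draw, no value
can appear in two groups.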




More information about the R-help mailing list