[R] Splitting a data column randomly into 3 groups

AbouEl-Makarim Aboueissa abouelmakarim1962 using gmail.com
Mon Sep 6 12:16:08 CEST 2021


Hi Bert and all, good morning.

I promise this will be the last time I write about this topic.

I came up with this R function (please see below), with your help of course.
It works for any sample size n >= 3, with the split fixed at three groups.
I have also provided three simple examples.

with many thanks
abou

##################    Here it is    ###############

Random.Sample.IDs <- function (N, n, ngroups) {
#### N = population size, n = sample size, ngroups = number of groups
#### NOTE: the group assignment below is written for exactly three groups

stopifnot(ngroups == 3, n >= ngroups, n <= N)

population.IDs <- seq(1, N, by = 1)
sample.IDs <- sample(population.IDs, n)

##### to print sample.IDs in a column format
##### --------------------------------------------------
sample.IDs.in.column <- data.frame(sample.IDs)
print(sample.IDs.in.column)

remainder.n <- n %% ngroups    #### items left over after equal-sized groups
m <- n %/% ngroups             #### base group size
s <- sample(1:n, n)            #### random permutation of the sample positions

if (remainder.n == 0) {

  group1.IDs <- sample.IDs[s[1:m]]
  group2.IDs <- sample.IDs[s[(m+1):(2*m)]]
  group3.IDs <- sample.IDs[s[(m*2+1):(3*m)]]

} else if (remainder.n == 1) {

  #### one extra item goes to group 1
  group1.IDs <- sample.IDs[s[1:(m+1)]]
  group2.IDs <- sample.IDs[s[(m+2):(2*m+1)]]
  group3.IDs <- sample.IDs[s[(m*2+2):(3*m+1)]]

} else if (remainder.n == 2) {

  #### one extra item each goes to groups 1 and 2
  group1.IDs <- sample.IDs[s[1:(m+1)]]
  group2.IDs <- sample.IDs[s[(m+2):(2*m+2)]]
  group3.IDs <- sample.IDs[s[(m*2+3):(3*m+2)]]
}

#### pad the shorter groups with NA so that all columns have the same length
nn <- max(length(group1.IDs), length(group2.IDs), length(group3.IDs))
length(group1.IDs) <- nn
length(group2.IDs) <- nn
length(group3.IDs) <- nn

groups.IDs <- cbind(group1.IDs, group2.IDs, group3.IDs)

groups.IDs

}


#####  Examples
#####  --------

Random.Sample.IDs (100,12,3)    #### group sizes are equal (n1=n2=n3=4)

Random.Sample.IDs (100,13,3)    #### group sizes are NOT equal (n1=5, n2=4, n3=4)

Random.Sample.IDs (100,17,3)    #### group sizes are NOT equal (n1=6, n2=6, n3=5)
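
For a number of groups other than three, one possible generalisation is
sketched below. It is only a sketch: the name split_ids_randomly is
hypothetical (it does not come from this thread), and it assumes n <= N and
ngroups <= n. Instead of one branch per remainder, it cycles the labels
1..ngroups along the already shuffled sample with rep_len(), so the group
sizes can differ by at most one.

```r
# Hypothetical helper, not from the thread: draw n IDs from 1..N and split
# them at random into `ngroups` groups whose sizes differ by at most one.
split_ids_randomly <- function(N, n, ngroups) {
  stopifnot(n <= N, ngroups >= 1, ngroups <= n)

  # sample() returns the n chosen IDs in random order
  sample.IDs <- sample(seq_len(N), n)

  # cycle the labels 1, 2, ..., ngroups, 1, 2, ... along the shuffled IDs
  labels <- rep_len(seq_len(ngroups), n)
  groups <- split(sample.IDs, labels)

  # pad the shorter groups with NA so that they fit into one matrix
  nn <- max(lengths(groups))
  padded <- lapply(groups, function(g) { length(g) <- nn; g })
  out <- do.call(cbind, padded)
  colnames(out) <- paste0("group", seq_len(ngroups), ".IDs")
  out
}

split_ids_randomly(100, 17, 3)   # group sizes 6, 6, 5
split_ids_randomly(100, 14, 4)   # group sizes 4, 4, 3, 3
```

The output has the same NA-padded column layout as Random.Sample.IDs, so the
two can be compared directly for ngroups = 3.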


______________________


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Statistics and Data Science*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*



On Sun, Sep 5, 2021 at 6:50 PM Bert Gunter <bgunter.4567 using gmail.com> wrote:

> In case anyone is still interested in my query, note that if there are
> n total items to be split into g groups as evenly as possible, and we
> define this as at most two different group sizes that differ by
> 1, then:
>
> if n = k*g + r, where 0 <= r < g,
> then n = k*(g - r) + (k + 1)*r  .
> i.e. g-r groups of size k and r groups of size k+1
>
> So using R's modular arithmetic operators, which are handy to know
> about, we have:
>
> r = n %% g and k = n %/% g .
>
> (and note that you should disregard my previous stupid remark about
> numerical analysis).
>
> Cheers,
> Bert
>
>
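
The decomposition n = k*(g - r) + (k + 1)*r quoted above can be checked with
a few lines of R. This is a minimal sketch; the function name group_sizes is
hypothetical and only illustrates the arithmetic with %/% and %%.

```r
# Hypothetical helper: sizes of g groups for n items, following
# n = k*(g - r) + (k + 1)*r with k = n %/% g and r = n %% g.
group_sizes <- function(n, g) {
  stopifnot(g >= 1, n >= 0)
  k <- n %/% g                      # size of the smaller groups
  r <- n %% g                       # number of groups with one extra item
  c(rep(k, g - r), rep(k + 1, r))   # g - r groups of size k, r of size k + 1
}

group_sizes(112, 3)   # 37 37 38
group_sizes(284, 3)   # 94 95 95
group_sizes(204, 3)   # 68 68 68
```

These are the group sizes asked for in the earlier message quoted below.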
> On Sat, Sep 4, 2021 at 3:34 PM Bert Gunter <bgunter.4567 using gmail.com> wrote:
> >
> > I have a more general problem for you.
> >
> > Given n items and 2 <= g << n, how do you divide the n items into g
> > groups that are "as equal as possible"?
> >
> > First, operationally define "as equal as possible."
> > Second, define the algorithm to carry out the definition. Hint: Note
> > that sum{m[i]} over i <= g must equal n, where m[i] is the number of
> > items in the ith group.
> > Third, write R code for the algorithm. Exercise for the reader.
> >
> > I may be wrong, but I think numerical analysts might also have a
> > little fun here.
> >
> > Randomization, of course, is trivial.
> >
> > Cheers,
> > Bert
> >
> >
> > Bert Gunter
> >
> > "The trouble with having an open mind is that people keep coming along
> > and sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >
> > On Sat, Sep 4, 2021 at 2:13 PM AbouEl-Makarim Aboueissa
> > <abouelmakarim1962 using gmail.com> wrote:
> > >
> > > Dear Thomas:
> > >
> > >
> > > Thank you very much for your input in this matter.
> > >
> > >
> > > > The core part of this R code (please see below) was written by
> > > > *Richard O'Keefe*. I had three examples with different sample sizes.
> > >
> > >
> > >
> > > > *First sample of size n1 = 204* divided randomly into three groups of
> > > > size 68. *No problems with this one*.
> > >
> > >
> > >
> > > > *The second sample of size n2 = 112* divided randomly into three groups
> > > > of sizes 37, 37, and 38. BUT this R code generated three groups of equal
> > > > sizes (37, 37, and 37). *How to fix the code to make sure that the output
> > > > will be three groups of sizes 37, 37, and 38?*
> > >
> > >
> > >
> > > > *The third sample of size n3 = 284* divided randomly into three groups
> > > > of sizes 94, 95, and 95. BUT this R code generated three groups of equal
> > > > sizes (94, 94, and 94). *Again, how to fix the code to make sure that the
> > > > output will be three groups of sizes 94, 95, and 95?*
> > >
> > >
> > > With many thanks
> > >
> > > abou
> > >
> > >
> > > ###########  ------------------------   #############
> > >
> > >
> > > N1 <- 485
> > > population1.IDs <- seq(1, N1, by = 1)
> > > #### population1.IDs
> > >
> > > > ##### in this case the size of each of the three groups = 68
> > > > n1<-204
> > > sample1.IDs <- sample(population1.IDs,n1)
> > > #### sample1.IDs
> > >
> > > ####  n1 <- length(sample1.IDs)
> > >
> > >   m1 <- n1 %/% 3
> > >   s1 <- sample(1:n1, n1)
> > >   group1.IDs <- sample1.IDs[s1[1:m1]]
> > >   group2.IDs <- sample1.IDs[s1[(m1+1):(2*m1)]]
> > >   group3.IDs <- sample1.IDs[s1[(m1*2+1):(3*m1)]]
> > >
> > > groups.IDs <-cbind(group1.IDs,group2.IDs,group3.IDs)
> > >
> > > groups.IDs
> > >
> > >
> > > ####### --------------------------
> > >
> > >
> > > N2 <- 266
> > > population2.IDs <- seq(1, N2, by = 1)
> > > #### population2.IDs
> > >
> > > > ##### in this case the sizes of the three groups are (37, 37, and 38)
> > > > ##### BUT this code generates three groups of equal sizes (37, 37, and 37)
> > > > n2<-112
> > > sample2.IDs <- sample(population2.IDs,n2)
> > > #### sample2.IDs
> > >
> > > ####  n2 <- length(sample2.IDs)
> > >
> > >   m2 <- n2 %/% 3
> > >   s2 <- sample(1:n2, n2)
> > >   group1.IDs <- sample2.IDs[s2[1:m2]]
> > >   group2.IDs <- sample2.IDs[s2[(m2+1):(2*m2)]]
> > >   group3.IDs <- sample2.IDs[s2[(m2*2+1):(3*m2)]]
> > >
> > > groups.IDs <-cbind(group1.IDs,group2.IDs,group3.IDs)
> > >
> > > groups.IDs
> > >
> > >
> > > ####### --------------------------
> > >
> > >
> > >
> > > N3 <- 674
> > > population3.IDs <- seq(1, N3, by = 1)
> > > #### population3.IDs
> > >
> > > > ##### in this case the sizes of the three groups are (94, 95, and 95)
> > > > ##### BUT this code generates three groups of equal sizes (94, 94, and 94)
> > > > n3<-284
> > > > sample3.IDs <- sample(population3.IDs,n3)
> > > > #### sample3.IDs
> > > >
> > > > ####  n3 <- length(sample3.IDs)
> > >
> > >   m3 <- n3 %/% 3
> > >   s3 <- sample(1:n3, n3)
> > >   group1.IDs <- sample3.IDs[s3[1:m3]]
> > >   group2.IDs <- sample3.IDs[s3[(m3+1):(2*m3)]]
> > >   group3.IDs <- sample3.IDs[s3[(m3*2+1):(3*m3)]]
> > >
> > > groups.IDs <-cbind(group1.IDs,group2.IDs,group3.IDs)
> > >
> > > groups.IDs
> > >
> > > ______________________
> > >
> > >
> > > *AbouEl-Makarim Aboueissa, PhD*
> > >
> > > *Professor, Statistics and Data Science*
> > > *Graduate Coordinator*
> > >
> > > *Department of Mathematics and Statistics*
> > > *University of Southern Maine*
> > >
> > >
> > >
> > > On Sat, Sep 4, 2021 at 11:54 AM Thomas Subia <tgs77m using yahoo.com> wrote:
> > >
> > > > Abou,
> > > >
> > > >
> > > >
> > > > > I’ve been following your question on how to split a data column
> > > > > randomly into 3 groups using R.
> > > >
> > > >
> > > >
> > > > > My method may not be suitable for a large set of data, but it is surely
> > > > > worth considering since it makes sense intuitively.
> > > >
> > > >
> > > >
> > > > mydata <- LETTERS[1:11]
> > > >
> > > > > mydata
> > > >
> > > > [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K"
> > > >
> > > >
> > > >
> > > > > # Let’s choose a random sample of size 4 from mydata
> > > > >
> > > > > random_grp1 <- sample(mydata, 4)
> > > > >
> > > > > > random_grp1
> > > >
> > > > [1] "J" "H" "D" "A"
> > > >
> > > >
> > > >
> > > > Now my next random selection of data is defined by
> > > >
> > > > data_wo_random <- setdiff(mydata,random_grp1)
> > > >
> > > > > # this makes sense because I need to choose random data from a set
> > > > > # which is defined by the difference of the sets mydata and random_grp1
> > > >
> > > >
> > > >
> > > > > data_wo_random
> > > >
> > > > [1] "B" "C" "E" "F" "G" "I" "K"
> > > >
> > > >
> > > >
> > > > > This is great! So now I can randomly select data of any size from
> > > > > this set.
> > > >
> > > > Repeating this process can easily generate subgroups of your original
> > > > dataset of any size you want.
> > > >
> > > >
> > > >
> > > > Surely this method could be improved so that this could be done
> > > > automatically.
> > > >
> > > > > Nevertheless, this is an intuitive method which I believe is easier
> > > > > to understand than some of the other methods posted.
> > > >
> > > >
> > > >
> > > > Hope this helps!
> > > >
> > > >
> > > >
> > > > Thomas Subia
> > > >
> > > > Statistician
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > >         [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
>
