Boris Steipe
Sun May 19 23:12:36 CEST 2019
My mental model for such a simulation is that you create data from a known distribution, then use your model to check that you can recover the known parameters from the data. Thus, how you generate the marks depends on what you assume influences them. Here is a toy model to illustrate this, expanding on my earlier code sample:
# a function to generate marks from parameters
rMarks <- function(n, m, s) {
  # draw from a normal distribution with mean m and standard deviation s,
  # round to 0.5 intervals, and clamp (not truncate) the result to [1, 6]
  marks <- rnorm(n, m, s)
  marks <- round(marks * 2) / 2
  marks[marks < 1] <- 1
  marks[marks > 6] <- 6
  return(marks)
}
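A quick way to check that rMarks() behaves as described is to draw many values and confirm they all fall on the half-mark grid. Because out-of-range values are clamped rather than redrawn, some probability mass piles up at 1 and 6. A minimal sketch; the seed, sample size and parameters (m = 3.0, s = 1.2) are arbitrary choices for illustration:

```r
# Self-contained copy of rMarks(), followed by a sanity check of its output
rMarks <- function(n, m, s) {
  marks <- rnorm(n, m, s)
  marks <- round(marks * 2) / 2   # round to the nearest 0.5
  marks[marks < 1] <- 1           # clamp at the lowest mark
  marks[marks > 6] <- 6           # clamp at the highest mark
  return(marks)
}

set.seed(1)                       # arbitrary seed
x <- rMarks(10000, 3.0, 1.2)

# Every value should lie on the 0.5 grid between 1 and 6
allowed <- seq(1, 6, by = 0.5)
stopifnot(all(x %in% allowed))

# Share of marks clamped or rounded onto 1; theoretically about
# P(X <= 1.25) for X ~ N(3, 1.2), i.e. roughly 0.07
mean(x == 1)
```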
# Teachers in two categories: 70% of teachers (tNormal) grade everyone according
# to a marks distribution with m = 3.5 and sd = 1; the others grade girls with
# m = 4.5 and sd = 0.7, and boys with m = 3.0 and sd = 1.2.
# Define who the "normal" teachers are (nS, nTpS and mySim come from my
# previous message, quoted below)
x <- paste0("t", 1:(nS * nTpS))
tNormal <- sample(x, round(nS * nTpS * 0.7), replace = FALSE)
# this is rather pedestrian code, but as explicit as I can make it ...
for (i in 1:nrow(mySim)) {
  if (mySim$Teacher[i] %in% tNormal) {
    m <- 3.5
    s <- 1.0
  } else {
    if (mySim$Gender[i] == "girl") {
      m <- 4.5
      s <- 0.7
    } else {
      m <- 3.0
      s <- 1.2
    }
  }
  mySim$Mark[i] <- rMarks(1, m, s)
}
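Since rnorm() accepts vectors for its mean and sd arguments, the same rules can also be applied without a loop: first compute each pupil's m and s with ifelse(), then draw all marks in one call. A minimal, self-contained sketch, assuming the design sizes from the earlier message; only the Teacher and Gender columns are rebuilt here, and rMarks() is rewritten with pmin()/pmax(), which clamps exactly as the original does:

```r
set.seed(42)  # arbitrary seed
nS <- 10; nTpS <- 5; nCpT <- 2; nPpC <- 20
N <- nS * nTpS * nCpT * nPpC
mySim <- data.frame(Teacher = paste0("t", rep(1:(nTpS * nS), each = nCpT * nPpC)),
                    Gender  = sample(c("boy", "girl"), N, prob = c(0.4, 0.6), replace = TRUE),
                    stringsAsFactors = FALSE)

# Same behaviour as rMarks() above: round to 0.5, clamp to [1, 6]
rMarks <- function(n, m, s) {
  marks <- round(rnorm(n, m, s) * 2) / 2
  pmin(pmax(marks, 1), 6)
}

tNormal  <- sample(paste0("t", 1:(nS * nTpS)), round(nS * nTpS * 0.7))
isNormal <- mySim$Teacher %in% tNormal
isGirl   <- mySim$Gender == "girl"

# Per-pupil parameters, chosen with the same rules as the if/else in the loop
m <- ifelse(isNormal, 3.5, ifelse(isGirl, 4.5, 3.0))
s <- ifelse(isNormal, 1.0, ifelse(isGirl, 0.7, 1.2))

# rnorm() pairs the i-th draw with m[i] and s[i], so one call fills all marks
mySim$Mark <- rMarks(N, m, s)
table(mySim$Mark)
```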
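# Validate
table(mySim$Mark)
# breaks centred on the 0.5 steps, so each bin holds exactly one mark value
brks <- seq(0.75, 6.25, by = 0.5)
hist(mySim$Mark[mySim$Teacher %in% tNormal],
     breaks = brks,
     col = "#0000BB44")
hist(mySim$Mark[ ! mySim$Teacher %in% tNormal],
     breaks = brks,
     add = TRUE,
     col = "#BB000044")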
Then the challenge is to recover the parameters from your analysis.
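As a first check of the "simulate, then recover" idea, one can compare per-cell sample means and SDs with the known generating values. A minimal sketch in base R, assuming the toy model above; the seed and the labels "normal"/"biased" are illustrative choices. Note that rounding slightly inflates, and clamping slightly shrinks, the observed SD relative to s:

```r
set.seed(2019)  # arbitrary seed
nS <- 10; nTpS <- 5; nCpT <- 2; nPpC <- 20
N <- nS * nTpS * nCpT * nPpC
teacher <- paste0("t", rep(1:(nS * nTpS), each = nCpT * nPpC))
gender  <- sample(c("boy", "girl"), N, prob = c(0.4, 0.6), replace = TRUE)
tNormal <- sample(paste0("t", 1:(nS * nTpS)), round(nS * nTpS * 0.7))
group   <- ifelse(teacher %in% tNormal, "normal", "biased")

# Known generating parameters for each pupil
m <- ifelse(group == "normal", 3.5, ifelse(gender == "girl", 4.5, 3.0))
s <- ifelse(group == "normal", 1.0, ifelse(gender == "girl", 0.7, 1.2))
mark <- pmin(pmax(round(rnorm(N, m, s) * 2) / 2, 1), 6)

sim <- data.frame(group, gender, mark, m, s, stringsAsFactors = FALSE)

# Mean mark per cell, next to the true m and s (constant within each cell)
est <- aggregate(cbind(mark, m, s) ~ group + gender, data = sim, FUN = mean)
# Observed SD per cell; same grouping, hence same row order
est$sdMark <- aggregate(mark ~ group + gender, data = sim, FUN = sd)$mark
est
```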
Cheers,
Boris
>
> Fair enough. Some additional assumptions are needed, and I make them as follows:
> - each class has the same size
> - each teacher teaches the same number of classes
> - the number of boys and girls is random within a class
> - 60% of pupils are girls (just to illustrate that the proportions need not be equal)
>
>
> To make the dependencies explicit, I define them as parameters, in a way that cannot become inconsistent:
>
> nS <- 10 # Schools
> nTpS <- 5 # Teachers per School
> nCpT <- 2 # Classes per teacher
> nPpC <- 20 # Pupils per class
> nS * nTpS * nCpT * nPpC == 2000 # Validate
>
>
> mySim <- data.frame(School = paste0("s", rep(1:nS, each = nTpS*nCpT*nPpC)),
> Teacher = paste0("t", rep(1:(nTpS*nS), each = nCpT*nPpC)),
> Class = paste0("c", rep(1:(nCpT*nTpS*nS), each = nPpC)),
> Gender = sample(c("boy", "girl"),
> (nS*nTpS*nCpT*nPpC),
> prob = c(0.4, 0.6),
> replace = TRUE),
> Mark = numeric(nS*nTpS*nCpT*nPpC),
> stringsAsFactors = FALSE)
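Because schools, teachers and classes are all generated with rep(..., each = ...), the nesting follows from the row order alone. A short sketch to confirm it, rebuilding mySim as above (the seed is arbitrary): every class should map to exactly one teacher, every teacher to exactly one school, and each unit should have the intended size.

```r
nS <- 10; nTpS <- 5; nCpT <- 2; nPpC <- 20
N <- nS * nTpS * nCpT * nPpC
set.seed(123)  # arbitrary seed
mySim <- data.frame(School  = paste0("s", rep(1:nS, each = nTpS * nCpT * nPpC)),
                    Teacher = paste0("t", rep(1:(nTpS * nS), each = nCpT * nPpC)),
                    Class   = paste0("c", rep(1:(nCpT * nTpS * nS), each = nPpC)),
                    Gender  = sample(c("boy", "girl"), N, prob = c(0.4, 0.6), replace = TRUE),
                    Mark    = numeric(N),
                    stringsAsFactors = FALSE)

# Count distinct values of x within each level of g
nDistinct <- function(x, g) tapply(x, g, function(v) length(unique(v)))

# Nesting: one teacher per class, one school per teacher
nTeachersPerClass  <- nDistinct(mySim$Teacher, mySim$Class)
nSchoolsPerTeacher <- nDistinct(mySim$School, mySim$Teacher)
stopifnot(all(nTeachersPerClass == 1), all(nSchoolsPerTeacher == 1))

# Balance: nPpC pupils per class, nCpT classes per teacher
stopifnot(all(table(mySim$Class) == nPpC),
          all(nDistinct(mySim$Class, mySim$Teacher) == nCpT))

# Gender is random within classes, so the share of girls varies around 0.6
summary(tapply(mySim$Gender == "girl", mySim$Class, mean))
```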
>
>
> Then you fill mySim$Mark with values from your linear mixed model ...
>
> mySim$Mark[i] <- simMarks(mySim[i, ])  # ... or something equivalent; simMarks() stands for your model.
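What "values from your linear mixed model" could look like is sketched below for a simple random-intercept model, Mark = mu + betaGirl * girl + u[school] + v[teacher] + e. This is one possible stand-in for the placeholder simMarks(), not the original poster's model; every parameter value is a hypothetical assumption, and the design sizes follow the message above.

```r
set.seed(7)  # arbitrary seed
nS <- 10; nTpS <- 5; nCpT <- 2; nPpC <- 20
N <- nS * nTpS * nCpT * nPpC
school  <- rep(1:nS, each = nTpS * nCpT * nPpC)       # school index per pupil
teacher <- rep(1:(nS * nTpS), each = nCpT * nPpC)     # teacher index per pupil
girl    <- rbinom(N, 1, 0.6)                          # 1 = girl, 60% on average

# Hypothetical parameter values, for illustration only
mu        <- 4.0   # grand mean
betaGirl  <- 0.3   # fixed gender effect
sdSchool  <- 0.3   # between-school SD
sdTeacher <- 0.4   # between-teacher SD
sdResid   <- 0.8   # pupil-level residual SD

u <- rnorm(nS, 0, sdSchool)           # one random intercept per school
v <- rnorm(nS * nTpS, 0, sdTeacher)   # one random intercept per teacher
e <- rnorm(N, 0, sdResid)             # one residual per pupil

raw  <- mu + betaGirl * girl + u[school] + v[teacher] + e
Mark <- pmin(pmax(round(raw * 2) / 2, 1), 6)  # same rounding/clamping as rMarks()
```

If the lme4 package is installed, fitting lme4::lmer(raw ~ girl + (1 | school) + (1 | teacher)) to these data is one way to check whether the assumed variance components can be recovered.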
>
>
> All good?
>
> Cheers,
> Boris
>
>>>
>>> Can you build your data top-down?
>>>
>>> schools <- paste("s", 1:6, sep="")
>>>
>>> classes <- character()
>>> for (school in schools) {
>>> classes <- c(classes, paste(school, paste("c", 1:5, sep=""), sep = "."))
>>> }
>>>
>>> pupils <- character()
>>> for (class in classes) {
>>> pupils <- c(pupils, paste(class, paste("p", 1:10, sep=""), sep = "."))
>>> }
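The same top-down identifiers can be built without growing vectors inside loops: rep() repeats each parent id, and paste() recycles the shorter vector of child ids along it. A minimal sketch producing the same classes and pupils vectors as the loops above:

```r
schools <- paste("s", 1:6, sep = "")

# Each school repeated 5 times, paired with c1..c5 (recycled)
classes <- paste(rep(schools, each = 5), paste("c", 1:5, sep = ""), sep = ".")

# Each class repeated 10 times, paired with p1..p10 (recycled)
pupils  <- paste(rep(classes, each = 10), paste("p", 1:10, sep = ""), sep = ".")

head(pupils, 3)   # "s1.c1.p1" "s1.c1.p2" "s1.c1.p3"
length(pupils)    # 300
```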
>>>
>>> B.
>>>
>>>> On 2019-05-18, at 09:57, varin sacha via R-help <r-help using r-project.org> wrote:
>>>>
>>>> Dear R-Experts,
>>>>
>>>> In a data simulation, I would like a balanced distribution with a nested structure for classroom and teacher (but not for school). That is, 50 pupils belong to C1, 50 other pupils to C2, 50 other pupils to C3, and so on. Then I want the 50 pupils in C1 to have T1, the 50 pupils in C2 to have T2, the 50 pupils in C3 to have T3, and so on. The schools don’t have to be nested; I just want a balanced distribution, i.e. 60 pupils in S1, 60 other pupils in S2, and so on.
>>>> Below is a reproducible example.
>>>> Many thanks for your help.
>>>>
>>>> ##############
>>>> set.seed(123)
>>>> # Random generation of the columns
>>>> pupils <- 1:300
>>>> classroom <- sample(c("C1","C2","C3","C4","C5","C6"), 300, replace = TRUE)
>>>> teacher <- sample(c("T1","T2","T3","T4","T5","T6"), 300, replace = TRUE)
>>>> school <- sample(c("S1","S2","S3","S4","S5"), 300, replace = TRUE)
>>>> ##############
>>>>
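Read literally, the request above (classrooms and teachers nested and balanced, schools balanced but not nested) can be met by replacing sample() with rep() for the nested factors and with a random permutation of a balanced vector for school. A minimal sketch of that one interpretation; the column names follow the original example:

```r
set.seed(123)
pupils <- 1:300

# Nested and balanced: 50 consecutive pupils per classroom, one teacher per classroom
classroom <- rep(paste0("C", 1:6), each = 50)
teacher   <- rep(paste0("T", 1:6), each = 50)

# Balanced but not nested: 60 pupils per school, in random order
school <- sample(rep(paste0("S", 1:5), each = 60))

sim <- data.frame(pupils, classroom, teacher, school, stringsAsFactors = FALSE)
table(sim$classroom, sim$teacher)  # only the diagonal is non-zero (C1 with T1, ...)
table(sim$school)                  # 60 pupils in each school
```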
