[R] Off topic --- underdispersed (pseudo) binomial data.
Abby Spurdle
@purd|e@@ @end|ng |rom gm@||@com
Sat Mar 27 10:00:55 CET 2021
Sorry.
I just realized, after posting, that the "n" value in the dispersion
calculation isn't correct.
I'll have to revisit the simulation, tomorrow.
On Sat, Mar 27, 2021 at 9:11 PM Abby Spurdle <spurdle.a using gmail.com> wrote:
>
> Hi Rolf,
>
> Let's say we have a course called Corgiology 101, with a single moderated exam.
> And let's say the moderators transform initial exam scores, such that
> there are fixed percentages of pass rates and A grades.
>
> Rather than count the number of passes, we can count the number of "jumps".
> That is, the number of people that pass the corgiology exam after
> moderation, that would not have passed without moderation.
>
> I've created a function to test for underdispersion, based on your expression.
> (I hope I got it right).
>
> Then I've gone on to create simulations, using both constant and
> nonconstant class sizes.
> The nonconstant simulations apply an (approx) discrete scaling
> transformation, referred to previously.
>
> We can see from the examples that there are a lot of these jumps.
> And more importantly, they appear to be underdispersed.
>
> ----code----
> PASS.SCORE <- 0.5
> A.SCORE <- 0.8
>
> #target parameters
> PASS.RATE <- 0.8
> A.RATE <- 0.2
> #unmoderated parameters
> UNMOD.MEAN.SCORE <- 0.65
> UNMOD.SD.SCORE <- 0.075
>
> NCLASSES <- 2000
> NSTUD.CONST <- 200
> NSTUD.NONCONST.LIMS <- c (50, 800)
>
> sim.njump <- function (nstud, mean0=UNMOD.MEAN.SCORE, sd0=UNMOD.SD.SCORE,
> pass.score=PASS.SCORE, a.score=A.SCORE,
> pass.rate=PASS.RATE, a.rate=A.RATE)
> { x <- rnorm (nstud, mean0, sd0)
> q <- quantile (x, 1 - c (pass.rate, a.rate), names=FALSE)
> dq <- diff (q)
> q <- (a.score - pass.score) / dq * q
> y <- pass.score - q [1] + (a.score - pass.score) / dq * x
> sum (x < a.score & y >= a.score)
> }
>
> sim.nclasses <- function (nclasses, nstud, nstud.std)
> { nstud <- rep_len (nstud, nclasses)
> njump <- integer (nclasses)
> for (i in 1:nclasses)
> njump [i] <- sim.njump (nstud [i])
> if (missing (nstud.std) )
> njump
> else
> round (nstud.std / nstud * njump)
> }
>
> is.under <- function (x, n)
> var (x) < mean (x) * (1 - mean (x) / n)
>
> njump.hom <- sim.nclasses (NCLASSES, NSTUD.CONST)
> nstud <- round (runif (NCLASSES, NSTUD.NONCONST.LIMS [1],
> NSTUD.NONCONST.LIMS [2]) )
> njump.het <- sim.nclasses (NCLASSES, nstud, NSTUD.CONST)
>
> under.hom <- is.under (njump.hom, NSTUD.CONST)
> under.het <- is.under (njump.het, NSTUD.CONST)
> main.hom <- paste0 ("const class size (under=", under.hom, ")")
> main.het <- paste0 ("diff class sizes (under=", under.het, ")")
>
> p0 <- par (mfrow = c (2, 1) )
> hist (njump.hom, main=main.hom)
> hist (njump.het, main=main.het)
> par (p0)
> ----code----
>
> best,
> B.
>
>
> On Thu, Mar 25, 2021 at 2:33 PM Rolf Turner <r.turner using auckland.ac.nz> wrote:
> >
> >
> > I would like a real-life example of a data set which one might think to
> > model by a binomial distribution, but which is substantially
> > underdispersed. I.e. a sample X = {X_1, X_2, ..., X_N} where each X_i
> > is an integer between 0 and n (n known a priori) such that var(X) <<
> > mean(X)*(1 - mean(X)/n).
> >
> > Does anyone know of any such examples? Do any exist? I've done
> > a perfunctory web search, and had a look at "A Handbook of Small
> > Data Sets" by Hand, Daly, Lunn, et al., and drawn a blank.
> >
> > I've seen on the web some references to underdispersed "pseudo-Poisson"
> > data, but not to underdispersed "pseudo-binomial" data. And of course
> > there's lots of *over* dispersed stuff. But that's not what I want.
> >
> > I can *simulate* data sets of the sor that I am looking for (so far the
> > only ideas I've had for doing this are pretty simplistic and
> > artificial) but I'd like to get my hands on a *real* example, if
> > possible.
> >
> > Grateful for any pointers/suggestions.
> >
> > cheers,
> >
> > Rolf Turner
> >
> > --
> > Honorary Research Fellow
> > Department of Statistics
> > University of Auckland
> > Phone: +64-9-373-7599 ext. 88276
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list