[R] Depletion of small p values upon iterative testing of identical normal distributions

Duncan Murdoch murdoch.duncan at gmail.com
Mon Sep 20 16:20:09 CEST 2010


  On 20/09/2010 9:54 AM, A wrote:
> Dear all,
>
> I'm performing a t-test on two normal distributions with identical mean and
> standard deviation, and repeating this test a very large number of times to
> describe a representative p value distribution in a null case. As a part of
> this, the program bins these values in 10 evenly distributed bins between 0
> and 1 and reports the number of observations in each bin. What I have
> noticed is that even after 500,000 replications the number in my lowest bin
> is consistently ~5% smaller than the number in all the other bins, which are
> similar within about 1% of each other. Is there any reason, perhaps to do
> with random number generation in R or the nature of the normal distribution
> simulated by the rnorm function that could explain this depletion?

No.  Equal-sized bins should have equal expected numbers of entries, but 
your code may have errors in it.
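
To put a rough number on "equal": if the p values were exactly uniform, 
each of 10 equal-width bins would have a binomial count with probability 
0.1.  A minimal sketch, assuming numtests is 500000 and using runif() as a 
stand-in for the p values (both are my assumptions, not taken from your 
code):

```r
# Stand-in for p values that are exactly uniform on [0, 1] (an assumption;
# the real p values come from t.test()).
set.seed(1)
numtests <- 500000
nbins <- 10
p <- runif(numtests)

# Count observations in 10 equal-width bins.
bins <- seq(0, 1, length.out = nbins + 1)
counts <- table(cut(p, breaks = bins, include.lowest = TRUE))

# Expected count per bin and its binomial standard deviation.
expected <- numtests / nbins
binsd <- sqrt(numtests * (1 / nbins) * (1 - 1 / nbins))

print(counts)
print(c(expected = expected, sd = binsd))
```

The standard deviation is about 212 on an expected count of 50000, i.e. 
roughly 0.4%, so a consistent 5% shortfall in one bin is well outside 
ordinary sampling noise.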

This is a very slow and slightly dangerous way to program in R:
> Here are two key parts of my code to show what functions I'm working with:
>
> #Calculating the p values
> while(i<numtests){
> Group1<-rnorm(6,-0.0065,0.0837)
> Group2<-rnorm(6,-0.0065,0.0837)
> PV<-t.test(Group1,Group2)$p.value
> pscoresvector<-c(PV,pscoresvector)
> i<-i+1
> }

The slowness comes because pscoresvector is growing by one entry every 
iteration.  The danger comes because the initialization of pscoresvector 
is not shown.  If it wasn't initialized, you'll be binning whatever junk 
was there before, as well as the new values.
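
A small sketch of that danger, with a hypothetical leftover vector and a 
tiny numtests so the effect is easy to see.  The starting value i <- 0 is 
my assumption, since the initialization was not shown:

```r
numtests <- 5  # hypothetical, tiny on purpose

# Leftover from a hypothetical earlier run; the loop below never resets it.
pscoresvector <- c(0.111, 0.222, 0.333)

i <- 0  # assumed starting value
while (i < numtests) {
  Group1 <- rnorm(6, -0.0065, 0.0837)
  Group2 <- rnorm(6, -0.0065, 0.0837)
  PV <- t.test(Group1, Group2)$p.value
  pscoresvector <- c(PV, pscoresvector)  # prepends to whatever is already there
  i <- i + 1
}

# 8 values would be binned, not 5: the three stale ones are still at the end.
print(length(pscoresvector))
print(tail(pscoresvector, 3))
```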

I'd suggest initializing it as

pscoresvector <- numeric(numtests)

and updating using

pscoresvector[i] <- PV

to avoid both problems.
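
Put together, the loop might look something like the sketch below.  The 
small numtests, the set.seed() call, and the use of cut() and table() in 
place of binning() (whose package isn't shown) are my assumptions.  Note 
that the index has to run from 1 to numtests: if i starts at 0, 
pscoresvector[0] <- PV silently assigns nothing and the last entry is 
left at its initial 0.

```r
set.seed(42)       # only so this sketch is reproducible
numtests <- 1000   # hypothetical; the original post used 500,000

# Allocated once; every entry is overwritten in the loop below.
pscoresvector <- numeric(numtests)

for (i in seq_len(numtests)) {
  Group1 <- rnorm(6, -0.0065, 0.0837)
  Group2 <- rnorm(6, -0.0065, 0.0837)
  pscoresvector[i] <- t.test(Group1, Group2)$p.value
}

# Bin into 10 equal-width bins with base R (a stand-in for binning()).
bins <- seq(0, 1, length.out = 11)
freqtbl1 <- table(cut(pscoresvector, breaks = bins, include.lowest = TRUE))
print(freqtbl1)
```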

Duncan Murdoch

> #Binning the results
> freqtbl1<-binning(pscoresvector,breaks=bins)
>
> Thanks in advance for any insights,
>
> Andrew