[R] Odp: Unexpected Behavior (potentially) in t.test

Petr PIKAL petr.pikal at precheza.cz
Mon Jun 23 16:34:29 CEST 2008


Hi

I tried your code with R 2.8.0 devel,  gave up after about 4 hours (more 
or less t1N was 18) but until i interrupted it manually there was no 
problem neither error. Maybe you could try it with new R version. Besides 
I did not debug or profile your code so maybe you could try to debug it 
yourself. Especially 3 nested loops seems to me quite ineffective together 
with incremental Cgroup and Tgroup increasing.

Regards

Petr
petr.pikal at precheza.cz
724008364, 581252140, 581252257


r-help-bounces at r-project.org napsal dne 20.06.2008 20:55:08:

> Greetings,
> 
> I have stumbled across some unexpected behavior (potential a bug) in, 
what I
> suspect to be R's (2.6.2 on Ubuntu Linux) t.test function; then again 
the
> problem may exist in my code. I have shutdown R and started it back up,
> re-run the code and re-experienced the error. I have searched on Google 
for
> the abnormal termination error message "(stderr < 10 * 
.Machine$double.eps *
> max(abs(mx), abs(my))) stop("data are essentially constant")" but only 
found
> one instance, http://tolstoy.newcastle.edu.au/R/e2/help/07/06/18179.html
,
> but the discussion there did not seem particularly helpful.
> 
> I've included all of my code, amateurish though it may be. I have not
> isolated the faulty part, and to me it all looks pretty simple, so I'm 
not
> sure where I'm going wrong. For background, the goal of this code is to 
run
> a simulation to explore the problem space of inflation of Type I error 
when
> decisions to run or not to run more participants are made by preliminary
> looks at the data (as in Wagenmakers, 2007). This code is meant to 
examine
> the problem space given that there is no true difference between the 
groups
> (as is the case when both a generated from random draws from the normal
> distribution). I run an initial number of subjects in two groups (t1N) 
then,
> if p is < .25 on the t-test I add t2N more subjects to each group. Then 
I
> perform the t.test again. If the p was > .25 at time 1 I stop. Plainp is
> simply storing the p-values from t2 (if it was performed) or from t1 (if 
t2
> was not performed). In the code I provide t1 starts at 16 since this is
> about when the problem becomes more frequent. Please note that it takes
> quite a long while to fail, and depending on what the true cause is it 
may
> not fail at all. On my system it is failing before t1N advances to 17.
> 
> Any suggestions as to how to avoid the error and instructions as to the
> cause of it would be appreciated. Thank you for your input and patience.
> 
> logit <- function(p)
> 
> {
> 
> # compute and return logit of p;
> 
> # if p=.5 then logit==0 else sign(logit)==(p>.5)
> 
> return( log(p/(1-p)) )
> 
> }
> 
> 
>  antilogit <- function(x)
> 
> {
> 
> # compute and return antilogit of x;
> 
> # this returns a proportion p for which logit(p)==x;
> 
> return( exp(x)/(1+exp(x)) )
> 
> }
> 
> 
>  plainp <- c() #Clear the plainp value
> 
> t1Nsim <- (100/5) * 1000 * 10 # random chance should provide 10000 cases 
at
> t1
> 
> contthreshold <- .25 #p value below which we run more subjects
> 
> t1pvals <- rep(NA,t1Nsim) #clear the pvalues
> 
> t2pvals <- rep(NA,t1Nsim) #clear the pvalues
> 
> t1N <- 10 #for debugging
> 
> t2N <- 5 #for debugging
> 
> 
>  for (t1N in 16:50) #Outer loop testing possible values for t1N
> 
> for (t2N in 1:50) #Inner loop testing possible values for t2N
> 
> {
> 
> print(paste("Checking with ",t1N," initial samples and ",t2N," extra
> samples",sep="")) #feedback
> 
> for (lcv in 1:t1Nsim) #Run simulation t1Nsim times...
> 
> {
> 
> if (lcv %% 20000 == 0) {print(paste((lcv/t1Nsim)*100,"%",sep=""))} 
#feedback
> 
> Cgroup <- rnorm(t1N) #Initial random draw for Group1
> 
> Tgroup <- rnorm(t1N) #Initial random draw for Group 2
> 
> currentp <- t.test(Cgroup,Tgroup)[["p.value"]] #Get t1 p value
> 
> t1pvals[lcv] <- currentp #Store t1 p value
> 
> #If p >= .05 or <= continue threshold then run more subjects
> 
> if ((currentp <= contthreshold) & (currentp >= .05)) {
> 
> Cgroup <- c(Cgroup,rnorm(t2N)) #Add t2N subjects to group 1
> 
> Tgroup <- c(Tgroup,rnorm(t2N)) #Add t2N subjects to group 2
> 
> currentp <- t.test(Cgroup,Tgroup)[["p.value"]] #Get t2 p value
> 
> t2pvals[lcv] <- currentp #store t2 p value
> 
> }
> 
> }
> 
> plainp <- ifelse(!is.na(t2pvals),{t2pvals},{t1pvals}) #Make sure we are
> looking at the right ps
> 
> table(t1pvals <= .05); round(summary(t1pvals),4) #debugging
> 
> hist(t1pvals) #debugging
> 
> table(plainp <= .05); round(summary(plainp),4) #debugging
> 
> hist(plainp, probability=TRUE,main=paste(t1N,"then",t2N)) #Histogram of
> interest
> 
> abline(a=1.00,b=0) #Baseline probability
> 
> dev.copy(jpeg,filename=paste("Sim with ",t1N, "start samples and ",t2N,"
> extra samples.jpg",sep=""),height=600,width=800,bg="white") # Create the
> image
> 
> dev.off() #Save the image
> 
> chi <- rbind(table(t1pvals <= .05),table(plainp <= .05)) #debugging
> 
> chisq.test(chi) #debugging
> 
> explore <- data.frame(t1=t1pvals,t2=t2pvals,picked=plainp) #debugging
> 
> t.test(explore$picked,explore$t1) #debugging
> 
> t.test(logit(explore$picked),logit(explore$t1)) #debugging
> 
> }
> 
> ---
> Russell Pierce
> Psychology Department
> Graduate Student - Cognitive
> (951) 827-2553
> University of California, Riverside, 92521
> 
>    [[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list