[R] Lack of independence in anova()

Phillip Good pigood at verizon.net
Mon Jul 4 18:58:15 CEST 2005

My bad.  The example of a call should read, sim2(size,10000,.16).
Originally, the intent of the program was to compare ANOV (the gold
standard) with synchronized permutation tests when data is from contaminated
normal distributions or the design is unbalanced.  The tests for main
effects and interactions are always independent with synchronized
permutations and ought to be for ANOV with normal data and balanced designs.

-----Original Message-----
From: Douglas Bates [mailto:dmbates at gmail.com]
Sent: Monday, July 04, 2005 9:24 AM
To: pigood at verizon.net
Cc: r-help at r-project.org
Subject: Re: [R] Lack of independence in anova()

A couple more comments on this simulation.  Notice that the sim2
function is defined with arguments size, N and p but is being called
with four arguments.  It appears as if the value of p will be 10 in
that call.

If you decide to do such a simulation yourself you can save a lot of
time by just building the model matrix once and using lm.fit in
subsequent calls.

Also, there is no need to do the counting in the body of the sim2
function.  Just save the 3 p-values from each replication.  The test
of independence is equivalent to showing that the distribution of the
p-values is uniform over the unit cube.

On 7/4/05, Phillip Good <pigood at verizon.net> wrote:
>  If the observations are normally distributed and the 2xk design is
> balanced,  theory requires that the tests for interaction and row effects
> independent.  In my program, appended below, this would translate to cntT
> (approx)= cntR*cntI/N if all R routines were functioning correctly.  They
> aren't.
> sim2=function(size,N,p){
>   cntR=0
>   cntC=0
>   cntI=0
>   cntT=0
>   cntP=0
>   for(i in 1:N){
>     #generate data
>      v=gendata(size)
>     #analyze after build(ing) design containing data
>      lm.out=lm(yield~c*r,build(size,v))
>      av.out=anova(lm.out)
>     #if column effect is significant, increment cntC
>      if (av.out[[5]][1]<=p)cntC=cntC+1
>    #if row effect is significant, increment cntR
>      if (av.out[[5]][2]<=p){
>            cntR=cntR+1
>            tmp = 1
>            }
>      else tmp =0
>      if (av.out[[5]][3]<=p){
>            #if interaction is significant, increment cntI
>             cntI=cntI+1
>         #if both interaction and row effect are significant, increment
>             cntT=cntT + tmp
>             }
>      }
>     list(cntC=cntC, cntR=cntR, cntI=cntI, cntT=cntT)
> }
> build=function(size,v){
> #size is a vector containing the sample sizes
> col=c(rep(0,size[1]),rep(1,size[2]),rep(2,size[3]),rep(3,size[4]),
> rep(0,size[5]),rep(1,size[6]),rep(2,size[7]),rep(3,size[8]))
> row=c(rep(0,size[1]+size[2]+size[3]+size[4]),rep(1,size[5]+size[6]
> +size[7]+size[8]))
> return(data.frame(c=factor(col), r=factor(row),yield=v))
> }
> gendata=function(size){
>   ssize=sum(size);
>   return (rnorm(ssize))
> }
> #Example
>  size=c(3,3,3,0,3,3,3,0)
>  sim2(size,10000,10,.16)
> Phillip Good
> Huntington Beach CA
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!

More information about the R-help mailing list