[R] Lack of independence in anova()

Douglas Bates dmbates at gmail.com
Mon Jul 4 18:24:26 CEST 2005

```A couple more comments on this simulation.  Notice that the sim2
function is defined with arguments size, N and p but is being called
with four arguments.  It appears as if the value of p will be 10 in
that call.

If you decide to do such a simulation yourself you can save a lot of
time by just building the model matrix once and using lm.fit in
subsequent calls.

Also, there is no need to do the counting in the body of the sim2
function.  Just save the 3 p-values from each replication.  The test
of independence is equivalent to showing that the distribution of the
p-values is uniform over the unit cube.

On 7/4/05, Phillip Good <pigood at verizon.net> wrote:
>  If the observations are normally distributed and the 2xk design is
> balanced,  theory requires that the tests for interaction and row effects be
> independent.  In my program, appended below, this would translate to cntT
> (approx)= cntR*cntI/N if all R routines were functioning correctly.  They
> aren't.
>
> sim2=function(size,N,p){
>   cntR=0
>   cntC=0
>   cntI=0
>   cntT=0
>   cntP=0
>   for(i in 1:N){
>     #generate data
>      v=gendata(size)
>     #analyze after build(ing) design containing data
>      lm.out=lm(yield~c*r,build(size,v))
>      av.out=anova(lm.out)
>     #if column effect is significant, increment cntC
>      if (av.out[[5]][1]<=p)cntC=cntC+1
>    #if row effect is significant, increment cntR
>      if (av.out[[5]][2]<=p){
>            cntR=cntR+1
>            tmp = 1
>            }
>      else tmp =0
>      if (av.out[[5]][3]<=p){
>            #if interaction is significant, increment cntI
>             cntI=cntI+1
>         #if both interaction and row effect are significant, increment cntT
>             cntT=cntT + tmp
>             }
>      }
>     list(cntC=cntC, cntR=cntR, cntI=cntI, cntT=cntT)
> }
>
> build=function(size,v){
> #size is a vector containing the sample sizes
> col=c(rep(0,size[1]),rep(1,size[2]),rep(2,size[3]),rep(3,size[4]),
> rep(0,size[5]),rep(1,size[6]),rep(2,size[7]),rep(3,size[8]))
> row=c(rep(0,size[1]+size[2]+size[3]+size[4]),rep(1,size[5]+size[6]
> +size[7]+size[8]))
> return(data.frame(c=factor(col), r=factor(row),yield=v))
> }
>
> gendata=function(size){
>   ssize=sum(size);
>   return (rnorm(ssize))
> }
>
> #Example
>  size=c(3,3,3,0,3,3,3,0)
>  sim2(size,10000,10,.16)
>
>
>
> Phillip Good
> Huntington Beach CA
>
>
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help