[R] Reference for aov()

Justin Thong justinthong93 at gmail.com
Wed Jul 27 22:34:05 CEST 2016


Hi Peter,


Thank you for your good answer. I am sorry for the late reply.

*An orthogonalized model matrix generates a decomposition of the model space
into orthogonal subspaces corresponding to the terms of the model.
Projections onto each of the subspaces are easily worked out.  E.g., for a
two-way analysis (Y~row+col+row:col) you can decompose the model effect as
a row effect, a column effect, and an interaction effect. This allows
straightforward calculation of the sums of squares of the ANOVA table. As
you are probably well aware, if the design is unbalanced, the results will
depend on the order of terms -- col + row + col:row gives a different
result.*
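To make sure I follow, I tried to reproduce the order dependence with some
invented data (the names row/col and the unbalanced 2x2 cell counts below are
just my own toy example, not from your answer):

```r
# Invented, deliberately unbalanced 2x2 layout: cell counts 3,1,1,3,
# so the row and col effects are not orthogonal.
set.seed(1)
d <- data.frame(
  row = factor(rep(c("r1", "r2"), each = 4)),
  col = factor(c("c1", "c1", "c1", "c2", "c1", "c2", "c2", "c2"))
)
d$Y <- rnorm(8) + (d$row == "r2")

# Sequential (Type I) sums of squares depend on the order of the terms:
a1 <- anova(lm(Y ~ row + col, data = d))
a2 <- anova(lm(Y ~ col + row, data = d))
a1["row", "Sum Sq"]  # SS for row fitted first
a2["row", "Sum Sq"]  # SS for row fitted after col -- generally different
```

The total sum of squares is of course the same either way; only the split
between the terms changes.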

It may be a stupid question, but how are the projections onto each of the
subspaces easily worked out, and how do the sums of squares then follow?
Does it matter that certain parameters of the model are not estimable? R
appears to just give sums of squares even though some of the parameters
are non-estimable.
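If it helps clarify my question, this is my tentative understanding of the
projection argument, sketched with made-up balanced data: qr() of the model
matrix gives an orthonormal basis whose columns are grouped by term (via the
"assign" attribute of model.matrix), and summing the squared coordinates of Y
per term seems to reproduce the sequential sums of squares.

```r
# Made-up balanced two-way data, purely illustrative
set.seed(2)
d <- data.frame(
  row = factor(rep(c("r1", "r2"), times = 5)),
  col = factor(rep(c("c1", "c2"), each = 5))
)
d$Y <- rnorm(10)

X  <- model.matrix(~ row + col, data = d)
QR <- qr(X)                   # Householder QR of the model matrix
Q  <- qr.Q(QR)                # orthonormal basis for the column space,
                              # columns in model-term order
z  <- drop(crossprod(Q, d$Y)) # coordinates of Y in that basis

# attr(X, "assign") maps each column of X to a model term
# (0 = intercept, 1 = row, 2 = col); summing the squared
# coordinates per term gives the sequential sums of squares.
ss <- tapply(z^2, attr(X, "assign"), sum)
ss[c("1", "2")]               # should match the first column of:
anova(lm(Y ~ row + col, data = d))[, "Sum Sq"]
```

Is this the right picture, and what happens to it when some columns drop
out as non-estimable?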

Thank you





On 14 July 2016 at 09:50, peter dalgaard <pdalgd at gmail.com> wrote:

> I am not aware of any detailed documentation of this beyond the actual
> source code.
> However, the principles are fairly straightforward, except that the rules
> for constructing the design matrix from the model formula can be a bit
> arcane at times.
>
> The two main tools are the design matrix constructor (model.matrix) and a
> Gram-Schmidt type orthogonalization of its columns (the latter is called a
> QR decomposition in R, which it is, but there are several algorithms for
> QR, and the linear models code depends on the QR algorithm being based on
> orthogonalization -- so LINPACK works and LAPACK doesn't).
>
> An orthogonalized model matrix generates a decomposition of the model space
> into orthogonal subspaces corresponding to the terms of the model.
> Projections onto each of the subspaces are easily worked out.  E.g., for a
> two-way analysis (Y~row+col+row:col) you can decompose the model effect as
> a row effect, a column effect, and an interaction effect. This allows
> straightforward calculation of the sums of squares of the ANOVA table. As
> you are probably well aware, if the design is unbalanced, the results will
> depend on the order of terms -- col + row + col:row gives a different
> result.
>
> What aov() does is first decompose the observations according to the
> Error() term, forming the error strata, and then fit the systematic part
> of the model to each stratum in turn. In the nice cases, each term of the
> model will be estimable in exactly one stratum, and part of the aov()
> logic is to detect and remove unestimable terms. E.g., if you have a
> balanced two-way layout, say individual x treatment, where the variable
> gender is a subfactor of individual, then in Y ~ gender * treatment +
> Error(individual/treatment) the gender effect is estimated in the
> individual stratum, whereas treatment and gender:treatment are estimated
> in the individual:treatment stratum.
>
> It should be noted that it is very hard to interpret the results of aov()
> unless the Error() part of the model corresponds to a balanced experimental
> design. Or put more sharply: The model implied by the decomposition into
> error strata becomes nonsensical otherwise. If you do have a balanced
> design, the error strata reduce to simple combinations of means and
> observations, so the aov() algorithm is quite inefficient, but to my
> knowledge nobody has bothered to try and do better.
>
> -pd
>
> > On 13 Jul 2016, at 18:18 , Justin Thong <justinthong93 at gmail.com> wrote:
> >
> > Hi
> >
> > *I have been looking for a reference that explains how R implements
> > the aov command (at a deeper level)*. More specifically, how R reads
> > the formulae and how it computes the sums of squares. I am not
> > interested in what the differences between Type 1, 2 and 3 sums of
> > squares are. I am more interested in how R computes ~x1:x2:x3 or
> > ~A:x1, emphasizing the sequential nature of the computation, and in
> > models even more complicated than this.
> >
> > Yours sincerely,
> > Justin
> >
> > *I check my email at 9AM and 4PM everyday*
> > *If you have an EMERGENCY, contact me at +447938674419(UK) or
> > +60125056192(Malaysia)*
> >
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
>


-- 
Yours sincerely,
Justin

*I check my email at 9AM and 4PM everyday*
*If you have an EMERGENCY, contact me at +447938674419(UK) or
+60125056192(Malaysia)*



