[BioC] about formula in ancova

Mon Mar 15 13:51:41 CET 2010

Good morning,
I'd add that, given an object of class lm in R, the "car" package (from
CRAN, not BioC) can produce Type III sums of squares via its Anova() function.

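A minimal sketch of the above, for illustration only. It assumes the
built-in mtcars data as a stand-in for the hotdog data (so it runs without
the HH package) and uses plain lm() rather than ancova(). The names dat,
fit_xa and fit_ax are made up for this example. anova() gives the
sequential SS, drop1() gives partial SS in base R, and car::Anova() is
called only if car happens to be installed.

```r
# Stand-in data (an assumption): mtcars with cyl as a factor plays the role
# of hotdog, with wt as the covariate and cyl as the grouping factor.
dat <- transform(mtcars, cyl = factor(cyl))

# Same additive model, terms written in the two possible orders.
fit_xa <- lm(mpg ~ wt + cyl, data = dat)
fit_ax <- lm(mpg ~ cyl + wt, data = dat)

# Sequential SS: each term is adjusted only for the terms entered before it,
# so the two tables differ even though the fitted models are identical.
print(anova(fit_xa))
print(anova(fit_ax))

# Partial SS in base R: each term is adjusted for all other terms,
# so the result does not depend on the order of the formula.
print(drop1(fit_xa, test = "F"))
print(drop1(fit_ax, test = "F"))

# The same kind of table from car, if it is available.
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(fit_xa, type = "III"))
}
```

Note that in each anova() table the last term is already adjusted for
everything before it, so its line should agree with the drop1() line for
that term. For models with an interaction, Type III tests depend on the
contrast coding (sum-to-zero contrasts such as contr.sum are the usual
recommendation), so it is worth reading the car documentation first.
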
Hope that helps,
JP

Naomi Altman wrote:
> By default, R produces sequential sums of squares, not "Type III" or
> partial SS.  The sequential SS for a term are adjusted only for the
> variables entered before it in the model, which is why they depend on
> the order of the formula.  The partial SS are adjusted for all other
> variables in the model.
>
> The SAS manual explains this more fully under PROC REG (sequential and 
> partial) and PROC GLM (sequential and Type III) and probably has the 
> most concise explanations.
>
> --Naomi
>
> At 09:38 AM 3/14/2010, Dejian Zhao wrote:
>> Here are more details about the confusing results mentioned in my
>> previous email. Some values (e.g. Sum Sq, Mean Sq, F value, Pr) for
>> the two variables seem to depend on their order in the formula, and the
>> change in probability (Pr) directly changes the significance. With
>> my own data, changing the order of the variables in the formula turns
>> one variable from significant to non-significant.
>>
>> Maybe this is a trivial question for a math major. But as a biology
>> major, I would appreciate it if someone could explain the formula and
>> give some guidelines for writing one. Many thanks again.
>>
>> *Results of the first set of code:*
>> > ancova(Sodium ~ Calories + Type, data=hotdog)
>> Analysis of Variance Table
>>
>> Response: Sodium
>>           Df Sum Sq Mean Sq F value    Pr(>F)
>> Calories   1 106270  106270  34.654 3.281e-07 ***
>> Type       2 227386  113693  37.074 1.336e-10 ***
>> Residuals 50 153331    3067
>> ---
>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> > ancova(Sodium ~ Type + Calories, data=hotdog)
>> Analysis of Variance Table
>>
>> Response: Sodium
>>           Df Sum Sq Mean Sq F value    Pr(>F)
>> Type       2  31739   15869  5.1749  0.009065 **
>> Calories   1 301917  301917 98.4526 2.089e-13 ***
>> Residuals 50 153331    3067
>> ---
>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>>
>> *Results of the second set of code:*
>> > ancova(Sodium ~ Calories * Type, data=hotdog)
>> Analysis of Variance Table
>>
>> Response: Sodium
>>               Df Sum Sq Mean Sq F value    Pr(>F)
>> Calories       1 106270  106270 35.6885 2.747e-07 ***
>> Type           2 227386  113693 38.1815 1.195e-10 ***
>> Calories:Type  2  10402    5201  1.7466    0.1853
>> Residuals     48 142930    2978
>> ---
>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> > ancova(Sodium ~ Type * Calories, data=hotdog)
>> Analysis of Variance Table
>>
>> Response: Sodium
>>               Df Sum Sq Mean Sq  F value    Pr(>F)
>> Type           2  31739   15869   5.3294  0.008124 **
>> Calories       1 301917  301917 101.3927 2.019e-13 ***
>> Type:Calories  2  10402    5201   1.7466  0.185267
>> Residuals     48 142930    2978
>> ---
>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>>
>> Dejian Zhao wrote:
>> > Dear list members,
>> >
>> > I have a question about the formula in ancova(), which is part of
>> > the HH package. There are several examples in the ancova() help file,
>> > which can be accessed by typing "?ancova" in the R console after
>> > loading the HH package. Some of the code is as follows:
>> >
>> > hotdog <- read.table(hh("datasets/hotdog.dat"), header=TRUE)
>> > ## y ~ x + a or y ~ a + x ## constant slope, different intercepts
>> > ancova(Sodium ~ Calories + Type, data=hotdog)
>> > ancova(Sodium ~ Type + Calories, data=hotdog)
>> >
>> > After running the code, I found that the results depend on the
>> > formula, i.e. "ancova(Sodium ~ Calories + Type, data=hotdog)"
>> > and "ancova(Sodium ~ Type + Calories, data=hotdog)" produced
>> > different results.
>> >
>> > The same thing also happens with the following example code:
>> > ## y ~ x * a or y ~ a * x ## different slopes, and different intercepts
>> > ancova(Sodium ~ Calories * Type, data=hotdog)
>> > ancova(Sodium ~ Type * Calories, data=hotdog)
>> >
>> > Hence, I am confused about the difference between the formulas
>> > "y ~ x + a" and "y ~ a + x", and likewise "y ~ x * a" and "y ~ a * x".
>> > I thought the order of variables in the formula was arbitrary;
>> > however, here it seems the order matters.
>> >
>> > Can someone explain the formula for me? And how should we choose the
>> > formula, or arrange the order of variables in the formula, when
>> > processing our own data?
>> >
>> > Many thanks!
>> > Dejian
>> >
>> > -----
>> > Dejian Zhao, PhD student
>> > Group of Evolutionary Ecology
>> > State Key Laboratory of Integrated Pest Management
>> > Institute of Zoology, Chinese Academy of Sciences
>> > 1 Beichen West Road, Chaoyang District, Beijing, P.R.China
>> > Postal code: 100101
>> > Tel (office): +86-10-64807217
>> > Fax: +86-10-64807099
>> > Email: zhaodj at ioz.ac.cn
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> Naomi S. Altman                                814-865-3791 (voice)
> Associate Professor
> Dept. of Statistics                              814-863-7114 (fax)
> Penn State University                         814-865-1348 (Statistics)
> University Park, PA 16802-2111
>

-- 
=============================
Juan Pedro Steibel

Assistant Professor
Statistical Genetics and Genomics

Department of Animal Science & 
Department of Fisheries and Wildlife

Michigan State University
1205-I Anthony Hall
East Lansing, MI
48824 USA 

Phone: 1-517-353-5102
E-mail: steibelj at msu.edu